A Guide to the P-STAT Programming Language (PPL) - P-STAT, Inc.


P-STAT®<br />

A Guide to the P-STAT Programming Language (PPL)<br />


P-STAT: A Guide to the P-STAT Programming Language (PPL),<br />

Second Edition, January 2013<br />

This publication corresponds to P-STAT Version 3, January 2013. It is designed for those already familiar with the P-STAT system, either from the menu or from the command language interface, and is intended to be a complete description of the programming language.<br />

Please direct any questions to:<br />

P-STAT, Inc.<br />

230 Lambertville-Hopewell Rd.<br />

Hopewell, New Jersey 08525-2809<br />

U.S.A.<br />

Telephone: 609-466-9200<br />

Fax: 609-466-1688<br />

Internet: support@pstat.com<br />

Web Page URL: http://www.pstat.com<br />

All rights reserved. Except as permitted under the United States Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without the prior written permission of P-STAT, Inc.<br />

P-STAT is a registered trademark of P-STAT, Inc. Windows is a registered trademark of Microsoft Corp.<br />

Copyright © 1972-2013 P-STAT, Inc. Printed in the US. Published by P-STAT, Inc.


CONTENTS<br />

PPL: Introduction<br />

THE VARIABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1.2<br />

VECTORS AND ARRAYS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1.2<br />

THE COMMANDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1.3<br />

P-STAT SYSTEM FILE: CURRENT OR PREVIOUS. . . . . . . . . . . . . . . . . . . . . . .1.3<br />

ORGANIZATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1.4<br />

PPL: Basics of the Programming Language

CASE AND VARIABLE SELECTION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.1<br />

Case Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.3<br />

Variable Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.3<br />

Using Ranges in Selection Clauses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.4<br />

Multiple Variable Selections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.5<br />

Reordering Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.5<br />

Masks and Wildcards. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.6<br />

MODIFYING AND GENERATING VARIABLES . . . . . . . . . . . . . . . . . . . . . . . . .2.7<br />

Modifying Variables with SET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.7<br />

Using INCREASE and DECREASE Instead of SET . . . . . . . . . . . . . . . . . . . . .2.8<br />

Creating New Variables with GENERATE. . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.9<br />

Numeric Operators and their Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.9<br />

Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.10<br />

LOGICAL SELECTION OF CASES. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.11<br />

Logical Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.12<br />

The Special Operators MISSING and GOOD. . . . . . . . . . . . . . . . . . . . . . . . . .2.12<br />

AND and OR Relationships. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.13<br />

Common Errors in Complex Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.14<br />

AMONG and NOTAMONG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.15<br />

MISSING DATA with AMONG and NOTAMONG . . . . . . . . . . . . . . . . . . . .2.16<br />

INRANGE and OUTRANGE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.16<br />

ANY and ALL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.16<br />

INSTRUCTIONS AFTER IF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.17<br />

Conditional Case Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.18


Conditional Modification. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.18<br />

Three-Way Logic of IF Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.18<br />

Renaming Variables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.19<br />

PPL: MODIFY, PROCESS and PUT

FILE MODIFICATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.1<br />

How Modifications Are Processed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.1<br />

Temporary Modifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.2<br />

Permanent Modifications and the MODIFY Command . . . . . . . . . . . . . . . . . . .3.2<br />

TEMPLATE Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.3<br />

On-the-Fly Concatenation of Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.4<br />

Repeating Cases. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.6<br />

OTHER INSTRUCTIONS AFTER IF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.7<br />

GOTO To Process Modifications Selectively . . . . . . . . . . . . . . . . . . . . . . . . . . .3.7<br />

Cleaning Data With PUT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.8<br />

Report Writing Using PUT and PUTL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.9<br />

STANDALONE PPL COMMANDS AND PROCESS . . . . . . . . . . . . . . . . . . . . . .3.12<br />

Scratch Variables and Standalone PPL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.12<br />

The PROCESS Command and More PUT Information . . . . . . . . . . . . . . . . . .3.13<br />

COMMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.14<br />

QUITTING A PROCESS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.14<br />

PPL: NCOT and RECODE

The NCOT Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.1<br />

The RECODE Function: Single Argument Usage . . . . . . . . . . . . . . . . . . . . . . .4.3<br />

COMPLEX RECODES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.6<br />

RECODE: The Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.6<br />

The RECODE Tests. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.6<br />

Defining a Set of Constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.8<br />

The Result Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.9<br />

RECODE or IF/SET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.10<br />

RECODE Pointers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.11<br />

XRECODE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.12<br />

PPL: DO LOOPS and IF-THEN-ELSE Blocks

DO LOOPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.1<br />

DO USING a Variable List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.2<br />

DO Stepping Through a Range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.4<br />



DO Loops: Other Features. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.6<br />

GENERATE AND RENAME . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.7<br />

Using GENERATE in DO Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.8<br />

Using RENAME in DO Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.9<br />

Masks for RENAME and GENERATE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.9<br />

IF-THEN-ELSE BLOCKS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.14<br />

IF-THEN-ELSE: Other Features. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.15<br />

IF-THEN-ELSE: Another Example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.18<br />

PPL: Functions and System Variables

ONE-EXPRESSION FUNCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.1<br />

Rounding Functions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.2<br />

Floor and Ceiling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.2<br />

Exponential and Trigonometric Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.3<br />

The Factorial Function. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.3<br />

Creating Dummy Variables with the LOC Function . . . . . . . . . . . . . . . . . . . . . .6.3<br />

Creating a Single Variable from Dummy Variables . . . . . . . . . . . . . . . . . . . . . .6.5<br />

LIST FUNCTIONS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.6<br />

Numeric List Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.6<br />

Character and Numeric List Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.7<br />

SPECIAL FUNCTIONS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.7<br />

The LAG and DIF Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.8<br />

Modular (Remainder) Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.9<br />

Setting PLACES in Specific Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.10<br />

Extracting Digits Using NUMEX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.10<br />

COMBINATIONS of N things, K at a time . . . . . . . . . . . . . . . . . . . . . . . . . . .6.11<br />

EXPAND ONE OR MORE VARIABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.11<br />

Overall Syntax of a PPL EXPAND Statement . . . . . . . . . . . . . . . . . . . . . . . . .6.12<br />

Numeric Input Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.12<br />

Character Input Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.12<br />

The GENERATE or GEN phrase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.12<br />

Options With Several Input Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.13<br />

Options When the Input Variables Are Character . . . . . . . . . . . . . . . . . . . . . . .6.13<br />

SYSTEM VARIABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.14<br />

Referencing Good and Missing Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.14<br />

Selecting Variables with .NEW. and .OTHERS.. . . . . . . . . . . . . . . . . . . . . . . .6.15<br />

Referencing the Number of Variables in the File . . . . . . . . . . . . . . . . . . . . . . .6.15<br />

Referencing the Current Case Number . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.16<br />

Referencing Numeric and Character Variables . . . . . . . . . . . . . . . . . . . . . . . . .6.17<br />



Accessing the PUT Counter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.18<br />

File, Date, Page and Line References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.18<br />

Random Number and Distribution Functions<br />

RANDOM NUMBER FUNCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.1<br />

Normal and Uniform Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.2<br />

Binary and User's Tabled Distributions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.3<br />

DISTRIBUTION FUNCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.3<br />

Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.4<br />

Inverse Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.5<br />

THE FUZZY EQUALS PROBLEM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.6<br />

The Fuzzy Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.7<br />

Fuzzy Logical Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.7<br />

How Fuzzy Operators Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.8<br />

FUZZY Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.8<br />

PPL: Across-Case Modifications

BASIC ACROSS-CASE AGGREGATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.2<br />

Accessing FIRST and LAST Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.2<br />

Scratch Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.3<br />

The Permanent Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.5<br />

User-defined Arrays. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.7<br />

Interaction of FIRST, LAST and Other PPL . . . . . . . . . . . . . . . . . . . . . . . . . . .8.10<br />

Example: Checking a List of Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.11<br />

Example: Selecting a Block of Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.12<br />

THE SPLIT FUNCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.12<br />

Splitting a Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.12<br />

CARRYing Identifying Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.13<br />

Selecting Variables To USE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.14<br />

Defining New Variables with CREATE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.14<br />

Wildcard Notation and Masks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.15<br />

INDEXing Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.16<br />

Ordering Variables with STEP and CYCLE . . . . . . . . . . . . . . . . . . . . . . . . . . .8.17<br />

How SPLIT Interacts With Other PPL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.19<br />

THE COLLECT FUNCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.19<br />

Collecting BY Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.20<br />

CARRYing Common Information. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.21<br />

Ordering Cases with INDEX and SORT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.21<br />

COLLECT System Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.27<br />



PPL: Modification of Character Variables

BASIC CHARACTER PROCEDURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.1<br />

Generating New Character Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.1<br />

Modifying Existing Character Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.2<br />

Logical Selection of Character Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.2<br />

Locating Non-Missing Character Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.3<br />

CHARACTER OPERATORS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.3<br />

The CONTAINS and XCONTAINS Operators . . . . . . . . . . . . . . . . . . . . . . . . .9.4<br />

The Concatenate Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.4<br />

The Trim Concatenate Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.5<br />

Exactly Equal Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.5<br />

CHARACTER FUNCTIONS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.6<br />

Centering and Justifying Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.7<br />

Changing the Case of Strings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.7<br />

Length and Size of Strings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.8<br />

Locating Strings Within Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.8<br />

Extracting Substrings and Words . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.9<br />

Blanking Out and Changing Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.10<br />

Squeezing Out Specified Characters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.11<br />

Trimming Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.12<br />

Padding Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.12<br />

Converting Numbers to Characters and Vice Versa . . . . . . . . . . . . . . . . . . . . .9.13<br />

Character/Integer Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.14<br />

Complex Character Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.16<br />

Using the Name of a Variable as a Character Value . . . . . . . . . . . . . . . . . . . . .9.17<br />

The MATCHES and XMATCHES Operators. . . . . . . . . . . . . . . . . . . . . . . . . .9.18<br />

MATCHES: Meta-Characters and Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.20<br />

CLAG: A Lag using a character argument . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.23<br />

CONCATENATION OF CHARACTER CONSTANTS . . . . . . . . . . . . . . . . . . . .9.23<br />

PPL: Date and Time Commands and Functions

DATE AND TIME FUNCTIONS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10.1<br />

Functions Which Create or Use Dates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10.1<br />

Six Simple Date Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10.3<br />

DATE and TIME function details. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10.4<br />

DATE AND TIME COMMANDS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10.13<br />

The DATE.LANGUAGE Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10.14<br />

The DATE.ORDER Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10.14<br />



Changing the Case and Length of names. . . . . . . . . . . . . . . . . . . . . . . . . . . . .10.15<br />

Month and Weekday Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10.15<br />

DATE LOGICAL OPERATORS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10.16<br />

FORMAT.DATE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10.17<br />

TEXTWRITER: Report Writing<br />

OVERVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.1<br />

Justification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.2<br />

The “No-Break” Character . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.2<br />

PPL INSTRUCTIONS PUT AND PUTL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.2<br />

Character Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.3<br />

Values of Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.3<br />

Expressions and Functions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.4<br />

A Sample Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.4<br />

Comments in PPL Clauses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.6<br />

OPTIONAL IDENTIFIERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.6<br />

CASE and STREAM: The Modes of Operation . . . . . . . . . . . . . . . . . . . . . . . .11.6<br />

JUSTIFY, BLANKS, PUTL.CHAR and SPREAD. . . . . . . . . . . . . . . . . . . . . .11.7<br />

MARGIN, LEADBLANK and WIDTH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.7<br />

Optional Files: LABELS and OUT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.8<br />

CONTROL WORDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.8<br />

Control Words to Produce a Letter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.8<br />

Positioning Columns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.9<br />

Positioning Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.10<br />

Positioning Words . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.10<br />

Labeling Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.13<br />

Specifying Missing Characters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.14<br />

A Complex Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.15<br />

Control Word Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.18<br />

COMPARING TEXTWRITER AND OTHER COMMANDS . . . . . . . . . . . . . .11.19<br />

OPTIONAL IDENTIFIERS: PostScript. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.20<br />

PostScript Page Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.21<br />

Setting the Fonts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.21<br />

TEXTWRITER Control Words: The Fonts. . . . . . . . . . . . . . . . . . . . . . . . . . .11.22<br />

Control Words: Positioning the Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.24<br />

Indenting Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.27<br />

Colors in PostScript Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.28<br />

Underlining Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.29<br />



P-STAT MACROS

MACRO FORMAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.1<br />

Types of Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.1<br />

Storing and Activating Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.2<br />

Comments Within a Macro . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.3<br />

Macros With Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.3<br />

Using Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.5<br />

Default Values for Arguments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.6<br />

Nested Instream Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.7<br />

Instream Macros in a Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.8<br />

Instream Macros in Subcommands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.8<br />

Using Lots of Instream Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.10<br />

MACRO COMMANDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.11<br />

CORRECTING MACROS IN THE EDITOR . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.11<br />

BLOCK MACROS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.12<br />

Executing a Block Macro . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.12<br />

Macro Substitution Using Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.14<br />

Scope of Temporary Scratch Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.14<br />

Scratch Variables and Nested Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.14<br />

Temporary Files in Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.15<br />

Subcommands in Macros. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.15<br />

Conditional Execution of Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.18<br />

DIALOG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.19<br />

Format of the DIALOG command. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.21<br />

Does <strong>the</strong> File Exist. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.22<br />

SUBFILES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.23<br />

SUBFILES Optional Identifiers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.23<br />

SUBFILES Looping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.25<br />

SUBFILES System Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.26<br />


FIGURES<br />

Basic Types of Modifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2<br />

Format of the SET Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.8<br />

Format of the IF Clause . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.11<br />

AND and OR: Evaluations of Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.13<br />

IF and Missing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.17<br />

Permanent Modifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3<br />

Template Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4<br />

Renaming All the Variables in a File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5<br />

Repeating Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.6<br />

Using GOTO and PUT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.8<br />

Using PUT To Produce a Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.10<br />

Accessing the Variable Name Within a Report . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.11<br />

PROCESS: Counting Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.13<br />

NCOT: Numeric Recodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2<br />

Multi-Variable RECODE With Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.9<br />

RECODE or IF/SET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.11<br />

EQ and NE Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.12<br />

Simple DO Loop with a List of Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2<br />

DO With Two Scratch Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3<br />

DO: Range and Stepsize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4<br />

DO Loops: An Example of Each Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.5<br />

Labelled DO, EXITDO and NEXTDO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.6<br />

Rename Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.10<br />

GENERATE: Generated Versus Original . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12<br />

Dynamic Array, Wildcard, Prefix and GENERATE . . . . . . . . . . . . . . . . . . . . . . . 5.12<br />

Complex MASK: Generate Variable Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.13<br />

IF or IF-THEN-ELSE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14<br />

IF-THEN with F.ELSE and M.ELSE in a Simple Hot Deck Example . . . . . . . . . 5.16<br />

IF-THEN-ELSE: The Data and the Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.18


IF-THEN-ELSE Block with Nested IF and a DO Loop . . . . . . . . . . . . . . . . . . . . . 5.19<br />

Calculating Variable Positions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.4<br />

Using LAG and DIF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.8<br />

Interaction of LAG and IF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.9<br />

EXPAND Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.14<br />

Showing the Differences Between .N., .HERE. and .USED. . . . . . . . . . . . . . . . 6.17<br />

FIRST and LAST with Subgroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2<br />

Using Scratch Variables and FIRST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.4<br />

Creating a Summary Case with FIRST and LAST . . . . . . . . . . . . . . . . . . . . . . . . . . 8.5<br />

Moving Values Between Files with the P Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.6<br />

DEFINE.ARRAY and SHOW.ARRAYS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.8<br />

One-dimensional Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.9<br />

Two-dimensional Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.10<br />

Checking Variables Using PUT and Scratch Variables . . . . . . . . . . . . . . . . . . . . . 8.11<br />

Using CARRY in the SPLIT Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.13<br />

Selecting Variables for SPLIT with USE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.14<br />

Naming the New Variables with CREATE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.15<br />

Multiple CREATE Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.16<br />

Producing an Index Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.17<br />

Using STEP and CYCLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.18<br />

A Simple COLLECT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.19<br />

Collecting BY Group Membership . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.20<br />

Collecting Cases in a Specified Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.21<br />

Sorting the Collected Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.22<br />

A Complex Modification Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.23<br />

A Second Complex Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.25<br />

Before and After COLLECT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.26<br />

The XEQ Operator for Tests that Respect Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.6<br />

The CVAL Function for Bells and Whistles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.15<br />

Nesting Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.16<br />

Using VARNAME, SPLIT and COLLECT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.18<br />

File of Character Data for MATCHES and XMATCHES . . . . . . . . . . . . . . . . . . . 9.19<br />

MATCHES and XMATCHES: Meta-Characters . . . . . . . . . . . . . . . . . . . . . . . . . 9.21<br />

DATE Logical Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.16<br />



FORMAT.DATE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.18<br />

FORMAT.DATE Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.19<br />

Producing a Report: The Input Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.4<br />

Producing a Report: The TEXTWRITER Command . . . . . . . . . . . . . . . . . . . . . . 11.5<br />

Producing a Report: The Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.6<br />

A Form Letter: The Input File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.9<br />

A Form Letter: The TEXTWRITER Command . . . . . . . . . . . . . . . . . . . . . . . . . 11.11<br />

A Form Letter: One Letter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.12<br />

TEXTWRITER: Displaying all the Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.14<br />

A Complex Report: The Input and Labels Files . . . . . . . . . . . . . . . . . . . . . . . . . 11.15<br />

A Complex Report: The Report (Two Pages) . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.16<br />

A Complex Report: The TEXTWRITER Command . . . . . . . . . . . . . . . . . . . . . 11.17<br />

PostScript Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.21<br />

Justification in PostScript Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.23<br />

Changing Fonts Text in a PostScript Paragraph . . . . . . . . . . . . . . . . . . . . . . . . . . 11.23<br />

Font Changes in a Justified PostScript Paragraph . . . . . . . . . . . . . . . . . . . . . . . . 11.24<br />

TEXTWRITER: Tabular Output with PostScript . . . . . . . . . . . . . . . . . . . . . . . . . 11.25<br />

PostScript: Tables with Proportional Fonts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.26<br />

Indenting the Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.28<br />

Underlining the Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.30<br />

Activating Three Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.3<br />

Block Macro With Keyword Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.4<br />

Block Macro With Positional Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.4<br />

Macro With Positional Arguments and Default Values . . . . . . . . . . . . . . . . . . . . . 12.6<br />

Macros Can Call Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.7<br />

Instream Macros in Subcommand Records. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.9<br />

Lots of Instream Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.10<br />

Defining a Block Macro . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.12<br />

The RUN Command and Partial Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.13<br />

Macros: Temporary File Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.15<br />

Macros: Supplying Subcommands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.16<br />

Macro with Conditional Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.17<br />

Macros: Reversing the Order of Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.18<br />

Macros: DIALOG Provides an Interactive Front End . . . . . . . . . . . . . . . . . . . . . 12.19<br />

Macro With SUBFILES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.23<br />

The SUBFILE Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.24<br />



1

PPL: Introduction

P-STAT accepts information in many different forms. Information may be numeric, as is average yearly rainfall or total automobile production, or it may be text or character, as is a name or an address. P-STAT accepts information from a variety of sources, including disk, tape, or the user's terminal, and holds this information in a compressed rectangular format called a “P-STAT system file”. This file is composed of rows (cases or records), each of which contains one or more variables (fields). The first step in using P-STAT is to convert your data into P-STAT system file format. The commands that create a P-STAT system file are described in “P-STAT Introductory Manual” and “P-STAT: Utility Commands”. They include:

1. MAKE: when the data are in ASCII format on an external disk or tape, or when full screen capabilities are not available on the terminal.

2. TEXTFILE.IN: when the data are in ASCII (text) format delimited by tabs, commas or blanks. The first row of data may contain variable labels.

3. FILE.IN: primarily used when the data in an external file are in a binary format.

4. SPSS.IN: when the data are in SPSS export format.

The P-STAT Programming Language, called PPL for short, is a language within the P-STAT program. Both simple and complex manipulations may be done using PPL instructions, operators, functions and system variables. PPL permits logical testing and selection of cases and variables, modification of existing variables, and generation of new variables.

PPL may be used to modify any P-STAT system file as that file is read by any command. Modifications are temporary unless a new output file is produced. Both numeric and character variables may be tested, selected and modified. Most of the basic PPL instructions and operators are applicable to either numeric or character variables. However, there is a class of functions such as SQRT, the square root function, which only applies to numeric data, and there is another class of functions such as MATCHES, the string matching function, which only applies to character data.

An important concept in understanding how PPL works is that an input file is not changed in any way by the programming language statements. When you request a given P-STAT command such as LIST or MODIFY:

1. the P-STAT executive routines determine which command is required and pass control to that command;

2. the command prepares to do its job and then, when it is ready, asks the executive routines for a row of data from the input file;

3. the executive routines determine whether there is any PPL to apply. It is these routines that create new variables, recode existing variables, and process any logical selections;

4. if, during the processing of the PPL, the executive routines determine that the current case has failed in some way and is not needed by the command, they stop processing that case and read the next case. Thus the command that is currently executing has no knowledge of the original input case. It knows only about those cases which survive the PPL, and it knows about those cases only in their post-PPL form.
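The four steps above can be modelled outside P-STAT. The following is a minimal sketch in Python, not PPL, and every name in it (`read_with_ppl`, `example_ppl`, the variable `Age.Months`) is hypothetical. It only illustrates the idea that a command receives post-PPL copies of the surviving cases while the input file itself stays untouched.

```python
from typing import Callable, Iterable, Iterator, Optional

Case = dict  # one row of the file: variable name -> value


def read_with_ppl(cases: Iterable[Case],
                  ppl: Callable[[Case], Optional[Case]]) -> Iterator[Case]:
    """Yield post-PPL copies of the cases; the input rows are never altered."""
    for case in cases:
        modified = ppl(dict(case))  # apply the modifications to a copy of the row
        if modified is None:        # the case failed a selection: go on to the next one
            continue
        yield modified              # the command sees only the post-PPL form


def example_ppl(case: Case) -> Optional[Case]:
    # Hypothetical modification: keep adults only and derive a new variable.
    if case["Age"] < 18:
        return None
    case["Age.Months"] = case["Age"] * 12
    return case


input_file = [{"Age": 12}, {"Age": 30}, {"Age": 45}]
seen_by_command = list(read_with_ppl(input_file, example_ppl))
```

Here `seen_by_command` holds two cases, each with the extra variable, while `input_file` still holds the three original cases.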



When a command like MODIFY produces an output file, any PPL that is done to the input file is permanent in the sense that the output file reflects that PPL.

While PPL is most frequently used to modify a case of data in an existing P-STAT system file, there are provisions for passing data between cases within a file, between files, and even between P-STAT commands. This makes it possible to get summary information, to do conditional execution of the PPL within a command, and also to change the direction of a job stream depending on the data that are found or the results of a previous computation.

1.1 THE VARIABLES<br />

There are three types of variables. The first type is a variable in a P-STAT system file. The variables in a P-STAT file may be numbers or character strings. Every case (row) of a file contains 1 or more such variables. Each variable has a name. The name of a variable can contain letters, digits, dots, underscores and, if starting with a tag, two colons. It has at most 64 characters and must start with a letter. If a tag is supplied, it may be 1 to 16 characters long and MUST be followed by the double colon (::).
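The naming rules above can be written down as a small check. The sketch below is Python, not PPL, and rests on assumptions the text does not settle: that the 64-character limit counts the tag and its two colons, that a tag uses the same characters as a name, and that both the tag and the name after it start with a letter. The function name is hypothetical.

```python
import re

# Letters, digits, dots and underscores, starting with a letter.
NAME_PART = r"[A-Za-z][A-Za-z0-9._]*"
NAME_RE = re.compile(rf"(?:(?P<tag>{NAME_PART})::)?(?P<base>{NAME_PART})")


def is_valid_file_variable_name(name: str) -> bool:
    """Check a file variable name against the rules as read from the text."""
    if len(name) > 64:  # assumption: the limit includes any tag and the "::"
        return False
    match = NAME_RE.fullmatch(name)
    if match is None:
        return False
    tag = match.group("tag")
    return tag is None or len(tag) <= 16  # a tag, if present, has 1 to 16 characters
```

Under these assumptions `Age`, `Q1.total_score` and `Survey::Age` are accepted, while `1st` (starts with a digit) and `Bad:Name` (a single colon) are rejected.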

The variables in a given P-STAT system file can only be modified when the rows of that file are read by a P-STAT command. P-STAT system variables and scratch variables, described below, can only be 16 characters long and do not have a tag.

The second type of variable is a P-STAT system variable. These variables are not part of a P-STAT file. Instead, they reside in memory. They contain values such as the current date and the current page number. These variables, some of them numeric and some of them character strings, are created and maintained by the P-STAT executive routines and are available for your use. For example, .DATE. is the system variable for the current date. Most system variables cannot be changed by a user; .PAGE., the current page number, is an exception. System variable names look like regular variable names except that they always begin and end with a decimal point.

Scratch variables, the third type, also exist in memory rather than in a P-STAT system file. Scratch variables, which can be either numeric or character, are created by you as you need them. Scratch variables come in two flavors which are distinguished by the way they are named. A scratch variable with a name that begins with a single pound sign (#) only exists for the duration of the current command or macro. This temporary form of scratch variable is usually used either to hold intermediate results in a series of computations or to pass information between cases in a P-STAT file.

A scratch variable with a name that begins with two pound signs (##) exists from the time it is created until the end of the P-STAT session. This permanent form of scratch variable allows information to be passed between files and between commands. Because a permanent scratch variable exists between commands, it can be created and changed even when there is no active P-STAT file.
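The difference in lifetime between the two flavors can be pictured with a toy model. This is a Python sketch, not P-STAT code; the class `ScratchVariables` and its methods are hypothetical and stand in for bookkeeping that P-STAT does internally.

```python
class ScratchVariables:
    """Toy model of scratch-variable lifetimes (not P-STAT code)."""

    def __init__(self):
        self._values = {}

    def set(self, name, value):
        if not name.startswith("#"):
            raise ValueError("scratch variable names begin with # or ##")
        self._values[name] = value

    def get(self, name):
        return self._values[name]

    def exists(self, name):
        return name in self._values

    def end_command(self):
        # Temporary (#) variables vanish; permanent (##) ones are kept.
        self._values = {n: v for n, v in self._values.items()
                        if n.startswith("##")}


scratch = ScratchVariables()
scratch.set("#Running", 3)         # temporary: lives for one command or macro
scratch.set("##GrandTotal", 120)   # permanent: lives until the session ends
scratch.end_command()
```

After `end_command()` only `##GrandTotal` is still defined, which is why the permanent form is the one to use for passing values between commands.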

1.2 MATCHING NAMES<br />

With the longer names there is an increasing need to be able to refer to them in some abbreviated manner. Wildcards are one way to do this. In Version 3 they can be used anywhere that a variable name is referenced. A wildcard reference contains at least one question mark (?). Wildcards can be used in both commands and subcommands. They are discussed in detail in the next chapter.

1.3 VECTORS AND ARRAYS<br />

A vector is a one-dimensional array of values. The variables that are represented in a case of data can be thought of as an array. This array is referenced as the V vector. Using the V vector, the variables in a case can be addressed with array notation. The variable V(1) is the first variable in the case. The variable V(23) is the twenty-third variable in the case. The V vector has a dimension that corresponds to the number of variables in the current P-STAT system file. The V vector can only be referenced as the P-STAT system file is being read into a command.

There is a second vector that is also available for your use. This vector is known as the P vector. It contains as many double precision numeric elements as the maximum number of variables in a file. In most versions of P-STAT the P vector has 6000 elements. The elements of the P vector are initialized to missing when the P-STAT run begins and remain missing until you change them. The P vector provides an easy way to pass a large number of values across cases or between commands. Since the P vector exists in memory rather than in a P-STAT system file, it can be referenced even when there is no active P-STAT system file.

A third type of vector, which uses the variables in your P-STAT system file, is also available. If you wish a group of variables to be addressed with vector notation, you must name them in such a way that all the variables to be included in the vector, and only those variables, have the same prefix or suffix. This prefix or suffix, combined with the wildcard character “?”, is used to denote the members of a vector that can be addressed with a subscript. This feature is usually used either to simplify the instructions when selecting variables with the KEEP or DROP instruction, or in conjunction with DO loops, which provide a powerful mechanism for creating subscripts.
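How a prefix or suffix plus “?” picks out the members of such a vector can be sketched as follows. The code is Python, not PPL, and the function `dynamic_vector` is hypothetical. It assumes that “?” stands for any run of characters and that matching is case-sensitive; the wildcard rules themselves are covered in the next chapter.

```python
import re
from typing import List


def dynamic_vector(pattern: str, names: List[str]) -> List[str]:
    """Return, in file order, the variable names that match a wildcard pattern.

    Assumption: each "?" stands for any run of characters.
    """
    literal_parts = [re.escape(part) for part in pattern.split("?")]
    regex = re.compile(".*".join(literal_parts))
    return [name for name in names if regex.fullmatch(name)]


names = ["ID", "Q1a", "Q1b", "Q2a", "Score.pre", "Score.post"]
by_prefix = dynamic_vector("Q1?", names)    # shared prefix "Q1"
by_suffix = dynamic_vector("?.pre", names)  # shared suffix ".pre"
```

With these names, `by_prefix` is `["Q1a", "Q1b"]` and `by_suffix` is `["Score.pre"]`. The point of the naming discipline is that no unrelated variable shares the prefix or suffix, so the selected list is exactly the intended vector.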

Similar to the dynamic vectors are multi-dimensional user-defined arrays which can hold either numbers or characters. These are discussed in full in Chapter 8 “PPL: Across-Case Modifications”.

1.4 THE COMMANDS<br />

PPL can be used any time that a P-STAT system file is read by any P-STAT command. The input file remains unchanged, but the cases that are processed by the command reflect the modifications. There are five commands which do not have a statistical or display function but which are specifically associated with PPL. These commands are covered in detail in the following chapters.

The MODIFY command is used to read an existing P-STAT system file and produce a new file which contains the cases after the PPL is applied. The MODIFY command is especially useful when you are preparing a new study for analysis and need to clean the data.

The COMPARE command takes two files and compares the contents. A major use of COMPARE is to compare the input and output from a MODIFY command as a check that the resulting output file is as expected.

The CHECK command examines an existing P-STAT file for problems and stores the results in system variables that can then be tested or printed. The CHECK command should always be used when there has been a power failure or system crash while a P-STAT file was being processed. It is also useful when you need to know if a file has any remaining cases after a MODIFY.

The TEXTWRITER command is a vehicle for PPL, with additional controls to format the printed page.

The PROCESS command has a P-STAT system file as input but has no output file and does no computation. PROCESS is used to store the information in the P vector, arrays, or in permanent scratch variables which can then be accessed by subsequent commands.

In addition, a number of PPL operators can be used as standalone commands. They can be used with system variables, scratch variables, the P vector and the user-defined arrays. These standalone PPL commands are: IF, SET, INCREASE, DECREASE, GENERATE, PUT, PUTL, IF-THEN-ELSE, DIALOG, BRANCH and DO loops. For example:

PUT .DATE. $<br />

GEN ##Project:C40 = 'ABC, Inc. January 2008 Report' $<br />

GEN ##Constant = SQRT ( 43.265 ) $<br />

1.5 P-STAT SYSTEM FILE: CURRENT OR PREVIOUS

P-STAT keeps track of the previous and current versions of each P-STAT system file that you create. You supply a file name of sixteen or fewer characters, and P-STAT adds the extension (suffix) “.PS1” or “.PS2”. As that file is modified, the extension name alternates. However, at all times, P-STAT knows which file is the current one and which is the previous one. You use only the name you gave the file:

PLOT Cells;

In this example, P-STAT inputs the current version to PLOT. However, if you want the prior version for some reason, use the PPL instruction PREVIOUS:

PLOT Cells [ PREVIOUS ] ;<br />

The PPL instruction PREVIOUS is enclosed in square brackets and follows directly after the file name. It must be the first PPL clause. Additional PPL clauses may follow. The comparable instruction CURRENT is also available. When neither is used, it is assumed that the current file is the desired one.
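The bookkeeping behind CURRENT and PREVIOUS can be pictured with a small model. The sketch below is Python, not P-STAT code, and the class `VersionedFile` is hypothetical. It assumes that the first version written gets the extension “.PS1” and that each later write goes to the other extension; the text states only that the extension alternates.

```python
class VersionedFile:
    """Toy model of a P-STAT system file with current and previous versions."""

    def __init__(self, name: str):
        if len(name) > 16:
            raise ValueError("file names have sixteen or fewer characters")
        self.name = name
        self.current = None   # extension of the current version, if any
        self.previous = None  # extension of the previous version, if any

    def write(self) -> str:
        """Write a new version; it becomes current and the old one previous."""
        new_ext = ".PS2" if self.current == ".PS1" else ".PS1"
        self.previous, self.current = self.current, new_ext
        return self.name + new_ext

    def resolve(self, previous: bool = False) -> str:
        """Return the physical name that CURRENT (default) or PREVIOUS refers to."""
        ext = self.previous if previous else self.current
        if ext is None:
            raise LookupError("no such version of " + self.name)
        return self.name + ext


cells = VersionedFile("Cells")
first = cells.write()    # first version
second = cells.write()   # a modification writes the alternate extension
```

In this model `cells.resolve()` gives the version a plain `PLOT Cells;` would read, and `cells.resolve(previous=True)` gives the one selected by `[ PREVIOUS ]`.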

1.6 ORGANIZATION<br />

This manual contains chapters describing the details of the programming language and the commands specifically associated with PPL.

• “PPL: Basics of the Programming Language” covers PPL punctuation, case selection, variable selection and simple logical selection with “IF”.

• “PPL: The Commands” explains more about temporary and permanent modifications and covers the MODIFY, PPL, and PROCESS commands in detail.

• “PPL: NCOT and RECODE” covers the NCOT and RECODE functions, including multi-variable recodes.

• “PPL: DO LOOPS and IF-THEN-ELSE Blocks” covers those functions in detail. This chapter includes the use of DO loops to generate and rename lists of variables.

• “PPL: Functions and System Variables” covers the numeric functions and many of the P-STAT system variables.

• “Random Number and Distribution Functions” also covers the “Fuzzy equals problem” and the functions to protect against this problem.

• “PPL: Across Case Modification” covers the use of the SPLIT and COLLECT functions as well as uses of the P vector, permanent scratch variables and user-defined arrays.

• “PPL: Modification of Character Variables” covers the character functions including the MATCH function. MATCH provides string matching capabilities similar to those found in the Unix commands lex and yacc.

• “PPL: Date and Time Commands and Functions”

• “TEXTWRITER: A Vehicle for PPL”

• “MACROS”



SUMMARY

VARIABLES

There are three different types of variables available in P-STAT.

Fields in a P-STAT system file

may be numeric or character strings. Variable names have 1-64 characters composed only of letters, numbers, underscores and decimal points. The first character must be a letter. Variable names may begin with a tag of 1-16 characters followed by 2 colons (::). These variables are only available when a P-STAT system file is read by a P-STAT command.

System variables

may be numeric or character. These variables are created and maintained by the P-STAT system itself to contain information such as the current date, current file name, or the results of the most recent command. These variables, which always have names that both begin and end with a period, for example .DATE., are stored in memory and can be used (printed, interrogated, etc.) but, with exceptions such as .PAGE., not changed by the user.

Scratch variables

may be numeric or character. These variables, which are created by the user as needed, reside in memory. Temporary scratch variables, which only exist for the duration of a command or macro, have names that begin with a single pound (#) sign. Permanent scratch variables exist for the remainder of the P-STAT session and have names that begin with two pound (##) signs. Scratch variables are limited to 16 characters starting with a letter and containing letters, numbers, underscores and decimal points.

VECTORS AND ARRAYS

Groups of related variables may be considered a vector of values. These are typically used in DO loops.

V vector

The V vector references the current row (case) of data in a P-STAT system file. Variables may be referred to by their names (Age, Q1, Density, etc.) or by their position in the file, for example: v(3) or V(#j). The subscript may be a constant or an expression (such as a scratch variable) that evaluates to a position. This vector is only available when a file is being read by a P-STAT command.

P vector

The P vector is a numeric vector whose size depends on the maximum number of variables allowed in the version of P-STAT that is being used. The values in the P vector are set to missing when the P-STAT session begins. They are available for use in PPL and allow values to be passed between cases in a file and between commands.

Dynamic vec<strong>to</strong>r<br />

Dynamic vec<strong>to</strong>rs depend on <strong>the</strong> naming of <strong>the</strong> variables in <strong>the</strong> P-<strong>STAT</strong> system file. Any group of variables<br />

with <strong>the</strong> same prefix or suffix can be referenced as a vec<strong>to</strong>r by combining <strong>the</strong> prefix or suffix with<br />

<strong>the</strong> wildcard character, <strong>the</strong> question mark (?). Thus Q1? refers <strong>to</strong> all variables in <strong>the</strong> file beginning with<br />

<strong>the</strong> characters “Q1”.



User-defined arrays<br />

Arrays, one-dimensional and multi-dimensional, for numeric and character data can be defined and used.<br />

They are described in full in Chapter 8 “PPL: Across-Case Modifications”.<br />

COMMANDS<br />

The P-<strong>STAT</strong> <strong>Programming</strong> <strong>Language</strong> can be used with any P-<strong>STAT</strong> command. However, <strong>the</strong>re are 6 commands<br />

of particular importance when <strong>PPL</strong> is considered.<br />

MODIFY<br />

takes an input P-STAT system file and applies PPL to produce an output P-STAT system file that is<br />

changed in some way.<br />

MODIFY Myfile [ here goes PPL ], OUT Newfile $<br />

CHECK<br />

examines a P-STAT system file and reports on its status. It is very useful when a system crash has occurred.<br />

It is also useful for obtaining information such as the number of cases in the file. The information<br />

from CHECK is stored in system variables which may be tested in subsequent PPL.<br />

COMPARE<br />

takes two P-STAT system files and compares their contents. The resulting differences are stored in a new<br />

P-STAT system file.<br />

PROCESS<br />

uses a P-STAT system file but produces neither an output file nor a printed report. It is used to accumulate<br />

information about the file and store it in the P vector or permanent scratch variables for use in<br />

subsequent commands.<br />

TEXTWRITER<br />

is a vehicle for <strong>PPL</strong> and <strong>the</strong> PUT function. It has additional features for formatting <strong>the</strong> output such as<br />

justification of the text, indenting, paragraph controls, and font changes for PostScript output.<br />

STANDALONE <strong>PPL</strong> COMMANDS<br />

Many <strong>PPL</strong> opera<strong>to</strong>rs can be used as standalone commands. These standalone <strong>PPL</strong> commands are: IF,<br />

SET, INCREASE, DECREASE, GENERATE, PUT, PUTL, IF-THEN-ELSE, DIALOG, BRANCH and<br />

DO loops.<br />

<strong>PPL</strong> INSTRUCTIONS<br />

<strong>PPL</strong> instructions are enclosed in brackets and immediately follow <strong>the</strong> filename.<br />

CURRENT<br />

COMPARE Myfile [ CURRENT ] Myfile [ PREVIOUS ], OUT mydiffs $<br />

CURRENT selects <strong>the</strong> more recently created version of <strong>the</strong> P-<strong>STAT</strong> system file. CURRENT may be<br />

used with o<strong>the</strong>r <strong>PPL</strong>.



PREVIOUS<br />

PREVIOUS selects the previous version of the P-STAT system file. PREVIOUS may be used with other<br />

PPL.


2<br />

<strong>PPL</strong>:<br />

Basics of <strong>the</strong> <strong>Programming</strong> <strong>Language</strong><br />

This chapter explains <strong>the</strong> syntax and punctuation of <strong>the</strong> P-<strong>STAT</strong> <strong>Programming</strong> <strong>Language</strong>. Case and variable selection<br />

is covered in detail. In addition, generating new variables, recoding existing variables and logical selection<br />

using a simple “IF” are explained.<br />

2.1 CASE AND VARIABLE SELECTION<br />

<strong>PPL</strong> begins with a left bracket “[” and ends with a right bracket “]”. Individual clauses are terminated with ei<strong>the</strong>r<br />

a semicolon “;” or a right bracket “]”. Clauses within brackets are separated by semicolons as in <strong>the</strong> first example<br />

below. If <strong>the</strong> right bracket is used, <strong>the</strong> next clause (if any) must begin with ano<strong>the</strong>r left bracket. The following<br />

two command phrases are functionally equivalent:<br />

SURVEY Patients [ CASES 1 TO 10 ;<br />

KEEP Age Sex Race ],<br />

SURVEY Patients [ CASES 1 TO 10 ]<br />

[ KEEP Age Sex Race ],<br />

Each is a single phrase that contains two modification clauses. The command name is SURVEY. Its argument is<br />

<strong>the</strong> filename Patients and both of <strong>the</strong> modification clauses which are <strong>to</strong> be applied <strong>to</strong> that file. In this example, <strong>the</strong><br />

modifications are a case selection, indicated by <strong>the</strong> word CASES (ROWS is a synonym) in <strong>the</strong> first modification<br />

clause, and a variable selection, indicated by <strong>the</strong> word KEEP in <strong>the</strong> second modification clause.<br />

The first word in each clause tells P-STAT what kind of modification is involved. P-STAT recognizes CASES<br />

or CASE as the keyword for case selection and either KEEP or DROP as keywords for variable selection. IF<br />

is <strong>the</strong> keyword for logical selection. SET is <strong>the</strong> keyword for recoding or setting an existing variable <strong>to</strong> a new value.<br />

GENERATE is <strong>the</strong> keyword for generating or creating a new variable. Figure 2.1 contains examples of <strong>the</strong> basic<br />

types of modifications — case selection, variable selection, logical selection, recoding of existing variables, and<br />

creation of new variables. File Dogs contains five variables and three cases. The results of each modification<br />

clause are shown on <strong>the</strong> right.<br />

Many modification clauses may be used within <strong>the</strong> single command phrase which describes an input file.<br />

Each clause is used in turn <strong>to</strong> modify <strong>the</strong> cases of <strong>the</strong> file as it is read. The command itself is executed after <strong>the</strong><br />

modifications have taken place. A comma following a right bracket means that <strong>the</strong> <strong>PPL</strong> for that file is finished,<br />

and some <strong>to</strong>tally different command clause is about <strong>to</strong> begin. Therefore, you should NOT put commas between<br />

sets of <strong>PPL</strong> brackets.<br />

LIST Dogs [ KEEP Name Sex; IF Sex EQ 2, RETAIN ] $ is correct<br />

LIST Dogs [ KEEP Name Sex ] [ IF Sex EQ 2, RETAIN ] $ is correct<br />

LIST DOGS [ KEEP Name Sex ], [ IF Sex EQ 2, RETAIN ] $ is an ERROR<br />

The comma in a command is a signal that <strong>the</strong> next word is an identifier, a keyword, recognized by <strong>the</strong> command.<br />

The string “[ IF ...” is part of <strong>the</strong> <strong>PPL</strong> and not an identifier for <strong>the</strong> LIST command. It is easy <strong>to</strong> avoid this<br />

error if you use brackets only for major pieces of <strong>PPL</strong> and use <strong>the</strong> semicolon as <strong>the</strong> termina<strong>to</strong>r for individual<br />

clauses.



__________________________________________________________________________<br />

Figure 2.1 Basic Types of Modifications<br />

File Dogs: Before Modifications<br />

Name Sex Age Wt Ht Diet<br />
Max 1 2 15 12 1<br />
Spot 2 7 24 18 1<br />
Rags 1 4 10 - 2<br />

File Dogs: After Modifications<br />

CASES to select cases:<br />
LIST Dogs [ CASES 1 3 ] $<br />

Name Sex Age Wt Ht Diet<br />
Max 1 2 15 12 1<br />
Rags 1 4 10 - 2<br />

KEEP to select variables:<br />
LIST Dogs [ KEEP Name Diet ] $<br />

Name Diet<br />
Max 1<br />
Spot 1<br />
Rags 2<br />

DROP to omit variables:<br />
LIST Dogs [ CASE 1 ; DROP Wt ] $<br />

Name Sex Age Ht Diet<br />
Max 1 2 12 1<br />

IF for logical selection:<br />
LIST Dogs [ IF Sex EQ 2, RETAIN ] $<br />

Name Sex Age Wt Ht Diet<br />
Spot 2 7 24 18 1<br />

SET to modify existing variables:<br />
LIST Dogs [ SET Ht = Ht / 12 ] $<br />

Name Sex Age Wt Ht Diet<br />
Max 1 2 15 1.0 1<br />
Spot 2 7 24 1.5 1<br />
Rags 1 4 10 - 2<br />

GENERATE to create new variables:<br />
LIST Dogs [ GEN Ratio = Ht / Wt ; KEEP Name Ratio ] $<br />

Name Ratio<br />
Max .80<br />
Spot .75<br />
Rags -<br />

__________________________________________________________________________
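For readers who think in a general-purpose language, the modifications in Figure 2.1 can be mimicked in Python. This is only an illustrative model of the figure's results, not P-STAT code: the list-of-dictionaries layout, the helper `divide`, and the use of `None` for the missing value (shown as "-" in the figure) are all assumptions made for the sketch.

```python
# File Dogs from Figure 2.1; None stands in for the missing value "-".
dogs = [
    {"Name": "Max", "Sex": 1, "Age": 2, "Wt": 15, "Ht": 12, "Diet": 1},
    {"Name": "Spot", "Sex": 2, "Age": 7, "Wt": 24, "Ht": 18, "Diet": 1},
    {"Name": "Rags", "Sex": 1, "Age": 4, "Wt": 10, "Ht": None, "Diet": 2},
]


def divide(a, b):
    """Divide a by b, giving missing (None) when either operand is missing."""
    return None if a is None or b is None else a / b


# [ CASES 1 3 ]: select cases by their 1-based position in the file.
cases_1_3 = [dogs[i - 1] for i in (1, 3)]

# [ IF Sex EQ 2, RETAIN ]: pass on only the cases that satisfy the test.
sex_eq_2 = [d for d in dogs if d["Sex"] == 2]

# [ SET Ht = Ht / 12 ]: recode an existing variable (the input list is not changed).
ht_recoded = [{**d, "Ht": divide(d["Ht"], 12)} for d in dogs]

# [ GEN Ratio = Ht / Wt ; KEEP Name Ratio ]: create a variable, then select.
ratios = [{"Name": d["Name"], "Ratio": divide(d["Ht"], d["Wt"])} for d in dogs]
```

Each result is built as a new list, which mirrors the point made later in this chapter that PPL affects only what is sent to the command and leaves the input file as it was.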



Modification clauses are part of <strong>the</strong> P-<strong>STAT</strong> command structure. As such, <strong>the</strong>y are free-format and may be<br />

continued on successive lines. However, each individual word or label must fit entirely on one line; it must not<br />

be broken across lines.<br />

There is a limit <strong>to</strong> <strong>the</strong> number of modifications which can be done at one time. This limit varies with <strong>the</strong> size<br />

of P-<strong>STAT</strong> that is being used. The size of <strong>the</strong> <strong>PPL</strong> workspace, measured in 4-byte words, is:<br />

Whopper II = 250,000 Whopper IV = 1,500,000<br />

An error message is printed when <strong>the</strong> limit is exceeded. The data modification area is adequate for most uses.<br />

However, if <strong>the</strong> space should prove <strong>to</strong>o small <strong>to</strong> do a particular series of modifications in a single pass of <strong>the</strong> data<br />

file, <strong>the</strong> modifications may be done using <strong>the</strong> MODIFY command several times, creating temporary intermediate<br />

files.<br />

2.2 Case Selection<br />

Cases in a P-<strong>STAT</strong> system file are synonymous with rows in a file, despite <strong>the</strong> fact that <strong>the</strong> data for each case may<br />

have originally been collected on multiple records or may list on a terminal or printer over several lines. Case<br />

selection takes <strong>the</strong> following form:<br />

[ CASES 125 TO 199 345 ]<br />

It is indicated by <strong>the</strong> word CASES or CASE immediately following <strong>the</strong> left bracket. (Ei<strong>the</strong>r ROWS or ROW may<br />

also be used.)<br />

Case selection uses <strong>the</strong> position of <strong>the</strong> case in <strong>the</strong> file <strong>to</strong> determine which cases are selected. Case references<br />

must be in ascending order whenever P-<strong>STAT</strong> files are accessed sequentially. Each of <strong>the</strong> following is a legal<br />

case selection clause:<br />

[ CASES 33 49 105 TO 200 223 300 TO 305 700 .ON. ]<br />

[ CASE 1 ]<br />

[ CASE 3 .ON. ]<br />

The use of <strong>the</strong> system variable .ON. in <strong>the</strong> first and third examples means “continue selecting cases from <strong>the</strong> current<br />

case onward until all <strong>the</strong> cases have been read”. You can tell that “.ON.” is a system variable because of <strong>the</strong><br />

name. System variables have names that look like legal P-<strong>STAT</strong> names except that <strong>the</strong>y always begin and end<br />

with a decimal point.<br />

A case may not be repeated in a case selection clause. (However, <strong>the</strong>re are o<strong>the</strong>r ways <strong>to</strong> include a case more<br />

than once. See <strong>the</strong> REPEAT instruction later in this manual.) Case selection acts as a filter on <strong>the</strong> file and is done<br />

before any o<strong>the</strong>r modifications take place, regardless of <strong>the</strong> position of <strong>the</strong> CASE clause among <strong>the</strong> o<strong>the</strong>r modifications.<br />

If ten cases are selected from a file with 2000 cases, <strong>the</strong> tenth of <strong>the</strong> selected cases is processed as if it<br />

were <strong>the</strong> last case in <strong>the</strong> file. A file may be modified by no more than one case selection clause.<br />
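The rules for a case selection list (single positions, TO ranges, a trailing .ON., ascending order, no repeats) can be sketched as a small Python function. The name `expand_cases` and the token-list input are hypothetical, and the sketch assumes that positions past the end of the file are simply ignored; the manual does not say how P-STAT treats them.

```python
def expand_cases(tokens, n_cases):
    """Expand a CASES list, e.g. ['105', 'TO', '200', '700', '.ON.'],
    into the 1-based case positions selected from a file of n_cases rows."""
    selected = []
    last = 0  # highest position referenced so far
    i = 0
    while i < len(tokens):
        tok = tokens[i].upper()
        if tok in ("TO", ".ON."):
            raise ValueError(f"{tok} must follow a case number")
        start = end = int(tok)
        nxt = tokens[i + 1].upper() if i + 1 < len(tokens) else ""
        if nxt == "TO":
            if i + 2 >= len(tokens):
                raise ValueError("TO must be followed by a case number")
            end = int(tokens[i + 2])
            i += 3
        elif nxt == ".ON.":
            end = max(start, n_cases)  # from here on through the last case
            i += 2
        else:
            i += 1
        if start <= last or end < start:
            raise ValueError("case references must be ascending, without repeats")
        # Assumption: positions beyond the end of the file select nothing.
        selected.extend(range(start, min(end, n_cases) + 1))
        last = end
    return selected


first_ten = expand_cases("1 TO 10".split(), 2000)
from_three = expand_cases("3 .ON.".split(), 5)
```

With a 2000-case file, `first_ten` holds positions 1 through 10, so a command reading the selection treats case 10 as the last case.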

A major reason for using case selection is for test runs. If you have a large file and are doing transformations,<br />

it is prudent <strong>to</strong> do a trial run, selecting a few cases and printing <strong>the</strong> results so <strong>the</strong>y can be examined before <strong>the</strong> final<br />

run is made. When a trial run looks correct, <strong>the</strong> case selection is removed and <strong>the</strong> final<br />

run is done.<br />

2.3 Variable Selection<br />

There are two keywords which indicate variable selection: 1) KEEP, which is followed by a list of variables <strong>to</strong> be<br />

used, and 2) DROP, which is followed by a list of variables <strong>to</strong> be omitted. Variables may be selected by referencing<br />

ei<strong>the</strong>r <strong>the</strong>ir names or <strong>the</strong>ir positions in <strong>the</strong> file.<br />

These are selections of variables by <strong>the</strong>ir names (variable labels):<br />

LIST Myfile [ KEEP Sex Age Education ] $



SORT Myfile [ DROP Income Rent ] ,<br />

BY Education, OUT SortEduc $<br />

LIST File3 [ KEEP Id Sex Age .ON. ] $<br />

Each selection clause begins with ei<strong>the</strong>r KEEP or DROP. These keywords identify a P-<strong>STAT</strong> variable selection.<br />

Each continues with a list of variable names, which are separated from each o<strong>the</strong>r by blanks. “.ON.” used for variable<br />

selection has <strong>the</strong> same meaning as it does for case selection. When used for variable selection it means<br />

“starting with <strong>the</strong> current variable do <strong>the</strong> KEEP or DROP <strong>to</strong> all <strong>the</strong> remaining variables”.<br />

It is often convenient <strong>to</strong> refer <strong>to</strong> a variable by position ra<strong>the</strong>r than by name, particularly when <strong>the</strong> variable<br />

name is long. There are some situations in which, by definition, a number can only refer <strong>to</strong> a position. There are<br />

o<strong>the</strong>r situations where a number could represent ei<strong>the</strong>r a constant or a variable position. To distinguish between<br />

<strong>the</strong>se two situations, <strong>the</strong> convention in P-<strong>STAT</strong> is that a constant is a number by itself, and a variable position is<br />

referenced with <strong>the</strong> notation V(n), where “n” is <strong>the</strong> position. V(1) is <strong>the</strong> variable in position 1 of <strong>the</strong> file. V(33) is<br />

<strong>the</strong> variable in position 33 of <strong>the</strong> file. With reference <strong>to</strong> <strong>the</strong> example in Figure 2.1,<br />

LIST Dogs [ KEEP V(1) V(2) V(6) ] $ is <strong>the</strong> same as<br />

LIST Dogs [ KEEP Name Sex Diet ] $<br />

Variable names and variable positions can be used in <strong>the</strong> same variable selection clause. The position of a variable<br />

is always <strong>the</strong> “current” position of that variable in <strong>the</strong> file. After variable selection or reordering, <strong>the</strong> initial positions<br />

of <strong>the</strong> variables may change. For example, this command:<br />

PLOT Tree [ KEEP V(10) V(3) TO V(6) ] ;<br />

inputs cases with five variables <strong>to</strong> <strong>the</strong> PLOT command. The variables are ordered as specified. A subsequent<br />

subcommand <strong>to</strong> plot variable 10 by variable 3:<br />

P V(10) * V(3) ;<br />

yields an error message, because <strong>the</strong>re are only five variables in <strong>the</strong> file given <strong>to</strong> <strong>the</strong> PLOT command. The variable<br />

that previously was in position 10 is now in position 1; <strong>the</strong> variable that was in position 3 is now in position 2, and<br />

so on.<br />
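The PLOT example can be traced with a short Python model of "current" positions. The helpers `keep` and `v`, the variable names `var1` to `var12`, and the assumption that file Tree has twelve variables are all hypothetical; the sketch only illustrates why V(10) no longer exists after the KEEP.

```python
def keep(variables, spec):
    """Apply a KEEP list of 1-based positions.
    A (first, last) tuple models V(first) TO V(last)."""
    kept = []
    for item in spec:
        first, last = item if isinstance(item, tuple) else (item, item)
        kept.extend(variables[first - 1:last])
    return kept


def v(variables, n):
    """Look up V(n) against the variables as they stand now."""
    if not 1 <= n <= len(variables):
        raise IndexError(f"V({n}) does not exist: only {len(variables)} variables")
    return variables[n - 1]


tree = [f"var{i}" for i in range(1, 13)]  # hypothetical 12-variable file
plot_input = keep(tree, [10, (3, 6)])     # KEEP V(10) V(3) TO V(6)
```

Here `v(plot_input, 1)` is the variable that used to be in position 10, and `v(plot_input, 10)` raises an error because only five variables remain.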

2.4 Variable Selection With Wildcards<br />

Consider a variable with the following name:<br />

age.oldest.surviving.child<br />

A wildcard like<br />

age?sur?ch?<br />

might be the most efficient way to refer to it. When compared to the above name, 'age' matches, the '?sur' says<br />

accept anything until 'sur' is found, the '?ch' says from there accept anything through a 'ch', and the final '?'<br />

says accept anything at all after that, if indeed there is anything else. Thus,<br />

age.oldest.surviving.child<br />

is matched by age?sur?ch?<br />

A wildcard usage can be thought of as a template for name matching. Differences in case do not matter. A<br />

wildcard template can be used <strong>to</strong> specify which variable (or, in some situations, variables) are <strong>to</strong> be used. The<br />

template is matched against <strong>the</strong> name of each variable in <strong>the</strong> file. The template uses single (?) or double (??) question<br />

marks <strong>to</strong> indicate how <strong>the</strong> matching should be done.<br />

A wildcard template contains at least one single (?) or double (??) question mark, and at least one string. The<br />

question marks serve as 'move until' opera<strong>to</strong>rs. A string consists of one or more ordinary characters that can be<br />

found in names. String matching ignores case. A template successfully matches a name when each template element<br />

progressively matches a part of <strong>the</strong> name, with <strong>the</strong> entire name being matched when <strong>the</strong> template is done.



If a template starts with a string, <strong>the</strong> name being matched must also begin with that string. If <strong>the</strong> template<br />

ends with a single or double question mark, and <strong>the</strong> match has been successful so far, <strong>the</strong> rest of <strong>the</strong> name is accepted,<br />

and a match has occurred.<br />

A single question mark, followed by a string, matches <strong>the</strong> name through <strong>the</strong> NEXT remaining occurrence of that<br />

string. If no such string is found, <strong>the</strong> match fails. A double question mark, followed by a string, matches <strong>the</strong> name<br />

through <strong>the</strong> LAST remaining occurrence of that string. If no such string is found, <strong>the</strong> match fails.<br />

___________________________________________________________________________<br />

Figure 2.2 Examples of Wildcard Matching<br />

qq? will match any name that starts with 'qq'.<br />

?qq? will match any name that contains 'qq' anywhere.<br />

??qq will match any name that ends with 'qq' .<br />

ab?cde will match abxxcde, and also abcde.<br />

ab?cde will NOT match abcdecde, whereas<br />

ab??cde will.<br />

a?o?c will NOT match age.oldest.child .<br />

a?o?c? will because <strong>the</strong> final ? moved <strong>to</strong> <strong>the</strong> name's end<br />

a?ld will NOT, because it ends on <strong>the</strong> ld in old.<br />

a??ld will, because <strong>the</strong> ?? moved <strong>to</strong> <strong>the</strong> last ld.<br />

___________________________________________________________________________<br />
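The matching rules above can be written out as a small Python function and checked against Figure 2.2. This is a sketch of the rules as stated in the text, not P-STAT's own code: the name `wildcard_match` is hypothetical, and rejecting a template with adjacent wildcards (such as three question marks in a row) is an assumption, since the manual does not cover that case.

```python
import re


def wildcard_match(template, name):
    """Return True if the wildcard template matches the whole name.
    '?'  followed by a string moves through the NEXT occurrence of it;
    '??' followed by a string moves through the LAST occurrence of it;
    a trailing '?' or '??' accepts whatever remains of the name."""
    tokens = re.findall(r"\?\?|\?|[^?]+", template.lower())
    name = name.lower()  # string matching ignores case
    pos = 0              # how much of the name has been matched so far
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("?", "??"):
            if i + 1 == len(tokens):
                return True  # trailing wildcard: accept the rest
            target = tokens[i + 1]
            if target in ("?", "??"):
                raise ValueError("adjacent wildcards are not handled in this sketch")
            found = name.find(target, pos) if tok == "?" else name.rfind(target, pos)
            if found < 0:
                return False
            pos = found + len(target)
            i += 2
        else:
            # A leading string must appear at the start of the name.
            if not name.startswith(tok, pos):
                return False
            pos += len(tok)
            i += 1
    return pos == len(name)  # the entire name must be consumed
```

For example, `wildcard_match("a?ld", "age.oldest.child")` is false because the single question mark stops at the 'ld' in 'old', while `wildcard_match("a??ld", "age.oldest.child")` is true because the double question mark moves to the last 'ld'.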

A wildcard can be used anywhere that <strong>the</strong> name of a variable could be used. A single match is usually what<br />

is expected. However:<br />

1. In KEEP or DROP phrases, and in LIST functions like SUM, a wildcard usage can have multiple<br />

matches, in which case all will be used.<br />

For example: [ KEEP ??income ]<br />

2. There can be, in <strong>PPL</strong> expressions, multiple matches <strong>to</strong> a wildcard usage if a subscript follows, in paren<strong>the</strong>ses,<br />

<strong>to</strong> show which of <strong>the</strong> matches should be accessed at that point in <strong>the</strong> execution of <strong>the</strong> <strong>PPL</strong>.<br />

This permits looping through <strong>the</strong> matches.<br />

SORT xxx, BY pulse??pre pulse??post, OUT zzz $<br />

In <strong>the</strong> BY phrase each template should match one name, and <strong>the</strong> sort will be<br />

done on those two BY variables.<br />

[ SET <strong>to</strong>t? TO ?11?inc? + ?11?div? ]<br />

In <strong>the</strong> above, <strong>the</strong> actual variable names could be something like <strong>to</strong>tal_income_all_sources, year_2011.income<br />

and year_2011.dividends .<br />

[ KEEP ??income ]<br />

Wildcards can be used in KEEP or DROP phrases. The phrase shown above keeps all of <strong>the</strong> variables whose<br />

names end with INCOME. There can be one or more matches.<br />

[ SET <strong>to</strong>tal TO SUM( ??income) ]<br />

Wildcards may be used as input <strong>to</strong> <strong>the</strong> various LIST functions, which include sum, mean, max, first.good and<br />

such. The phrase shown above sets TOTAL <strong>to</strong> <strong>the</strong> sum of <strong>the</strong> variables whose names end with INCOME. There<br />

can be one or more matches. SET, INCREASE and DECREASE can be followed by a subscripted wildcard, as<br />

can <strong>the</strong> various operands in <strong>the</strong> rest of <strong>the</strong> expression.



[ DO #j = 1,5; SET ??income(#j) = 0; ENDDO ]<br />

Here, ??income remembers <strong>the</strong> positions of <strong>the</strong> variables whose names end with 'income'. In this example, <strong>the</strong>re<br />

are presumably five of <strong>the</strong>m. Using ??income(#j) when #j=2 accesses <strong>the</strong> second of <strong>the</strong>m, wherever in <strong>the</strong> file that<br />

variable may actually be. The example would set <strong>the</strong> 5 variables whose names end with 'income' <strong>to</strong> zero.<br />
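The effect of the subscripted wildcard can be pictured with a Python dictionary standing in for one case. The variable names below are invented for illustration, and the suffix test with `endswith` is a simplification that covers only the ??income template.

```python
# One case of a hypothetical file; five variable names end with "income".
case = {
    "id": 7,
    "wage_income": 52000.0,
    "age": 41,
    "rent_income": 6000.0,
    "farm_income": 0.0,
    "region": 3,
    "other_income": 150.0,
    "total_income": 58150.0,
}

# ??income: remember, in file order, the variables whose names end with "income".
income_vars = [name for name in case if name.lower().endswith("income")]

# DO #j = 1,5; SET ??income(#j) = 0; ENDDO
for j in range(1, 6):
    case[income_vars[j - 1]] = 0  # the subscript is 1-based, as in PPL
```

When `j` is 2 the loop touches `rent_income`, the second match, even though it sits in the fourth position of the case.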

2.5 Using Ranges in Selection Clauses<br />

Lists may contain single variables, many variables, and/or ranges of variables:<br />

[ KEEP Siblings TO Children Occup.Mo<strong>the</strong>r Race Age ]<br />

The cases received by <strong>the</strong> individual P-<strong>STAT</strong> commands contain all <strong>the</strong> variables from <strong>the</strong> variable named Siblings<br />

through <strong>the</strong> variable named Children, plus <strong>the</strong> three variables, Occup.Mo<strong>the</strong>r, Race and Age. In this case,<br />

<strong>the</strong>re will be an error message if <strong>the</strong> variable named Children has a position in <strong>the</strong> file before that of <strong>the</strong> variable<br />

named Siblings, or if Occup.Mo<strong>the</strong>r, Race or Age have positions in <strong>the</strong> file between Siblings and Children. O<strong>the</strong>r<br />

than <strong>the</strong>se situations, <strong>the</strong> order of <strong>the</strong> individual variables or ranges does not matter.<br />
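The two error conditions described above can be modeled in Python. The function `keep_by_name` and the file layout are hypothetical, names are compared exactly for simplicity, and the second error is modeled as "a variable selected more than once", which is an interpretation of the text.

```python
def keep_by_name(variables, spec):
    """Expand a KEEP list of names and 'A TO B' ranges against the file order."""
    kept = []
    i = 0
    while i < len(spec):
        first = variables.index(spec[i])  # ValueError if the name is unknown
        if i + 2 < len(spec) and spec[i + 1].upper() == "TO":
            last = variables.index(spec[i + 2])
            if last < first:
                raise ValueError(f"{spec[i + 2]} comes before {spec[i]} in the file")
            chunk = variables[first:last + 1]
            i += 3
        else:
            chunk = [variables[first]]
            i += 1
        for name in chunk:
            if name in kept:
                raise ValueError(f"{name} is selected more than once")
        kept.extend(chunk)
    return kept


# Hypothetical file layout for illustration.
survey = ["Id", "Siblings", "Brothers", "Sisters", "Children",
          "Race", "Age", "Occup.Mother"]
kept = keep_by_name(
    survey, ["Siblings", "TO", "Children", "Occup.Mother", "Race", "Age"])
```

In this layout the selection succeeds because Children follows Siblings and the three single variables lie outside the range.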

O<strong>the</strong>r valid selections are:<br />

[ CASES 1 10 TO 50 56 ]<br />

[ DROP V(13) TO V(16) Occupation ]<br />

[ KEEP V(1) Education TO V(23) Region V(3) ]<br />

The system variable .ON. may be used <strong>to</strong> make <strong>the</strong> referencing and typing of variable selections easier. The<br />

clause:<br />

[ KEEP V(6) Children .ON. ]<br />

instructs P-<strong>STAT</strong> <strong>to</strong> use <strong>the</strong> sixth variable in <strong>the</strong> file and all <strong>the</strong> variables from <strong>the</strong> one named Children through<br />

<strong>the</strong> last variable in <strong>the</strong> file. .ON. means “from here on through <strong>the</strong> end.” This is particularly useful if you have<br />

added a number of new variables <strong>to</strong> <strong>the</strong> file and are not certain just how many you currently have.<br />

The use of: 1) TO <strong>to</strong> indicate a range, and 2) .ON. <strong>to</strong> indicate “from <strong>the</strong> current item on through <strong>the</strong> last item,”<br />

are valid in both variable and case selection clauses.<br />

2.6 Multiple Variable Selections<br />

DROP and KEEP, unlike CASES, may be used in more than one modification clause. Variable selections take<br />

place in a sequential and cumulative manner. An initial variable selection often winnows out <strong>the</strong> unnecessary variables.<br />

A second selection occurs after all <strong>the</strong> modifications are done and selects only those variables actually<br />

needed as input for a given command:<br />

LIST Dept<br />

[ KEEP Name TO Race Test1 Test2 ;<br />

GENERATE Pass = 1;<br />

GENERATE Test.Average = ( Test1 + Test2 ) / 2 ;<br />

IF Test.Average LT 65, SET Pass = 0 ;<br />

DROP Test1 Test2 ] $<br />

Variables should not be selected out of <strong>the</strong> file before <strong>the</strong>y are used. The following command causes an error<br />

because <strong>the</strong> variable named Year is not available when <strong>the</strong> IF clause is processed:<br />

LIST Produce [ DROP Year ;<br />

IF Year EQ 2002, RETAIN ] $<br />

The correct order of <strong>the</strong> variable selection clauses is:



LIST Produce [ IF Year EQ 2002, RETAIN ;<br />

DROP Year ] $<br />

The value of <strong>the</strong> variable Year is tested and <strong>the</strong> case retained when it is 2002, and <strong>the</strong>n variable Year is dropped<br />

from each case of <strong>the</strong> file as it is passed <strong>to</strong> <strong>the</strong> LIST command.<br />
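The importance of clause order can be seen in a Python model in which each clause is a function applied to a case in turn. The helper names `run`, `drop` and `retain_if_year_2002`, and the sample data, are assumptions made for the sketch.

```python
def drop(case, *names):
    """DROP: pass on a copy of the case without the named variables."""
    return {k: val for k, val in case.items() if k not in names}


def retain_if_year_2002(case):
    """IF Year EQ 2002, RETAIN: pass the case on, or return None to discard it."""
    return case if case["Year"] == 2002 else None


def run(cases, clauses):
    """Apply each clause in turn to each case as the file is read."""
    out = []
    for case in cases:
        for clause in clauses:
            case = clause(case)
            if case is None:
                break
        if case is not None:
            out.append(case)
    return out


produce = [{"Item": "Apples", "Year": 2001}, {"Item": "Pears", "Year": 2002}]

# Correct order: test Year first, then drop it.
listed = run(produce, [retain_if_year_2002, lambda c: drop(c, "Year")])
```

Reversing the two clauses raises a `KeyError` on the first case, the Python analogue of the error P-STAT reports when Year has already been dropped.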

DROP and KEEP require a great deal of overhead because both <strong>the</strong> variable names and <strong>the</strong> data are rearranged<br />

for each DROP or KEEP clause. Your run will be more efficient if you limit <strong>the</strong> number of DROP and KEEP<br />

clauses in any one command. For example:<br />

[ DROP V(1) ] [ DROP V(1) ] [ DROP V(1) ]<br />

is less efficient than<br />

[ DROP V(1) TO V(3) ]<br />

even though <strong>the</strong>y do <strong>the</strong> same thing.<br />
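That the two forms give the same result follows from positions always being "current" positions. A minimal Python check, using a hypothetical helper `drop` and invented variable names:

```python
def drop(variables, first, last=None):
    """DROP V(first) or DROP V(first) TO V(last), using 1-based positions."""
    last = first if last is None else last
    return variables[:first - 1] + variables[last:]


names = ["A", "B", "C", "D", "E"]

# [ DROP V(1) ] [ DROP V(1) ] [ DROP V(1) ]: three passes, each removing
# whichever variable is currently first.
three_steps = names
for _ in range(3):
    three_steps = drop(three_steps, 1)

# [ DROP V(1) TO V(3) ]: a single pass.
one_step = drop(names, 1, 3)
```

Both leave `["D", "E"]`; the single clause simply rearranges the names and data once instead of three times.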

RETAIN keeps <strong>the</strong> case. It is <strong>the</strong>n passed <strong>to</strong> <strong>the</strong> next <strong>PPL</strong> clause or, if <strong>the</strong>re are no more clauses, <strong>to</strong> <strong>the</strong> current<br />

command. DELETE does not pass a case <strong>to</strong> any subsequent <strong>PPL</strong> clauses or <strong>to</strong> <strong>the</strong> current command. <strong>PPL</strong> does<br />

not change <strong>the</strong> contents of <strong>the</strong> input file. <strong>PPL</strong> only affects what is sent <strong>to</strong> <strong>the</strong> current command. If <strong>the</strong> command<br />

creates an output file, <strong>the</strong> changes are “permanent”. If <strong>the</strong> command does not create an output file, <strong>the</strong> changes<br />

are temporary.<br />

2.7 Reordering Variables<br />

The KEEP instruction may be used to reorder variables. For example, any of the following clauses reorders the<br />

variables Rent, Sex and Age in listings of File1:<br />

LIST File1 [ KEEP Age Sex Rent ] $<br />

LIST File1 [ KEEP Sex Age Rent ] $<br />

LIST File1 [ KEEP Rent Age Sex ] $<br />

Often <strong>the</strong> rearrangement is done <strong>to</strong> place one or two variables at <strong>the</strong> left of <strong>the</strong> file. These two clauses are<br />

equivalent:<br />

[ KEEP V(16) V(23) V(1) TO V(15)<br />

V(17) TO V(22) V(24) .ON. ]<br />

[ KEEP V(16) V(23) .OTHERS. ]<br />

.OTHERS. is a system variable meaning all <strong>the</strong> variables which are not mentioned elsewhere in <strong>the</strong> KEEP<br />

clause. System variables are set by P-<strong>STAT</strong>. Most of <strong>the</strong> system variables cannot be changed by <strong>the</strong> user but are<br />

available for use and testing in <strong>PPL</strong> statements. System variable names always begin and end with a decimal point.<br />

Since variable names must begin with a letter, system variable names will never conflict with legal variable names.<br />

.NEW., .CHARACTER. and .NUMERIC. are system variables which can be used after KEEP <strong>to</strong> select or reorder<br />

variables. .NEW. refers <strong>to</strong> any new variables which have been created in <strong>the</strong> previous <strong>PPL</strong> clauses.<br />

.CHARACTER. refers <strong>to</strong> all <strong>the</strong> character variables and .NUMERIC. refers <strong>to</strong> all <strong>the</strong> numeric variables.<br />

LIST Dept<br />

[ KEEP Name Test.1 Test.2 ;<br />

GENERATE Pass = 1 ;<br />

GENERATE Test.Average = ( Test.1 + Test.2) / 2 ;<br />

IF Test.Average LT 65, SET Pass = 0 ;<br />

KEEP Name .NEW. ] $<br />

In this example, Name, <strong>the</strong> original variable, and <strong>the</strong> new variables created in this command, are included in <strong>the</strong><br />

list. .NEW. and .OTHERS. can be used both <strong>to</strong> rearrange and <strong>to</strong> select variables:



[ KEEP .NEW. .OTHERS. ]<br />

[ KEEP .OTHERS. Age .NEW. Race ]<br />
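One way to picture .NEW. and .OTHERS. is the Python sketch below. The function `keep_with_system_vars` is hypothetical, and it assumes that variables covered by .NEW. count as "mentioned elsewhere" when .OTHERS. is expanded; the manual does not state this explicitly.

```python
def keep_with_system_vars(variables, new_vars, spec):
    """Expand a KEEP list that may contain .NEW. and .OTHERS. ."""
    mentioned = set()
    for item in spec:
        if item == ".NEW.":
            mentioned.update(new_vars)  # assumption: .NEW. counts as a mention
        elif item != ".OTHERS.":
            mentioned.add(item)
    result = []
    for item in spec:
        if item == ".NEW.":
            result.extend(name for name in variables if name in new_vars)
        elif item == ".OTHERS.":
            result.extend(name for name in variables if name not in mentioned)
        else:
            result.append(item)
    return result


# File Dept after the two GENERATE clauses in the example above.
dept = ["Name", "Test.1", "Test.2", "Pass", "Test.Average"]
created = ["Pass", "Test.Average"]

listed = keep_with_system_vars(dept, created, ["Name", ".NEW."])
new_first = keep_with_system_vars(dept, created, [".NEW.", ".OTHERS."])
```

Under these assumptions `listed` is Name followed by the two new variables, and `new_first` moves the new variables to the left of the file while keeping everything else.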

2.8 Masks and Wildcards<br />

Masks and wildcards are shortcuts that make it easier to refer to variables that have either a pattern to their order<br />

in the input file or a common prefix or suffix in their names. Masks are strings of characters that mean “yes” and<br />

“no”. In this example, the parentheses contain a mask (MASK 100) with a length of three:<br />

[ KEEP Test1 .ON. (MASK 100) ]<br />

A variable or case selection mask is a string of digits that are either zeros or ones. Variable selection starts<br />

with the variable named Test1 and continues through the last variable in the file (.ON.), applying the mask to successive<br />

groups of three variables. Variables corresponding to mask values of 1 are kept (“yes”) and variables<br />

corresponding to mask values of 0 are dropped (“no”). In this example, if the variable named Test1 is in position<br />

6 in the file, variables 6, 9, 12, and so on, are selected. Variables 1 to 5, 7, 8, 10, 11, 13, 14, and so on, are not<br />

selected.<br />
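
The way a mask cycles over successive variables can be modelled outside P-STAT. The following Python sketch is an illustration only, not PPL; the function name apply_mask and the 15-variable file are assumptions made for the example.<br />

```python
def apply_mask(positions, mask):
    """Keep the positions whose cycled mask digit is '1'.

    The mask is reapplied to each successive group of len(mask) positions,
    mirroring the description of (MASK 100) in the text.
    """
    return [p for i, p in enumerate(positions) if mask[i % len(mask)] == "1"]


# Hypothetical file of 15 variables with Test1 in position 6:
# "Test1 .ON." then covers positions 6 through 15.
selected = apply_mask(list(range(6, 16)), "100")
print(selected)
```

Under these assumptions the sketch selects positions 6, 9, 12 and 15, in agreement with the rule stated above.<br />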

Masks are particularly useful when the file contains repeating groups of variables and only some are needed<br />

for a particular analysis. Given these variables:<br />

Date1, Grade1, Date2, Grade2, .... Date9, Grade9<br />

this variable selection clause selects the Grade variables:<br />

[ KEEP Grade1 TO Grade9 (MASK 10) ]<br />

The following variable selection clause could be used to reorder the variables so that all those whose names begin<br />

with “Date” are followed by those whose names begin with “Grade”:<br />

[ KEEP Date1 TO Grade9 (MASK 10)<br />

Date1 TO Grade9 (MASK 01) ]<br />

Masks may also be used in case selection clauses:<br />

[ CASES 5 .ON. ( MASK 1000 ) ]<br />

The question mark “?” is used as a wildcard, that is, to refer to any variables with a common prefix or suffix<br />

in their names. (The question mark replaces the asterisk, used in earlier versions of P-STAT, as the wildcard character.<br />

This avoids any possible confusion of the wildcard with the symbol for multiplication, which is the<br />

asterisk.) This selection clause:<br />

[ KEEP Grade? ]<br />

keeps all variables beginning with the character string “Grade”. This clause:<br />

[ KEEP ?Batch ]<br />

keeps all variables ending with the character string “Batch”. Wildcard notation can be used to reorder the variables<br />

so that all the variables beginning with “Date” are followed by all the variables beginning with “Grade” and then<br />

by any other variables in the same order that they occur in the input file.<br />

[ KEEP Date? Grade? .OTHERS. ]<br />

The prefix or suffix used with the wildcard ? must be unique to the desired variables. This KEEP instruction:<br />

[ KEEP Family.ID Income.Male.HH Income.Fem.HH Income.Total ]<br />

may be shortened to:<br />

[ KEEP Family.ID Income.? ]<br />

However, if the file also contains the variables Income.Last.Yr and Income.Child, they will also be kept. Sometimes,<br />

an error situation results because the wildcard reference is not unique:<br />



[ KEEP Family.ID Income.? Income.Total ]<br />

This KEEP instruction includes the variable Income.Total twice, once in the middle of the file and a second time<br />

as the right-most variable. This is an error because each variable in a P-STAT system file must have a unique<br />

name.<br />

Case is ignored in wildcard selection. Variables ax and bx are both selected by ?x, or for that matter, by ?X.<br />
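
The wildcard rules above (prefix or suffix matching, case ignored, and the duplicate-name problem) can be modelled with a short Python sketch. This is not PPL and not how P-STAT is implemented; the function names expand_keep and repeated_names are hypothetical, and comparing explicit names without regard to case is an assumption of the sketch.<br />

```python
def expand_keep(keep_items, file_vars):
    """Expand a KEEP list containing '?' wildcards into variable names, in order."""
    selected = []
    for item in keep_items:
        pattern = item.lower()
        for name in file_vars:
            lowered = name.lower()
            if pattern.endswith("?"):        # prefix wildcard, e.g. Income.?
                hit = lowered.startswith(pattern[:-1])
            elif pattern.startswith("?"):    # suffix wildcard, e.g. ?Batch
                hit = lowered.endswith(pattern[1:])
            else:                            # explicit name (case-insensitivity assumed)
                hit = lowered == pattern
            if hit:
                selected.append(name)
    return selected


def repeated_names(names):
    """Return the names that occur more than once, compared without regard to case."""
    seen, repeated = set(), []
    for name in names:
        key = name.lower()
        if key in seen and name not in repeated:
            repeated.append(name)
        seen.add(key)
    return repeated


file_vars = ["Family.ID", "Income.Male.HH", "Income.Fem.HH", "Income.Total"]
kept = expand_keep(["Family.ID", "Income.?", "Income.Total"], file_vars)
print(kept)
print(repeated_names(kept))
```

In this model Income.Total appears twice in the expanded list, which corresponds to the situation that P-STAT reports as an error.<br />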

2.9 MODIFYING AND GENERATING VARIABLES<br />

Values are changed using the SET instruction, which “sets” an existing variable to new values. New variables are<br />

created using the GENERATE instruction, which “generates” a new variable with the specified values. If SET is<br />

used with a name that is not the name of a variable in the file, an error message is printed. If GENERATE is used<br />

with a name that already belongs to a variable in the file, an error message is printed. In general it is a good practice<br />

to generate all the variables that will be needed before any recodes or logical selections.<br />

2.10 Modifying Variables with SET<br />

The keyword SET indicates to P-STAT that modification is to be done to an existing variable. There are four elements<br />

to a SET clause:<br />

1. The keyword SET;<br />

2. The name or position of the variable that is to be modified;<br />

3. An equal-sign (=);<br />

4. The value or expression to be used as the new value of that variable.<br />

The format of the SET instruction is illustrated in Figure 2.3.<br />

__________________________________________________________________________<br />

Figure 2.3 Format of the SET Instruction<br />

SET Var Operator Expression<br />

[ SET Score = Test ]<br />

[ SET Score = Test1 + Test2 ]<br />

[ SET Score = SQRT ( Score ) ]<br />

[ SET Notes = 'Late 2 days' ]<br />

[ SET V(1) = V(1) + Test ]<br />

[ SET V(1) = 1 + V(1) ]<br />

A variable may be referred to by its name or by its position. Note that in a SET clause, constants are often<br />

used. Character constants must be enclosed in quotes. There is often no way to infer from the context whether a<br />

number is a constant or the position of a variable. Therefore, the PPL syntax rule is that a number by itself is a<br />

constant, and a number indicated with the V(n) notation refers to a variable position.<br />

In addition to distinguishing between constants and variable positions, the “V” notation references the vector<br />

containing the values for the current case. The subscript (the contents of the parenthesis) pointing into the V vector<br />

may be a number or an expression. V(17) points to the value of the variable in the 17th position of a given case,<br />

in other words to its 17th variable. V(Region) points to the value of the first variable if Region is equal to 1 and<br />

to the value of variable 33 if Region is equal to 33. Calculation of variable positions is discussed in detail later in<br />

this manual.<br />
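
The idea of a subscript pointing into the vector of values for the current case can be pictured with a small Python model. It is a sketch only: the function V below is a stand-in written for this illustration, the three-variable case is hypothetical, and raising an error for an out-of-range position is a choice of the sketch rather than documented P-STAT behavior.<br />

```python
def V(case_values, subscript):
    """Return the value at a 1-based position in the current case."""
    if not 1 <= subscript <= len(case_values):
        raise IndexError("position outside the case")
    return case_values[subscript - 1]


# Hypothetical case: Region is in position 1, two scores are in positions 2 and 3.
case = [3, 10.0, 20.0]
region = V(case, 1)

print(V(case, 2))        # a constant subscript: the variable in position 2
print(V(case, region))   # like V(Region): Region equals 3, so position 3 is used
```

The second call shows a subscript that is itself the value of a variable, so the position referenced can differ from case to case.<br />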

If the variable that follows the SET instruction is not found in the file, an error occurs:<br />



ERROR... Variable Bad.Label, used in a PPL phrase,<br />

is not found in the file.<br />

In an interactive session, control is given to the internal P-STAT editor so that the error may be corrected and execution<br />

can continue.<br />

A series of SET instructions can be separated into individual clauses or grouped together in the same modification<br />

clause:<br />

LIST File [ SET Score = SQRT (Score),<br />

SET Test = Test + 1,<br />

SET Inches = Feet * 12 ] $<br />

2.11 Using INCREASE and DECREASE Instead of SET<br />

These usages of SET increase or decrease the value of an existing variable either by a constant or by an expression:<br />

[ SET Count = Count + 1 ]<br />

[ SET Total = Total + Score ]<br />

[ SET Used = Used - 3 ]<br />

SET clauses like these may be expressed more simply using the instructions INCREASE and DECREASE:<br />

[ INCREASE Count ]<br />

[ INCREASE Total BY Score ]<br />

[ DECREASE Used BY 3 ]<br />

When BY is omitted, BY 1 is assumed. INCREASE may be abbreviated to INC and DECREASE may be abbreviated<br />

to DEC.<br />

2.12 Creating New Variables with GENERATE<br />

The GENERATE instruction indicates that a new variable is to be created. It may be abbreviated to GEN. The<br />

format is like that of the SET instruction. GENERATE is immediately followed by the name of the variable to be<br />

created. This name must be one that does not already exist within the file. If a question mark (?) is used instead<br />

of a name, P-STAT generates a variable name. This name is the position of the variable in the file with the prefix<br />

VAR:<br />

LIST File [ GENERATE Total = Score1 + Score2 ] $<br />

LIST File [ GENERATE ID:C = Last.Name ] $<br />

LIST File [ GENERATE ? = MEAN ( XA TO XE ) ] $<br />

Character variables need “:C” or “:Cnn”, where nn is the maximum number of characters, directly after their<br />

names. When the number is not supplied, 16 is assumed. The following creates a new variable which can contain<br />

up to 30 characters:<br />

[ GENERATE ?:C30 = 'generated character variable';<br />

Once generated, the variables are referenced by just their names. The expression following the “=” in GENERATE<br />

is exactly like that following the “=” in SET. The MEAN function in the example above computes the mean<br />

of the variables in the list following the function name.<br />

The difference between SET and GENERATE is that the variable referenced by SET must already exist while<br />

the variable referenced by GENERATE must not yet exist. If the variable referenced by GENERATE does exist,<br />

an error occurs:<br />

Error... Attempting to GENERATE a new variable named var4,<br />

but the name already exists in position 4.<br />

The variable name (label) and the position it currently occupies (n) are both supplied in the error message.<br />



The expression which follows the “=” in SET and GENERATE instructions can be another variable, a constant,<br />

or a complicated expression involving variables, constants and functions. These are all valid expressions<br />

after an equal-sign:<br />

Age<br />

3.33<br />

'Sarah Wilson'<br />

SQRT ( V(3) + Age / 12 )<br />

RECODE ( Age, 80 TO 99 = 80 )<br />

The “+” and “/” are numeric operators, whereas SQRT and RECODE are functions. SQRT is the square root function.<br />

The RECODE function allows individual values of a variable to be changed; it is discussed in detail later in<br />

this manual.<br />

If the GENERATED variable is not set to anything as in:<br />

[ GENERATE abc ]<br />

it is set to Missing 1.<br />

2.13 Numeric Operators and their Order<br />

In the example above, the expression “Age / 12” is a numeric expression which requests the value of the variable<br />

Age, divided by the constant 12. The slash (/) is the symbol for division. The numeric or arithmetic operators are:<br />

+ for addition<br />

- for subtraction<br />

* for multiplication<br />

/ for division<br />

** for exponentiation<br />

A series of unparenthesized numeric operations may not necessarily be performed from left to right. All exponentiation<br />

at a given parenthesis level is done first, followed by all multiplication and division, followed by all<br />

addition and subtraction. If there is a series of additions and subtractions, they are performed from left to right.<br />

If there is a series of multiplications and divisions, they are also performed from left to right. A series of exponentiations,<br />

however, is done from right to left. Therefore:<br />

A - B + C is done as ( A - B ) + C<br />

A / B * C is done as ( A / B ) * C<br />

A ** B ** C is done as A ** ( B ** C )<br />

A + B * C is done as A + ( B * C )<br />

A * B ** C is done as A * ( B ** C )<br />

If this order of execution is not the desired order, parentheses may be used to enclose portions of a numeric<br />

expression. Operations within a pair of parentheses are performed before operations outside, regardless of the order<br />

defined above. Thus,<br />

( A + B + C ) / 3<br />

would take the sum of A, B, and C and divide the result by 3. Without the parentheses, the result would be C<br />

divided by 3 plus A and B, that is, A + B + ( C / 3 ).<br />
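
Python happens to use the same precedence and associativity for these five operators, so the ordering rules can be checked numerically there. The sketch below is Python, not PPL; the function name precedence_checks and the values 2, 3 and 2 are arbitrary choices for the illustration.<br />

```python
def precedence_checks(a, b, c):
    """Pair each unparenthesized expression with its fully parenthesized form."""
    return {
        "A - B + C": (a - b + c, (a - b) + c),
        "A / B * C": (a / b * c, (a / b) * c),
        "A ** B ** C": (a ** b ** c, a ** (b ** c)),
        "A + B * C": (a + b * c, a + (b * c)),
        "A * B ** C": (a * b ** c, a * (b ** c)),
    }


checks = precedence_checks(2.0, 3.0, 2.0)
for text, (plain, grouped) in checks.items():
    print(text, plain, grouped, plain == grouped)

# Grouping the other way gives a different answer for exponentiation:
print((2.0 ** 3.0) ** 2.0, 2.0 ** (3.0 ** 2.0))
```

With these values the right-to-left rule for exponentiation matters: 2 ** ( 3 ** 2 ) is 512, whereas ( 2 ** 3 ) ** 2 is 64.<br />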

2.14 Functions<br />

P-STAT functions are special expressions which transform variables according to particular rules. For instance,<br />

the SQRT function calculates the square root of a variable or an expression. A variable or expression used by a<br />

function is called the “argument” of that function.<br />

Functions require at least one argument. Arguments follow the function name and are enclosed in parentheses.<br />

Some of the functions, like the square-root function SQRT, require only a single argument, an expression<br />



which is to be used by the function. This expression can be a variable name or position, a constant, another function<br />

with its arguments, or a combination of such elements:<br />

[ SET Score = SQRT ( Score ) ]<br />

[ SET Score = SQRT ( 55 ) ]<br />

[ SET Score = SQRT ( Score + 33 ) ]<br />

[ SET Score = SQRT ( V(1) * .5 ) ]<br />

Functions like the square root function SQRT are called “numeric functions”. The argument for a numeric<br />

function is a single expression. If any of the elements in the expression is a missing value, the result is a missing<br />

value. If the expression yields a good value which is legal for the function, the function will produce an appropriate<br />

result. If the argument is invalid, like SQRT(-3), the result is set to missing.<br />
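
The three outcomes described above (missing in, missing out; a legal value giving a result; an invalid argument giving missing) can be modelled in a few lines of Python. This is a sketch, not P-STAT code: the names MISSING, ppl_sqrt and add are hypothetical, and the model uses a single missing value without distinguishing the types of missing.<br />

```python
import math

MISSING = None  # stand-in for a P-STAT missing value (type of missing not modelled)


def add(left, right):
    """Addition in which any missing element makes the result missing."""
    if left is MISSING or right is MISSING:
        return MISSING
    return left + right


def ppl_sqrt(value):
    """Square root that yields MISSING for a missing or invalid (negative) argument."""
    if value is MISSING or value < 0:
        return MISSING
    return math.sqrt(value)


print(ppl_sqrt(add(16, 33)))       # legal argument
print(ppl_sqrt(add(MISSING, 33)))  # a missing element in the expression
print(ppl_sqrt(-3))                # invalid argument
```

Only the first call produces a number; the other two yield the missing marker.<br />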

A number of functions, such as the MEAN function, operate on a list of arguments:<br />

[ SET AVERAGE = MEAN ( V(2) Test1 Test2 Test3 ) ]<br />

[ SET AVERAGE = MEAN ( V(2) Test1 TO Test3 ) ]<br />

Each argument is a numeric variable name or position. The function takes the list of arguments and yields a single<br />

value. For instance, the MEAN function illustrated above calculates the mean of the variables in the list.<br />

The functions, which are covered in detail later in this manual, can be broadly classified as:<br />

1. numeric functions such as SQRT<br />

2. list functions which operate on a list of variables such as MEAN<br />

3. character functions such as UPPER, LOWER and CAPS<br />

4. special functions. For example, RECODE and NCOT are used to recode the values of one or more<br />

variables. SPLIT and COLLECT are used for cross case data manipulation. Date and time functions<br />

have a chapter of their own later in this manual.<br />

2.15 LOGICAL SELECTION OF CASES<br />

Cases in a P-STAT system file may be selected or deleted from processing by logical testing. This is sometimes<br />

referred to as “filtering.” IF is the keyword that precedes all logical selections and modifications. The following<br />

is a discussion of the simple logical IF. Full IF-THEN-ELSE blocks are discussed later in this manual.<br />

__________________________________________________________________________<br />

Figure 2.4 Format of the IF Clause<br />

IF Exp 1 Logical Operator Exp 2 Action<br />

[ IF Test1 EQ Test2, DELETE ]<br />

[ IF Test1 LT 3, RETAIN ]<br />

[ IF Test1 - 3 GE Test4 * .5, SET .... ]<br />

[ IF Test1 - V(3) GT SQRT (Test3), SET .... ]<br />

[ IF SQRT (Test1) GT .2, SET .... ]<br />

[ IF School EQ 'Longwood', SET .... ]<br />

__________________________________________________________________________<br />

The IF itself is usually composed of five parts:<br />

1. the keyword IF<br />

2. an expression



3. an operator indicating the relationship between the expressions<br />

4. a second expression<br />

5. one or more action instructions to be taken.<br />

The format of the IF clause is illustrated in Figure 2.4. Expressions may be as simple as a variable name or a<br />

constant, or they may be complex numeric or character expressions combining variables, constants and functions.<br />

Character constants need to be enclosed in single or double quotes.<br />

The V(n) notation provides a consistent means for differentiating between a constant and a variable position.<br />

Test1 - 3 means the value of the variable named Test1 minus the constant 3; Test1 - V(3) means the value of the<br />

variable named Test1 minus the value of the variable located in position 3.<br />

Any of the expressions in the IF can be complex and refer to variables, constants or functions. The functions<br />

themselves can call functions:<br />

[ IF INT ( SQRT (XA ) ) GT SQRT (XC) + 5, SET XF = 1 ]<br />

Here the square root of XA is computed. Then that result is truncated to an integer using the INT function. Finally,<br />

the result is compared with the result of the second expression, namely, the sum of the square root of XC and the<br />

constant 5. If the comparison is evaluated as true, that is, if the value of the first expression is greater than the<br />

value of the second expression when both expressions are non-missing, the specified action (SET XF = 1) occurs.<br />

The action taken after the evaluation of an IF clause is typically the modification of a variable’s value (SET),<br />

the keeping of a case (RETAIN), or the exclusion of a case (DELETE).<br />

RETAIN keeps a case and passes it to the next PPL clause or, if there are no more clauses, to the current<br />

command. The case is passed through to the current command, unless it is deleted in a subsequent PPL clause.<br />

DELETE does not pass a case to any subsequent PPL clauses or to the current command. It deletes the case<br />

from any further modification or testing, and from the current command. In other words, the processing of PPL<br />

ceases for that case. The next case is read and PPL is restarted with the new case.<br />

The action that follows the IF test is usually taken only if the expression is true.<br />

IF Test GE 65, SET Pass = 'true', is the same as:<br />

IF Test GE 65, T.SET Pass = 'true',<br />

Any action can be prefaced by any combination of the letters “T” for true, “F” for false, and “M” for missing to<br />

control how the results of the IF test are to be evaluated.<br />

IF Test GE 65, MF.SET Pass = 'false', T.SET Pass = 'true';<br />

The action section can contain multiple actions, each one prefaced with appropriate “TMF” combinations.<br />
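
The effect of the “T”, “F” and “M” prefixes can be pictured as a dispatch on the three possible outcomes of the test. The Python sketch below models a statement of the form IF Test GE 65, MF.SET Pass = 'false', T.SET Pass = 'true'. It is an illustration only; the helper names evaluate_ge, run_actions and set_pass are hypothetical and do not correspond to anything in P-STAT.<br />

```python
def evaluate_ge(left, right):
    """Three-way result of 'left GE right': 'T', 'F', or 'M' when a value is missing."""
    if left is None or right is None:
        return "M"
    return "T" if left >= right else "F"


def run_actions(outcome, actions, case):
    """Run every action whose prefix letters include the outcome of the test."""
    for prefixes, action in actions:
        if outcome in prefixes:
            action(case)
    return case


def set_pass(value):
    """Build an action that stores `value` in the Pass variable of a case."""
    def action(case):
        case["Pass"] = value
    return action


# MF.SET Pass = 'false', T.SET Pass = 'true'
actions = [("MF", set_pass("false")), ("T", set_pass("true"))]

for score in (70, 50, None):  # None stands in for a missing Test value
    case = {"Test": score, "Pass": None}
    run_actions(evaluate_ge(case["Test"], 65), actions, case)
    print(score, case["Pass"])
```

In this model an action written without a prefix would simply carry the prefix "T", which matches the equivalence of SET and T.SET described above.<br />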

2.16 Logical Operators<br />

The basic logical operators are the following:<br />

Meaning Symbol<br />

equal EQ<br />

not equal NE<br />

less than LT<br />

less than or equal LE<br />

greater than GT<br />

greater than or equal GE<br />

Each expression in an IF clause is analyzed and a value is computed. The expressions are then compared according<br />

to the logical operator that was used. If the logical operator correctly describes the relationship between<br />

the expressions, the IF statement is evaluated as true. If it is incorrect, the IF statement is false. If either expression<br />

is missing so that the comparison cannot be made, the IF is evaluated as missing.<br />

The logical operators may be prefaced with “X” for eXact comparisons of character strings:<br />



[ IF Symptom XEQ 'a', RETAIN ]<br />

When an exact comparison is specified, the case of the character string must be exactly the same as that of the test<br />

string for the IF statement to be true. In the example above, the value of Symptom must be a lower case “a” for<br />

the case to be retained; an upper case “A” would be evaluated as false. When the logical operators are not prefaced<br />

with “X”, the case of character strings is not relevant.<br />

In addition to these operators there are 6 logical operators which are used to compare dates and times. They<br />

are described in Chapter 10, “PPL: Date and Time Commands and Functions”.<br />

2.17 The Special Operators MISSING and GOOD<br />

The values .M. and .G. are the system values for missing and good. Missing can be further specified as .M1., .M2.<br />

and .M3. Note that names for system values and system variables look much like variable names except that they<br />

begin and end with a decimal point. This:<br />

[ IF Age EQ .M., DELETE ]<br />

may be used to delete any case with a missing value of Age. Note that when an IF statement is used to explicitly<br />

test for missing or good values, it has only a true or false result.<br />

The two special operators MISSING and GOOD can also be used to test whether or not missing data are present<br />

in an expression:<br />

[ IF Test1 MISSING, is the same as<br />

[ IF Test1 EQ .M.,<br />

[ IF Test1 GOOD, is the same as<br />

[ IF Test1 EQ .G.,<br />

The special operators MISSING and GOOD combine the “EQ” (=) operator and the system value .M. or .G.<br />

into a single keyword. MISSING1, MISSING2, and MISSING3 can be used in the same way as MISSING to test<br />

specifically for the individual types of missing.<br />

[ IF Age MISSING3, DELETE ]<br />

Here, a case is deleted if Age equals the system value for missing type 3.<br />

2.18 AND and OR Relationships<br />

An IF may consist of a series of logical relationships linked by AND or OR. For example:<br />

[ IF Age GE 14 AND Sex EQ 1, SET Membership = 2 ]<br />

[ IF Age LT 14 AND Sex EQ 1 OR V(1) EQ 77, DELETE ]<br />

There can be many ANDs and ORs and they can be nested. Parentheses control the order in which the parts of the<br />

expression are evaluated:<br />

[ IF<br />

( Age GT 21 OR ( Voter EQ 2 AND Married EQ 1 ) )<br />

AND<br />

( Education GT 12 OR ( Job EQ 4 AND Income GT 20000 ) ),<br />

RETAIN ]<br />

This example illustrates the types of complex expression that are possible. However, a frequent cause of an empty<br />

file (no cases found) is an IF with expressions so complex that the user cannot follow the logic.<br />



__________________________________________________________________________<br />

Figure 2.5 AND and OR: Evaluations of Expressions<br />

In the following table, the evaluations of the expressions are:<br />

t for true, f for false and m for missing.<br />

EXPRESSIONS: EVALUATIONS:<br />

Exp1 Exp2 Exp1 AND Exp2 Exp1 OR Exp2<br />

t t t t<br />

t f f t<br />

t m m t<br />

f t f t<br />

f f f f<br />

f m f m<br />

m t m t<br />

m f f m<br />

m m m m<br />

__________________________________________________________________________<br />

Unless parentheses indicate otherwise, ANDs are done before ORs.<br />

Parentheses determine clusters of logic that get evaluated as a piece. In the previous example, if the value of<br />

Age is not greater than 21 or if Age is missing, the expression:<br />

( Voter EQ 2 AND Married EQ 1 )<br />

needs to be evaluated. If this expression is also not true, the entire modification clause cannot be true, and the rest<br />

of the clause does not need to be processed. However, if this expression is true, the next expression:<br />

( Education GT 12 OR ( .... ) )<br />

is evaluated in the same manner. If Education is greater than 12, there is no need to complete the evaluation of<br />

the expression. A true result is returned and the case continues to the next PPL clause. However, if Education is<br />

not greater than 12 or is missing, the evaluation proceeds because the expression following the OR might be true.<br />

Figure 2.5 contains a table which shows the interaction of true, false and missing evaluations with the AND<br />

and OR operators. The following example illustrates OR with three different evaluations:<br />

( IF Occupation EQ 40 OR Education EQ 12 )<br />

Occupation Education Evaluation<br />

43 (f) 12 (t) true<br />

43 (f) 16 (f) false<br />

43 (f) - (m) missing<br />

In the third example, the first expression is false (Occupation is not 40), but the second expression is neither<br />

false nor true because the value for Education is missing.<br />
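
The table in Figure 2.5 follows a simple rule: for AND, any false operand gives false, otherwise any missing operand gives missing; for OR, any true operand gives true, otherwise any missing operand gives missing. The Python sketch below encodes that rule and prints all nine rows. It is a model for illustration, not PPL; the names and3 and or3 and the letters used for the three values are choices of the sketch.<br />

```python
T, F, M = "t", "f", "m"  # true, false, missing


def and3(a, b):
    """AND over true/false/missing: false dominates, then missing."""
    if a == F or b == F:
        return F
    if a == M or b == M:
        return M
    return T


def or3(a, b):
    """OR over true/false/missing: true dominates, then missing."""
    if a == T or b == T:
        return T
    if a == M or b == M:
        return M
    return F


for a in (T, F, M):
    for b in (T, F, M):
        print(a, b, and3(a, b), or3(a, b))
```

For the Occupation and Education example, or3(F, T), or3(F, F) and or3(F, M) give true, false and missing respectively.<br />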

Because AND has precedence over OR, some of the parentheses in the previous example can be omitted.<br />

[ IF ( Age GT 21 OR ( Voter EQ 2 AND Married EQ 1 ) )<br />

can be written as<br />

[ IF ( Age GT 21 OR Voter EQ 2 AND Married EQ 1 )<br />

The full statement reduces to:<br />

[ IF ( Age GT 21 OR Voter EQ 2 AND Married EQ 1 )<br />

AND<br />

( Education GT 12 OR Job EQ 4 AND Income GT 20000 ),<br />

However, the use of the parentheses is recommended whenever the logic is complex with a mixture of AND and<br />

OR phrases.<br />

2.19 Common Errors in Complex Expressions<br />

The most common errors in constructing a complex expression occur because the relationship that follows the IF,<br />

OR, or AND is not complete. For example:<br />

[ IF Occupation EQ 3 OR Occupation EQ 4, .... is correct<br />

[ IF Occupation EQ 3 OR 4, .... is incorrect<br />

In the second example above, “Occupation EQ 3” is complete. It has the proper three parts with “Occupation” as<br />

the first expression, “EQ” as the operator, and “3” as the second expression. However,<br />

[ IF Occupation EQ 3 OR 4 EQ Occupation,<br />

is also allowed. Since a number on ei<strong>the</strong>r side of <strong>the</strong> opera<strong>to</strong>r is a valid expression, <strong>the</strong> 4 following <strong>the</strong> OR (in <strong>the</strong><br />

earlier example):<br />

[ IF Occupation EQ 3 OR 4, ....<br />

is interpreted as <strong>the</strong> first expression in a clause. The “,” which follows it is not a legal opera<strong>to</strong>r. Error messages<br />

indicate what was expected in <strong>the</strong> clause and what was found:<br />

LIST Patients [ IF Occupation EQ 3 or 4, RETAIN ] $<br />

ERROR... Expected a logical opera<strong>to</strong>r like EQ<br />

RETAIN ] $<br />

A second common source of error is <strong>to</strong> include <strong>the</strong> IF for each relationship in <strong>the</strong> complex statement. The<br />

following is correct:<br />

[ IF Occupation EQ 3 OR Occupation EQ 4, ....<br />

This is incorrect:<br />

[ IF Occupation EQ 3 OR IF Occupation EQ 4, ....<br />

In this example, an error message results because IF is a legal name for a variable:<br />

LIST KK [ IF Occupation EQ 3 OR IF Occupation EQ 4,<br />

RETAIN ] $<br />

ERROR... Expected a logical opera<strong>to</strong>r like EQ<br />

Occupation EQ 4,<br />

Since IF is a legal variable name, and a variable is a valid expression, P-<strong>STAT</strong> is expecting <strong>the</strong> next character<br />

string <strong>to</strong> be an opera<strong>to</strong>r such as EQ or LT. The variable name Occupation is not a legal opera<strong>to</strong>r and an error condition<br />

occurs.


2.20 AMONG and NOTAMONG<br />

Two o<strong>the</strong>r logical opera<strong>to</strong>rs, AMONG and NOTAMONG, simplify <strong>the</strong> specification of logical relationships.<br />

They follow an initial expression and require a list of values and variables (not a second expression) as <strong>the</strong>ir<br />

argument.<br />

The argument list for AMONG and NOTAMONG contains individual values and ranges of values. This logical<br />

clause:<br />

[ IF Test.Score AMONG ( 90 TO 100 ),<br />

SET High = 1 ]<br />

produces exactly <strong>the</strong> same result as:<br />

[ IF Test.Score GE 90 AND<br />

Test.Score LE 100, SET High = 1 ]<br />

AMONG is easier <strong>to</strong> type and <strong>to</strong> understand.<br />

NOTAMONG is used similarly:<br />

[ IF Test.Score NOTAMONG ( 90 TO 100 ), DELETE ]<br />

Any cases with values on Test.Score that are below 90 or over 100 are deleted. Cases with missing values are<br />

not deleted. Prefixing <strong>the</strong> consequence:<br />

[ IF Test.Score NOTAMONG ( 90 TO 100 ), TM.DELETE ]<br />

deletes cases with missing values as well. The system variables for missing values (.M., .M1., etc.) may also be included in<br />

the argument list for AMONG or NOTAMONG; see the section on missing data with AMONG and NOTAMONG.<br />

AMONG and NOTAMONG are particularly powerful when multiple values are specified. Thus:<br />

[ IF Religion EQ 1 OR ( Religion GE 3<br />

AND Religion LE 5 ) OR Religion EQ 7<br />

OR Religion EQ 9, SET Protestant = 1 ]<br />

is exactly <strong>the</strong> same as:<br />

[ IF Religion AMONG ( 1, 3 TO 5, 7, 9 ),<br />

SET Protestant = 1 ]<br />
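The equivalence between the AMONG list and the longer chain of comparisons can be sketched as follows. This is a Python model, not PPL. A (low, high) pair stands in for "low TO high", the function names are hypothetical, and missing values are left out of the model.

```python
# Illustrative model of an AMONG argument list (Python, not PPL).
# A list item is either a single value or a (low, high) pair meaning "low TO high".

def among(value, items):
    """Return True if value equals a listed value or lies in a listed inclusive range."""
    for item in items:
        if isinstance(item, tuple):
            low, high = item
            if low <= value <= high:
                return True
        elif value == item:
            return True
    return False

def long_form(religion):
    """The chain of ORs and ANDs that the AMONG list replaces."""
    return (religion == 1 or (religion >= 3 and religion <= 5)
            or religion == 7 or religion == 9)

protestant_list = [1, (3, 5), 7, 9]   # AMONG ( 1, 3 TO 5, 7, 9 )
agree = all(among(r, protestant_list) == long_form(r) for r in range(0, 12))
print(agree)  # True
```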

The arguments for <strong>the</strong> opera<strong>to</strong>rs AMONG and NOTAMONG are lists of values (constants) and variables;<br />

<strong>the</strong>y cannot be complex expressions but <strong>the</strong>y can be scratch variables. The use of commas separating <strong>the</strong> AMONG<br />

values is optional. In this example, <strong>the</strong> arguments for NOTAMONG are variable names:<br />

[ SET Low.Score = MIN ( Test1 TO Test10 );<br />

SET High.Score = MAX ( Test1 TO Test10 );<br />

IF Final.Exam NOTAMONG ( Low.Score TO High.Score ),<br />

RETAIN ]<br />

The MIN function yields <strong>the</strong> minimum value of a list of variables, which can include ranges, wildcards and<br />

.ON. . The MAX function yields <strong>the</strong> maximum value. Here <strong>the</strong>se functions are used <strong>to</strong> find <strong>the</strong> lowest and highest<br />

scores on a series of tests. If <strong>the</strong> value of <strong>the</strong> variable named Final.Exam is less than <strong>the</strong> lowest value or above <strong>the</strong><br />

highest value, <strong>the</strong> case is retained. The retained cases are students who have done ei<strong>the</strong>r better or worse than expected,<br />

given <strong>the</strong>ir scores on Test1 <strong>to</strong> Test10.<br />
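The selection just described can be sketched in Python (not PPL). The case layout and the name unexpected_final are hypothetical, and all scores are assumed to be non-missing.

```python
# Illustrative model (Python, not PPL) of
#   IF Final.Exam NOTAMONG ( Low.Score TO High.Score ), RETAIN
# Each case is a dict; all scores are assumed to be non-missing.

def unexpected_final(case):
    """True when the final exam falls outside the range of the earlier test scores."""
    low_score = min(case["tests"])
    high_score = max(case["tests"])
    return not (low_score <= case["final"] <= high_score)

cases = [
    {"id": 1, "tests": [70, 75, 82], "final": 78},   # inside 70..82
    {"id": 2, "tests": [70, 75, 82], "final": 95},   # better than expected
    {"id": 3, "tests": [70, 75, 82], "final": 55},   # worse than expected
]
retained = [c["id"] for c in cases if unexpected_final(c)]
print(retained)  # [2, 3]
```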

AMONG and NOTAMONG may be prefaced with “X” for eXact comparisons of character strings. When<br />

exact comparisons are specified, <strong>the</strong> string must be identical and <strong>the</strong> case (upper, lower or mixed) must also be<br />

identical:<br />

[ IF Symp<strong>to</strong>m XAMONG ( 'a' 'A' 'Aa' ), RETAIN ]<br />

For example, cases with “aa”, “AA” and “aA” as values of Symp<strong>to</strong>m would not be retained.
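The difference between an exact and a case-independent list test can be sketched in Python (not PPL). The sketch assumes that plain AMONG ignores case for character strings, as EQ is described as doing in the chapter summary. The function names are hypothetical.

```python
# Illustrative model (Python, not PPL) of exact versus case-independent list tests.
# Assumption: plain AMONG ignores case for character strings, as EQ is described
# as doing in the summary; XAMONG requires identical case.

def among(value, items):
    return value.lower() in [item.lower() for item in items]

def xamong(value, items):
    return value in items

symptoms = ["a", "A", "Aa", "aa", "AA", "aA"]
allowed = ["a", "A", "Aa"]

exact = [s for s in symptoms if xamong(s, allowed)]
loose = [s for s in symptoms if among(s, allowed)]
print(exact)  # ['a', 'A', 'Aa']
print(loose)  # ['a', 'A', 'Aa', 'aa', 'AA', 'aA']
```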


2.21 MISSING DATA with AMONG and NOTAMONG<br />

If <strong>the</strong> value being tested is a missing value, <strong>the</strong> result of <strong>the</strong> IF will be missing unless it matches a missing value<br />

in the argument list. If variable TestScore has a value of MISSING2, the following PPL:<br />

[ IF TestScore AMONG ( 60 TO 100 ), T.SET Grade = 'Pass',<br />

F.SET Grade = 'Fail',<br />

M.SET Grade = 'Incomplete' ]<br />

as expected, produces the missing result of “Incomplete”.<br />

[ IF TestScore AMONG ( 60 TO 100, .M1. ) ....<br />

also produces <strong>the</strong> missing result when variable TestScore is MISSING2. However, <strong>the</strong> statement:<br />

[ IF TestScore AMONG ( 60 TO 100, .M2. ) ....<br />

produces a result of true and variable Grade has a value of “Pass” for that case.<br />
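The three outcomes described in this section can be modelled in Python (not PPL). The strings "M1", "M2" and "M3" are a hypothetical encoding of the missing types, None stands for a missing result, and the function names are illustrative only.

```python
# Illustrative model (Python, not PPL) of AMONG when the tested value is missing.
# Missing values are represented by the strings "M1", "M2", "M3" (a hypothetical
# encoding); None is the "missing" result of the test.

MISSING_TYPES = {"M1", "M2", "M3"}

def among(value, ranges=(), missing=()):
    """ranges: inclusive (low, high) pairs; missing: missing types named in the list."""
    if value in MISSING_TYPES:
        # A missing value is true only when its own type is listed.
        return True if value in missing else None
    return any(low <= value <= high for low, high in ranges)

def grade(result):
    """T.SET / F.SET / M.SET as in the Grade example in this section."""
    if result is True:
        return "Pass"
    if result is False:
        return "Fail"
    return "Incomplete"

score = "M2"
print(grade(among(score, [(60, 100)])))                  # Incomplete
print(grade(among(score, [(60, 100)], missing=["M1"])))  # Incomplete
print(grade(among(score, [(60, 100)], missing=["M2"])))  # Pass
```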

2.22 INRANGE and OUTRANGE<br />

INRANGE and OUTRANGE can be used when <strong>the</strong> test <strong>to</strong> be done is for a single range of values.<br />

[ IF TestScore INRANGE ( 60, 100 ), SET Grade = 'Pass' ]<br />

[ IF Age OUTRANGE ( 13, 19 ), DELETE ]<br />

These two examples can also be written using AMONG with the keyword “TO”. INRANGE and OUTRANGE are<br />

provided because their names are more intuitive in some situations than<br />

AMONG and NOTAMONG.<br />

2.23 ANY and ALL<br />

There are two o<strong>the</strong>r logical opera<strong>to</strong>rs that may follow an IF: ANY and ALL. They must be followed by a list of<br />

variables. ANY is equivalent <strong>to</strong> a series of ORs. This example:<br />

[ IF Q11 GT 10 OR Q12 GT 10 OR Q13 GT 10 OR Q14 GT 10, DELETE ]<br />

is the same as:<br />

[ IF ANY ( Q11 TO Q14 ) GT 10, DELETE ]<br />

ALL is equivalent to a series of ANDs. This example:<br />

[ IF Q11 GOOD AND Q12 GOOD AND Q13 GOOD<br />

AND Q14 GOOD, RETAIN ]<br />

is the same as:<br />

[ IF ALL ( Q11 TO Q14 ) GOOD, RETAIN ]<br />

The argument list which follows ALL and ANY may contain variable names or variable positions. The variable<br />

positions are indicated by V(n). A common use of ANY or ALL selects cases with good (non-missing) data<br />

on all of <strong>the</strong> variables. Ei<strong>the</strong>r of <strong>the</strong>se statements does this:<br />

[ IF ALL ( V(1) .ON. ) GOOD, RETAIN ]<br />

[ IF ANY ( V(1) .ON. ) MISSING, DELETE ]<br />
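That the two statements select the same cases can be sketched in Python (not PPL). None marks a missing value, and the data are made up for illustration.

```python
# Illustrative model (Python, not PPL): None marks a missing value.
rows = [
    [1, 2, 3],
    [4, None, 6],
    [None, None, None],
    [7, 8, 9],
]

# IF ALL ( V(1) .ON. ) GOOD, RETAIN
retained = [row for row in rows if all(v is not None for v in row)]

# IF ANY ( V(1) .ON. ) MISSING, DELETE
not_deleted = [row for row in rows if not any(v is None for v in row)]

print(retained == not_deleted)  # True
print(retained)                 # [[1, 2, 3], [7, 8, 9]]
```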

2.24 INSTRUCTIONS AFTER IF<br />

The IF statement is incomplete by itself. It must be followed by an instruction that describes <strong>the</strong> action <strong>to</strong> be taken<br />

as a consequence of <strong>the</strong> IF test. A comma (,) is <strong>the</strong> punctuation that separates <strong>the</strong> IF and <strong>the</strong> instruction which<br />

follows. The three most common instructions that follow an IF are: DELETE and RETAIN for conditional case


selection, and SET for variable recoding. Functions such as LAG and DIF which work across cases should seldom<br />

be used following an IF. See <strong>the</strong> discussions of LAG and DIF for an example.<br />

__________________________________________________________________________<br />

Figure 2.6 IF and Missing Data<br />

Given these five cases:<br />

Case Num   Age   Race<br />
1          29    2<br />
2          31    4<br />
3          -     2<br />
4          -     4<br />
5          32    -<br />

The cases below are the ones given to a P-STAT command as a result of evaluation of the IF clauses on the left:<br />

IF clause                                   Case Num   Age   Race<br />
1. IF Age LE 30, RETAIN                     1          29    2<br />
2. IF Age GT 30, DELETE                     1          29    2<br />
                                            3          -     2<br />
                                            4          -     4<br />
3. IF Age GOOD AND Race GT 3, RETAIN        2          31    4<br />
4. IF Age MISSING OR Race LE 3, DELETE      2          31    4<br />
                                            5          32    -<br />

__________________________________________________________________________<br />

2.25 Conditional Case Selection<br />

Cases may be retained or deleted as <strong>the</strong> result of logical evaluations. Figure 2.6 shows four conditional case selections<br />

which appear straightforward. In <strong>the</strong> first IF clause, <strong>the</strong>re is no ambiguity. Only <strong>the</strong> first of <strong>the</strong> five cases<br />

in <strong>the</strong> figure has a value for variable Age which is both non-missing and less than or equal <strong>to</strong> 30. Since action is<br />

taken only when <strong>the</strong> result of an IF is true, only that one case is retained.<br />

The second IF in Figure 2.6 looks like <strong>the</strong> first IF. The second and fifth cases, which have non-missing values<br />

greater than 30, are deleted. However, <strong>the</strong> third and fourth cases, which were not retained in <strong>the</strong> first IF because<br />

of missing values on Age, are not deleted in <strong>the</strong> second IF for <strong>the</strong> same reason. When a value is missing, <strong>the</strong> result<br />

of an IF is missing ra<strong>the</strong>r than true. Unless explicitly specified o<strong>the</strong>rwise, actions that follow an IF are done only<br />

when <strong>the</strong> result of <strong>the</strong> IF is true. If <strong>the</strong> result is false or missing, <strong>the</strong> action is not done.<br />

The fourth IF in Figure 2.6 is similarly affected because <strong>the</strong>re is a missing value of Race in case 5. Since <strong>the</strong><br />

result of <strong>the</strong> IF will be missing for that case, it is not deleted from <strong>the</strong> file.<br />
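The four selections in Figure 2.6 can be reproduced with a small Python model (not PPL). None marks a missing value or a missing evaluation. The helper names are hypothetical, and the three-valued AND and OR rules are an assumption that is consistent with the results shown in the figure.

```python
# Illustrative model (Python, not PPL) of the four selections in Figure 2.6.
# None marks a missing value or a missing evaluation.

cases = {1: (29, 2), 2: (31, 4), 3: (None, 2), 4: (None, 4), 5: (32, None)}

def compare(value, test):
    """Apply a comparison; the result is missing when the value is missing."""
    return None if value is None else test(value)

def and3(a, b):
    if a is False or b is False:
        return False
    return True if (a is True and b is True) else None

def or3(a, b):
    if a is True or b is True:
        return True
    return False if (a is False and b is False) else None

def retain(test):
    """RETAIN: keep only the cases for which the IF is true."""
    return [n for n, (age, race) in cases.items() if test(age, race) is True]

def delete(test):
    """DELETE: drop only the cases for which the IF is true."""
    return [n for n, (age, race) in cases.items() if test(age, race) is not True]

if1 = retain(lambda age, race: compare(age, lambda a: a <= 30))
if2 = delete(lambda age, race: compare(age, lambda a: a > 30))
if3 = retain(lambda age, race: and3(age is not None, compare(race, lambda r: r > 3)))
if4 = delete(lambda age, race: or3(age is None, compare(race, lambda r: r <= 3)))
print(if1, if2, if3, if4)  # [1] [1, 3, 4] [2] [2, 5]
```

Cases 3 and 4 survive the second IF, and case 5 survives the fourth, because a missing evaluation is not true and DELETE acts only on true.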

2.26 Conditional Modification<br />

The keyword SET can ei<strong>the</strong>r begin a modification clause or it can be used as an instruction following an IF. In<br />

each of <strong>the</strong>se examples, <strong>the</strong> IF expression is evaluated first:<br />

-------- THE IF -------- -------- THE SET --------<br />

[IF Age EQ 1, SET Age = 99 ]


[IF <strong>Inc</strong>ome MISSING, SET <strong>Inc</strong>ome = 0 ]<br />

[IF Test3 LT Test1, SET Sum = Test2 / 2 ]<br />

[IF Total EQ V(1) + 3, SET Sum = Count * 2 ]<br />

[IF Sum - .5 LT 0, F.SET V(1) = V(3) ]<br />

[IF MEAN (T1 TO T3) LT 65, SET Sum = .M. ]<br />

Except in <strong>the</strong> fifth example, <strong>the</strong> SET is done only if <strong>the</strong> expression is true. In <strong>the</strong> fifth example, F.SET causes <strong>the</strong><br />

SET <strong>to</strong> occur only if <strong>the</strong> expression is false.<br />

In some situations, an IF test may have more than one desired consequence. Multiple instructions may directly<br />

follow <strong>the</strong> IF, within <strong>the</strong> same clause. In <strong>the</strong> following example, if Work.Status equals 3, four instructions<br />

follow (a RETAIN and three SETs):<br />

[ IF Work.Status EQ 3, RETAIN,<br />

SET Current.Job = 0,<br />

SET Current.<strong>Inc</strong>ome = 0,<br />

SET Total.Hours = .M1. ]<br />

2.27 Three-Way Logic of IF Statements<br />

Three-way (true, false and missing) logic in <strong>the</strong> evaluation of IF statements is powerful and gives precise control<br />

over data:<br />

[ IF Age GE 18, T.SET Voter = 1,<br />

FM.SET Voter = 0 ]<br />

However, <strong>to</strong> use this power and obtain <strong>the</strong> expected results, consideration must be given <strong>to</strong> <strong>the</strong> treatment of missing<br />

data.<br />

This is especially true with logical selection of cases. DELETE is occasionally useful, but RETAIN is better<br />

because its treatment of missing data is more natural. The action which follows <strong>the</strong> IF is normally done only if<br />

<strong>the</strong> result of <strong>the</strong> IF is true. However, it is possible <strong>to</strong> direct <strong>the</strong> action explicitly by using <strong>the</strong> prefixes T, F and M<br />

before the action instruction. The consequence DELETE actually means T.DELETE, or delete if true.<br />

TM.DELETE deletes a case if the result of the IF is either true or missing and yields the expected result. Thus:<br />

[ IF Age GT 30, TM.DELETE ] is <strong>the</strong> same as<br />

[ IF Age LE 30, T.RETAIN ] which is <strong>the</strong> same as<br />

[ IF Age LE 30, RETAIN ]<br />

Similarly, F.DELETE means delete if <strong>the</strong> result of <strong>the</strong> IF is false.<br />

There may be multiple consequences of a given IF. The following are possible combinations of instructions:<br />

[ IF logical expression, T.SET ..., F.SET ..., M.SET ... ]<br />

[ IF logical expression, TM.SET ..., F.SET ... ]<br />

[ IF logical expression, TFM.SET ..., FM.SET ... ]<br />

All combinations of T, F and M, in any order, are permitted as prefixes <strong>to</strong> <strong>the</strong> consequences of an IF. TFM.SET<br />

causes <strong>the</strong> action <strong>to</strong> occur, whatever <strong>the</strong> result of <strong>the</strong> IF.<br />
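The way a prefix decides whether a consequence acts can be sketched in Python (not PPL). None stands for a missing result. The function names fires and voter are hypothetical, and the sketch models only which consequences act, not how P-STAT parses them.

```python
# Illustrative model (Python, not PPL) of T, F and M prefixes on a consequence.
# None stands for a missing result; the prefix letters may come in any order.

def fires(prefix, result):
    """Return True if a consequence with this prefix acts on the given IF result."""
    letter = {True: "T", False: "F", None: "M"}[result]
    return letter in (prefix or "T")   # no prefix means T

def voter(age):
    """Model of [ IF Age GE 18, T.SET Voter = 1, FM.SET Voter = 0 ]."""
    result = None if age is None else age >= 18
    value = None
    if fires("T", result):
        value = 1
    if fires("FM", result):
        value = 0
    return value

print([voter(a) for a in (25, 12, None)])              # [1, 0, 0]
print([fires("TFM", r) for r in (True, False, None)])  # [True, True, True]
print([fires("", r) for r in (True, False, None)])     # [True, False, False]
```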

If a prefix is not given, T is always assumed no matter what prefix was used in <strong>the</strong> previous consequence:<br />

[ IF Age GT 18, F.SET Minor.Child = 1,<br />

SET Voter = 1 ]<br />

The variable Voter is set <strong>to</strong> 1 if <strong>the</strong> expression is true.<br />

In this example, <strong>the</strong> consequences are more complex:<br />

[ IF Sex EQ 1 AND Work.Status GE 2,<br />

T.SET Occupation = Last.Occup,<br />

F.SET Occupation = Current.Occup,


M.DELETE ]<br />

The evaluation still returns a single logical result of true, false or missing, and <strong>the</strong> various actions are done<br />

accordingly.<br />

2.28 Renaming Variables<br />

RENAME is <strong>the</strong> <strong>PPL</strong> instruction that is used <strong>to</strong> rename individual variables.<br />

[ RENAME Test1 TO Math121;<br />

RENAME V(2) TO Chem34 ]<br />

RENAME requires <strong>the</strong> existing name, TO, and <strong>the</strong> new name, which must be a unique name in <strong>the</strong> file. If you<br />

wish <strong>to</strong> rename most of <strong>the</strong> variables in <strong>the</strong> file with names that have no particular pattern you can use a MODIFY<br />

with an on-the-fly concatenation of files, which is described in the chapter “PPL:MODIFY, PROCESS and PUT”.<br />

If you wish <strong>to</strong> rename a group of variables using a pattern such as a prefix, suffix, or sequence number, see <strong>the</strong><br />

chapter “<strong>PPL</strong>:DO LOOPS and IF-THEN-ELSE Blocks”.


<strong>PPL</strong><br />

SUMMARY<br />

<strong>Programming</strong> language modifications may be used in <strong>the</strong> MODIFY command or in any o<strong>the</strong>r P-<strong>STAT</strong><br />

command. <strong>PPL</strong> statements begin with a left bracket and end with a right bracket. They follow <strong>the</strong> input<br />

file directly (with no intervening punctuation):<br />

LIST Patients [ DROP Hospital ;<br />

IF Age GE 65, RETAIN ] ,<br />

BY Diagnosis, MEAN Length.of.Stay $<br />

Modifications are done in <strong>the</strong> programming language using:<br />

• Instructions<br />

• Opera<strong>to</strong>rs<br />

• Functions<br />

• System variables<br />

Both character and numeric variables may be modified. Some instructions, opera<strong>to</strong>rs and functions apply<br />

<strong>to</strong> both types of variables, and o<strong>the</strong>rs apply only <strong>to</strong> one type. Some system variables take on both character<br />

and numeric values, and o<strong>the</strong>rs take on only one type of value.<br />

Wildcards may be used anywhere that <strong>the</strong> name of a variable could be used<br />

[ KEEP ?test Weight?]<br />

or <strong>to</strong> request that P-<strong>STAT</strong> supply a name for a new variable. For example:<br />

[ GENERATE ? = V(2) / V(3) ]<br />

[ GEN ?:C = 'No comments' ]<br />

The “question mark” (?) is <strong>the</strong> wildcard character. Wildcards may be used in lists — in KEEP, DROP,<br />

SPLIT, COLLECT and DO loop instructions, after ANY and ALL opera<strong>to</strong>rs, and following list functions<br />

such as MEAN, SUM, MAX, MIN and SDEV.<br />

Comments may be interspersed among <strong>PPL</strong> clauses:<br />

[/* Selecting cases with outstanding balances */ ;<br />

IF Amount.Owed GT 0, RETAIN ]<br />

The whole comment is a <strong>PPL</strong> clause following ei<strong>the</strong>r a left bracket or a semicolon. The comment text<br />

follows “/*” and is followed by “*/”. <strong>PPL</strong> comments document modifications within a command.<br />

The C.TRANSPOSE command may be used <strong>to</strong> rotate a newly-modified file,<br />

C.TRANSPOSE File12 [ CASES 1 TO 10 ], OUT File12.Chr $<br />

producing an output file containing character representations of <strong>the</strong> data in <strong>the</strong> original file. In <strong>the</strong> transposed<br />

file, <strong>the</strong> variables (columns) are Variable, Case.1, Case.2, Case.3 and so on. The cases (rows) are<br />

<strong>the</strong> names and values of all of <strong>the</strong> variables. Thus, <strong>the</strong> first 10 or so cases in <strong>the</strong> file may be examined in<br />

a concise prin<strong>to</strong>ut — use LIST with FOLD, if necessary. FOLD causes long character variables <strong>to</strong> be<br />

broken in<strong>to</strong> pieces and printed on several lines.<br />

nn=number variable name/position vn=variable name exp=expression


<strong>PPL</strong> Instructions<br />

The following instructions may begin a modification clause. CASE selections are done before o<strong>the</strong>r<br />

modifications.<br />

CASES nn nn<br />

specifies a list of <strong>the</strong> positions (in ascending order) of cases <strong>to</strong> be selected:<br />

[ CASES 2 5 11 TO 99 333 .ON. ]<br />

Ei<strong>the</strong>r CASE or CASES may be used for case selection. (ROW and ROWS are synonyms.) Case selections<br />

are done before all o<strong>the</strong>r <strong>PPL</strong> modifications.<br />

DECREASE vnp<br />

recodes an existing numeric variable by decreasing its value by 1 or a specified amount:<br />

[ DECREASE Counter ;<br />

DEC Days BY 7 ;<br />

DEC Profit BY Expenses ]<br />

DEC is an abbreviation for DECREASE.<br />

DROP vnp vnp<br />

specifies a list of variables, by name or position, to be dropped:<br />

[ DROP Income V(4) TO V(10) V(26) .ON. ]<br />

Unspecified variables in the input file are kept. .ON. means “on through the end of the variables in the<br />

file.” Either KEEP or DROP may be used for variable selection. Wildcards, .NUMERIC., .CHARACTER.,<br />

and .NEW. can be used.<br />

DELETE<br />

specifies that the current case not pass to any subsequent PPL clauses or to the command in use. Cases<br />

not deleted are retained. DELETE is used as a consequence of an IF test.<br />

GENERATE vn = exp<br />

creates a new numeric or character variable:<br />

[ GENERATE Average = MEAN ( Score1 Score2 ) ;<br />

GEN Current.Age = Year - Birth.Year ;<br />

GEN Area.Code:C = '609' ]<br />

GENERATE requires a new variable name. If <strong>the</strong> new variable is a character variable, <strong>the</strong> name must be<br />

followed by “:C”, “:Cnn”, “:nn” or “:cnn”, where nn is a number indicating the maximum number of<br />

characters in <strong>the</strong> variable. When <strong>the</strong> number (nn) is not supplied, 16 is assumed. GENERATE may be<br />

abbreviated <strong>to</strong> GEN.<br />

IF exp op exp, consequence<br />

specifies a logical selection. The format of an IF clause is:<br />

[ IF exp logical opera<strong>to</strong>r exp , consequence ]<br />

[ IF Age LE 65 , RETAIN ]<br />

[ IF City EQ 'Miami' , DELETE ]<br />

[ IF (V(4) + 1) EQ V(5) , DEC V(4) ]<br />

[ IF 'yes' EQ Answer.4 , DELETE ]<br />

The expressions may be simple or complex numeric or character expressions. Character strings must be<br />

enclosed in single or double quotes. The logical opera<strong>to</strong>rs may be any of those in <strong>the</strong> subsequent section:<br />

<strong>PPL</strong> Logical Opera<strong>to</strong>rs. The consequences may be any of <strong>the</strong>se <strong>PPL</strong> instructions: DELETE, RETAIN,<br />

SET, INCREASE or DECREASE. (Additional instructions which may be used as consequences of an<br />

IF are explained and summarized in <strong>the</strong> second of <strong>the</strong> <strong>PPL</strong> chapters.)<br />

Consequences may be prefixed with T, F, or M, singly (T.SET ... , F.DELETE ... ) or in combination<br />

(FM.SET ... , TFM.INCREASE ... ), <strong>to</strong> direct whe<strong>the</strong>r <strong>the</strong> consequence should be performed when <strong>the</strong><br />

result of <strong>the</strong> IF is true, false or missing. T is assumed if no prefix is supplied.<br />

INCREASE vnp<br />

recodes an existing numeric variable by increasing its value by 1 or a specified amount:<br />

[ INCREASE Counter ;<br />

INC Days BY 7 ;<br />

INC Profit BY Sales ]<br />

INC is an abbreviation for INCREASE.<br />

KEEP vnp vnp<br />

specifies a list of variables, by name or position, <strong>to</strong> be kept or simply reordered:<br />

[ KEEP Name V(13) TO V(44) Education V(49) ]<br />

Unspecified variables in <strong>the</strong> input file are dropped. The system variables .NEW. and .OTHERS. may be<br />

used with KEEP <strong>to</strong> refer <strong>to</strong> variables newly generated in this command and any variables not explicitly<br />

mentioned:<br />

[ KEEP .NEW. ID.Number .OTHERS. ]<br />

Ei<strong>the</strong>r KEEP or DROP may be used for variable selection.<br />

Variables may be referenced with a subscript-type notation: V(2) means <strong>the</strong> variable in <strong>the</strong> second position<br />

from <strong>the</strong> left of <strong>the</strong> file. KEEP may also be followed by a wildcard <strong>to</strong> reference variables with a<br />

common prefix or suffix, and by the system variables .NUMERIC., .CHARACTER., .NEW., .ON., and<br />

.OTHERS.<br />

[ KEEP V(3) Score.? .CHARACTER. ]<br />

[ KEEP .NUMERIC. .OTHERS. ]<br />

RENAME vn TO vn<br />

renames an existing variable with a new variable name:<br />

[ RENAME VAR1 TO Age; RENAME VAR2 TO Income ]<br />

RETAIN<br />

specifies that the current case pass to the next PPL clause, or if there are no additional clauses, to the current<br />

command. Cases not retained are deleted. RETAIN is generally used as a consequence of an IF test.<br />

(CONTINUE is a synonym for RETAIN.)<br />

SET vnp = exp<br />

recodes an existing numeric or character variable:<br />

[ SET Height = Height / 12 ;<br />

SET City = 'Prince<strong>to</strong>n' ;<br />

SET V(2) = V(1) ]<br />

The expression following <strong>the</strong> equal-sign may be a simple or complex numeric or character expression.<br />

Character constants must be enclosed in single or double quotes.<br />

<strong>PPL</strong> Opera<strong>to</strong>rs: Logical<br />

Logical opera<strong>to</strong>rs are used in logical selection (IF) clauses. They permit comparisons between two expressions.<br />

The expressions may be ei<strong>the</strong>r both numeric or both character expressions. The evaluation of<br />

<strong>the</strong> comparison is true, false or missing: <strong>the</strong> evaluation is missing when one of <strong>the</strong> expressions is missing.<br />

Ei<strong>the</strong>r <strong>the</strong> character representation of <strong>the</strong> logical opera<strong>to</strong>r ( EQ ) or <strong>the</strong> equivalent symbol ( = ), where it<br />

exists, may be used.<br />

The logical opera<strong>to</strong>rs may be prefaced with “X” for eXact comparisons of character strings — <strong>the</strong>se comparisons<br />

respect <strong>the</strong> case (upper, lower or mixed) of <strong>the</strong> string as well as <strong>the</strong> literal characters. See<br />

Chapter 9, “<strong>PPL</strong>:Date and Time Commands and Functions” for a description of <strong>the</strong> 6 date/time logical<br />

opera<strong>to</strong>rs.<br />

EQ = equal<br />

[ IF City EQ 'Tray', SET City = 'Troy' ]<br />

The result may be true, false or missing. Comparisons of character strings are case independent: “troy”<br />

equals “Troy”. Leading blanks are characters: “ Troy” does not equal “Troy”. O<strong>the</strong>r opera<strong>to</strong>rs that are<br />

supported are: LE, LT, GE, and GT.<br />

XEQ exactly EQ<br />

[ IF Initial XEQ 'R', RETAIN ]<br />

Comparisons of character strings respect case when <strong>the</strong> logical opera<strong>to</strong>r is prefaced with “X”. O<strong>the</strong>r opera<strong>to</strong>rs<br />

that are supported are XLE, XLT, XGE, and XGT.<br />

NE ^= not equal<br />

[ IF Zip NE 11234, DELETE ]<br />

Cases with missing values of the variable Zip, in the example above, are not deleted. If the consequence is<br />

TM.DELETE rather than DELETE, deletion occurs when the result of the IF is either true or missing.<br />

XNE exactly NE<br />

[ IF Accept.Reject XNE 'F', SET Score = 'Pass' ]<br />

ALL (vnp list)<br />

tests all <strong>the</strong> values of <strong>the</strong> variables in <strong>the</strong> list:<br />

[ IF ALL ( Test.1 TO Test.5 ) GOOD, RETAIN ]<br />

All <strong>the</strong> relationships must be evaluated as true for <strong>the</strong> clause <strong>to</strong> be true. ALL is equivalent <strong>to</strong> a series of<br />

ANDs.<br />

AMONG (list of values and variables)<br />

tests whe<strong>the</strong>r <strong>the</strong> value of <strong>the</strong> specified variable is among <strong>the</strong> values in <strong>the</strong> list:<br />

[ IF Area AMONG ( 201 609 908 ), SET State = 'NJ' ]<br />

[ IF Name AMONG ( 'A' TO 'Mz' ), TM.DELETE ]<br />

The system variables for missing values (.M., .M1., etc.) may be included in <strong>the</strong> list of values following<br />

AMONG.<br />


XAMONG (list of values and variables)<br />

respects case in testing character values:<br />

[ IF Symptom XAMONG ( 'a', 'A', 'Aa' ), RETAIN ]<br />

AND<br />

links two logical relationships:<br />

[ IF Sex EQ 1 AND Age GE 21, RETAIN ]<br />

Both relationships must evaluate as true, or the entire clause is false or missing.<br />

ANY (vnp list)<br />

tests the values of the variables in the list, until one relationship is evaluated as true:<br />

[ IF ANY ( Test.1 TO Test.5 ) LT 65, RETAIN ]<br />

ANY is equivalent to a series of ORs.<br />

GOOD<br />

tests for good (non-missing) values. GOOD combines “=” with .G., the system value for good values.<br />

The following are equivalent:<br />

[ IF ID GOOD , RETAIN ]<br />

[ IF ID EQ .G. , RETAIN ]<br />

INRANGE ( exp, exp )<br />

tests whether an expression is within the range expressed by the first (low) value and the second (high)<br />

value:<br />

[ IF TestScore INRANGE ( 91, 100 ), SET Grade = 'A' ]<br />

MISSING<br />

tests for missing (non-good) values. MISSING combines “=” with .M., the system value for missing or<br />

non-good values. The following are equivalent:<br />

[ IF ID MISSING , DELETE ]<br />

[ IF ID EQ .M. , DELETE ]<br />

NOTAMONG (list of values and variables)<br />

tests whe<strong>the</strong>r <strong>the</strong> values of <strong>the</strong> specified variable are not among <strong>the</strong> values in <strong>the</strong> list:<br />

[ IF Age NOTAMONG ( 5, 7 TO 10 ), DELETE ]<br />

[ IF Sex NOTAMONG ('f', 'female'), DELETE ]<br />

In <strong>the</strong> above examples, cases with values not among <strong>the</strong> specified values are deleted. Cases with missing<br />

values are retained.<br />

XNOTAMONG (list of values and variables)<br />

respects case in testing character string values.<br />

OR<br />

links two logical relationships:<br />

[ IF Sex EQ 2 OR Age LT 21, DELETE ]<br />

Only one of the relationships need be true for the entire clause to be true.<br />


OUTRANGE ( exp, exp )<br />

tests whe<strong>the</strong>r an expression is outside that range specified by <strong>the</strong> two expressions.<br />

[ IF Age OUTRANGE ( 19, 65 ), DELETE ]<br />

<strong>PPL</strong> Opera<strong>to</strong>rs: Numeric<br />

Numeric (arithmetic) opera<strong>to</strong>rs are used between numeric values. Paren<strong>the</strong>ses (as well as nested paren<strong>the</strong>ses)<br />

indicate <strong>the</strong> desired order of operations. When paren<strong>the</strong>ses are not used, <strong>the</strong> precedence or order<br />

of operations is: exponentiation, multiplication and division, addition and subtraction. When <strong>the</strong>re is a<br />

series of additions and subtractions or multiplications and divisions, <strong>the</strong>y are performed from left <strong>to</strong> right.<br />

When <strong>the</strong>re is a series of exponentiations, <strong>the</strong>y are performed from right <strong>to</strong> left.<br />

** exponentiation<br />

[ SET Type = Code ** 2 ]<br />

* multiplication<br />

[ GENERATE Circumference = Pi * Diameter ]<br />

/ division<br />

[ IF V(4) NE 0, SET V(6) = 56089 / V(4) ]<br />

+ addition<br />

[ GENERATE F = ( ( 9/5 ) * C ) + 32 ]<br />

- subtraction<br />

[ SET Commission = .25 * Sales - 5 ]<br />
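As a quick check of these ordering rules, the short Python sketch below evaluates a few expressions. Python is used only as an illustration and is not PPL; its own operators happen to follow the same precedence, the same left-to-right rule for multiplication and division, and the same right-to-left rule for **. The input values (400 for Sales, 100 for C) are made up for the example.

```python
# Multiplication is done before subtraction: (0.25 * 400) - 5
commission = 0.25 * 400 - 5

# Exponentiation is done before multiplication: 2 * (3 ** 2)
exponent_first = 2 * 3 ** 2

# A series of divisions and multiplications runs left to right: (8 / 4) * 2
left_to_right = 8 / 4 * 2

# A series of exponentiations runs right to left: 2 ** (3 ** 2)
right_to_left = 2 ** 3 ** 2

# Parentheses state the order explicitly, as in the Fahrenheit example.
fahrenheit = ((9 / 5) * 100) + 32

print(commission, exponent_first, left_to_right, right_to_left, fahrenheit)
```

When in doubt about how an expression groups, adding parentheses removes the need to remember the default order.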

vn=variable name exp=expression nn=number variable name/position




3 PPL: MODIFY, PROCESS and PUT<br />

The previous <strong>PPL</strong> chapter covered <strong>the</strong> basics of data modification using <strong>the</strong> P-<strong>STAT</strong> <strong>Programming</strong> <strong>Language</strong><br />

(<strong>PPL</strong>). This chapter provides information about:<br />

• using <strong>the</strong> MODIFY command <strong>to</strong> save <strong>the</strong> results<br />

• “on-<strong>the</strong>-fly” concatenation and modification of multiple files<br />

• repeating cases in a file using <strong>the</strong> REPEAT instruction<br />

• the instructions GOTO, PUT, PUTL, QUITFILE, QUITCOMMAND and QUITRUN<br />

• use of the PROCESS and standalone PPL commands<br />

3.1 FILE MODIFICATION<br />

Files are typically modified <strong>to</strong> “clean data” — that is, <strong>to</strong> detect and correct errors, and <strong>to</strong> select and possibly transform<br />

<strong>the</strong> variables needed for analysis. While it is <strong>the</strong>oretically possible <strong>to</strong> code and enter data and <strong>to</strong> make a file<br />

that is satisfac<strong>to</strong>ry for a series of runs, it is unlikely <strong>to</strong> happen. Sometimes <strong>the</strong>re are variables in <strong>the</strong> data which<br />

contain more information than is needed. O<strong>the</strong>r times, <strong>the</strong> variables desired are logical transformations of one or<br />

more of <strong>the</strong> original variables.<br />

If <strong>the</strong> data values are modified during <strong>the</strong> initial stages, <strong>the</strong> original information is no longer readily available.<br />

Thus, it is generally best <strong>to</strong> enter all <strong>the</strong> original data and <strong>the</strong>n, using data modifications and selections, create a<br />

second file that contains only <strong>the</strong> necessary modified variables. It is easier <strong>to</strong> search <strong>the</strong> first P-<strong>STAT</strong> file for any<br />

information that is needed later, than it is <strong>to</strong> go back <strong>to</strong> <strong>the</strong> original input records or coding sheets.<br />

A common sequence in readying a file for analysis is <strong>to</strong> make a P-<strong>STAT</strong> file, examine it, and <strong>the</strong>n <strong>to</strong> clean it<br />

up by modifying it as necessary. Appropriate variables are selected and changed in<strong>to</strong> <strong>the</strong> desired form. New variables<br />

are generated. Consistency checks for possible errors are made.<br />

For example, <strong>the</strong> variable Age, coded in years, could be collapsed in<strong>to</strong> five-year age groups. At <strong>the</strong> same time,<br />

a new variable, Age.Sex.Groups, can be generated with four categories: men under 30, men 30 and over, women<br />

under 30, and women 30 and over. A consistency check can be made <strong>to</strong> see that no males have had pregnancies,<br />

and inconsistent data can be converted <strong>to</strong> one of <strong>the</strong> missing values.<br />
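The cleaning steps in this example can be sketched in Python to make the logic concrete. This is an illustrative model, not PPL: the variable names Age.Group and Pregnancies, the coding of Sex (1 for men, 2 for women), the category codes 1 to 4, and the use of None for a missing value are all assumptions made for the sketch.

```python
def clean_case(case):
    """Return a cleaned copy of one case (a dict of variable name -> value)."""
    out = dict(case)  # work on a copy so the original data stay available

    # Collapse Age in years into five-year groups: 0-4 -> 0, 5-9 -> 1, ...
    out["Age.Group"] = None if case["Age"] is None else case["Age"] // 5

    # Age.Sex.Groups: 1 = men under 30, 2 = men 30 and over,
    #                 3 = women under 30, 4 = women 30 and over.
    if case["Age"] is None or case["Sex"] not in (1, 2):
        out["Age.Sex.Groups"] = None
    else:
        base = 1 if case["Sex"] == 1 else 3
        out["Age.Sex.Groups"] = base + (1 if case["Age"] >= 30 else 0)

    # Consistency check: a man with recorded pregnancies is inconsistent,
    # so that value is converted to missing.
    if case["Sex"] == 1 and (case.get("Pregnancies") or 0) > 0:
        out["Pregnancies"] = None
    return out


raw = {"Age": 34, "Sex": 1, "Pregnancies": 2}
cleaned = clean_case(raw)
print(cleaned)
```

Because the function returns a copy, the raw case keeps its original values, which matches the advice above to keep the first file intact and build a second, cleaned file from it.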

The goal is <strong>to</strong> obtain a good file that can be saved and used as <strong>the</strong> basis for <strong>the</strong> rest of <strong>the</strong> analyses. After data<br />

cleaning, <strong>the</strong> number of transformations and selections needed for any given analysis is minimized. For example,<br />

you may wish <strong>to</strong> select only women for some runs and only respondents with good (non-missing) values on particular<br />

variables for o<strong>the</strong>r runs. Or you may want <strong>to</strong> use <strong>the</strong> natural log of <strong>the</strong> variable <strong>Inc</strong>ome. These selections<br />

and transformations may be done “on-<strong>the</strong>-fly”, as <strong>the</strong> analysis proceeds.<br />

3.2 How Modifications Are Processed<br />

When a P-<strong>STAT</strong> command program reads a case of data, it calls a system routine that reads P-<strong>STAT</strong> files. This<br />

routine reads a case of data, applies any specified modifications <strong>to</strong> <strong>the</strong> case, and sends <strong>the</strong> modified case of data<br />

<strong>to</strong> <strong>the</strong> calling program. In this example, each case of data received by <strong>the</strong> COUNT command contains only two<br />

variables, Age and Sex, in positions 1 and 2, respectively:



COUNT Survey [ KEEP Age Sex ] $<br />

The file Survey, which contains 50 variables including Age in position 45 and Sex in position 46, remains<br />

unchanged.<br />
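The read-modify-pass sequence described here can be modeled with a small Python generator. This is only a sketch of the idea, not how P-STAT is implemented: the function names read_cases and keep, and the three-variable stand-in for the file Survey, are hypothetical.

```python
def read_cases(stored_file, modifications):
    """Yield each case after applying the modifications to a copy of it."""
    for case in stored_file:
        modified = dict(case)          # the stored case itself is never altered
        for modify in modifications:   # clauses are applied in the order given
            modified = modify(modified)
        yield modified                 # this is what the calling command sees


def keep(*names):
    """Build a modification that keeps only the named variables, in order."""
    return lambda case: {name: case[name] for name in names}


# A stand-in for the stored file (only three of its variables are shown).
survey = [
    {"ID": 1, "Age": 34, "Sex": 1},
    {"ID": 2, "Age": 27, "Sex": 2},
]

# What a command would receive for [ KEEP Age Sex ].
received = list(read_cases(survey, [keep("Age", "Sex")]))
print(received)
```

After the run, the list survey still holds all of its variables; only the cases handed to the caller were reduced to Age and Sex.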

Modification clauses are executed in the order in which they appear, except for case (row) selection, which is processed first. When variable selection is followed by additional modification clauses, it should include all variables needed by the clauses that follow. If a needed variable is not included, an error message states that the variable was not found.<br />

Modifications apply <strong>to</strong> input files, not <strong>to</strong> output files. The modification clauses (enclosed in brackets) follow<br />

directly after <strong>the</strong> input filename:<br />

MODIFY TestFile [ KEEP ID TO Occupation ], OUT TestNew $<br />

LIST TestNew [ CASES 1 TO 10 ] $<br />

When an output file is produced, any modifications made <strong>to</strong> <strong>the</strong> input cases are reflected in it. File TestNew has<br />

only <strong>the</strong> variables ID through Occupation. The listing of TestNew shows only <strong>the</strong> first ten cases even though all<br />

<strong>the</strong> cases from file TestFile are represented in file TestNew.<br />

3.3 Temporary Modifications<br />

In P-<strong>STAT</strong>, temporary modifications may be done <strong>to</strong> any file when it is read by any P-<strong>STAT</strong> command. However,<br />

unless output files are created, <strong>the</strong>se modifications are not saved in new P-<strong>STAT</strong> files. They are not available for<br />

use during <strong>the</strong> remainder of <strong>the</strong> run or in a subsequent run. For example:<br />

SURVEY S1099<br />

[ SET Income = Income / 100 ] ;<br />

When <strong>the</strong> cases of data in file S1099 are sent <strong>to</strong> <strong>the</strong> SURVEY command, <strong>the</strong> values of <strong>the</strong> variable <strong>Inc</strong>ome are<br />

divided by 100. SURVEY uses <strong>the</strong>se new values for <strong>Inc</strong>ome when processing <strong>the</strong> data. However, <strong>the</strong> values in<br />

<strong>the</strong> input file S1099 remain in <strong>the</strong>ir original form. If you wish <strong>to</strong> do ano<strong>the</strong>r operation with <strong>Inc</strong>ome similarly modified,<br />

<strong>the</strong> modification must be done again:<br />

LIST S1099<br />

[ SET Income = Income / 100 ] $<br />

These modifications are sometimes called “on-the-fly” modifications because they are done on the spur of the moment, just as they are needed. This on-the-fly modification:<br />

SURVEY Families<br />

[ GEN Family.Income = Fathers.Income + Mothers.Income ] ;<br />

STUB Social.Class, BANNER Children, MEANS Family.Income $<br />

creates <strong>the</strong> new variable, Family.<strong>Inc</strong>ome, which exists only as each case of <strong>the</strong> file is passed <strong>to</strong> <strong>the</strong> SURVEY command.<br />

It is not available for use after exiting from SURVEY.<br />

3.4 Permanent Modifications and <strong>the</strong> MODIFY Command<br />

If a file with <strong>the</strong> same modifications is <strong>to</strong> be used over and over, it makes sense <strong>to</strong> do <strong>the</strong> modifications only once<br />

and save <strong>the</strong> results as a new P-<strong>STAT</strong> file. An output file reflects <strong>the</strong> modifications done <strong>to</strong> <strong>the</strong> input file or files<br />

used <strong>to</strong> create it. This is true whe<strong>the</strong>r <strong>the</strong> output file comes from <strong>the</strong> MODIFY, CONCAT, SORT or LOOKUP<br />

commands, or any o<strong>the</strong>r P-<strong>STAT</strong> command which produces output files.<br />

The modification procedure is the same, regardless of which command is used. However, only the MODIFY command processes the specified modifications and does nothing else except produce an output file. (The SORT command sorts the cases in addition to producing an output file; the CONCAT command joins several files in producing the output file, and so on.)<br />

The MODIFY command produces an output file when <strong>the</strong> identifier OUT is used. It also produces a description<br />

file when <strong>the</strong> identifier DES is used. It describes <strong>the</strong> modified or output file. MODIFY usually requires <strong>the</strong>



name of <strong>the</strong> input file <strong>to</strong> be modified, various modification clauses, and <strong>the</strong> identifier OUT followed by a name<br />

for <strong>the</strong> new output file. The modifications are permanent because <strong>the</strong>ir results are contained in <strong>the</strong> output file,<br />

which can be used throughout <strong>the</strong> remainder of <strong>the</strong> run and in subsequent runs.<br />

The MODIFY command reads and writes cases of data. Since it receives each case after any modifications<br />

have been done, <strong>the</strong> cases that are written out reflect all changes and selections. The first phrase begins with <strong>the</strong><br />

command name MODIFY and includes <strong>the</strong> input filename and all <strong>the</strong> modification clauses which are <strong>to</strong> be applied<br />

<strong>to</strong> that file. It is <strong>the</strong> comma following <strong>the</strong> final modification clause which signals that <strong>the</strong> phrase is completed.<br />

The format of a phrase is:<br />

MODIFY FileName [ ; ; ; ] [ ; ],<br />

Figure 3.1 illustrates using <strong>the</strong> MODIFY command <strong>to</strong> produce a new output file, which is a permanent modification<br />

of <strong>the</strong> input file. The MODIFY command has two phrases. The first is <strong>the</strong> command MODIFY and its<br />

argument — <strong>the</strong> name of <strong>the</strong> input file followed by <strong>the</strong> modification clauses. The second is “OUT S1099B”,<br />

which supplies <strong>the</strong> name <strong>to</strong> be given <strong>to</strong> <strong>the</strong> output file.<br />

__________________________________________________________________________<br />

Figure 3.1 Permanent Modifications<br />

MODIFY S1099<br />

[ GENERATE Coded.Age;<br />

SET Occupation = Occupation / 100 ;<br />

SET Coded.Age = INT ( Age / 10 ) ;<br />

KEEP Occupation Sex Coded.Age Race Children Siblings ],<br />

OUT S1099B $<br />

__________________________________________________________________________<br />

The modification clauses create <strong>the</strong> new variable Coded.Age, modify Occupation and Coded.Age, and select<br />

specific variables. The cases written in <strong>the</strong> new output file S1099B contain only six variables, including <strong>the</strong> new<br />

values for variable Occupation and <strong>the</strong> newly created variable Coded.Age. At this point, files S1099 and S1099B<br />

are both available <strong>to</strong> any P-<strong>STAT</strong> commands that follow. S1099 contains all <strong>the</strong> original data. S1099B contains<br />

<strong>the</strong> modified data.<br />
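For readers who find it easier to follow the steps as ordinary code, here is a Python model of what the four clauses in Figure 3.1 do to each case. It is a sketch under assumptions: INT is taken to drop the fractional part, a missing value is shown as None, and the sample case is invented.

```python
KEPT = ["Occupation", "Sex", "Coded.Age", "Race", "Children", "Siblings"]


def modify_s1099(cases):
    """Apply the clauses of Figure 3.1 and return the cases of the new file."""
    output = []
    for case in cases:
        work = dict(case)                              # the input file stays intact
        work["Coded.Age"] = None                       # GENERATE Coded.Age
        work["Occupation"] = work["Occupation"] / 100  # SET Occupation = Occupation / 100
        work["Coded.Age"] = int(work["Age"] / 10)      # SET Coded.Age = INT ( Age / 10 )
        output.append({name: work[name] for name in KEPT})  # KEEP ...
    return output


s1099 = [{"ID": 7, "Age": 47, "Occupation": 4300, "Sex": 2,
          "Race": 1, "Children": 2, "Siblings": 3}]
s1099b = modify_s1099(s1099)
print(s1099b)
```

Both lists exist afterward, just as both files do: s1099 holds the original values and s1099b holds the six selected variables with the new values.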

3.5 TEMPLATE Files<br />

A template file may be given <strong>to</strong> <strong>the</strong> MODIFY command <strong>to</strong> select <strong>the</strong> desired variables and, perhaps, <strong>to</strong> specify a<br />

changed ordering of those variables. It has much <strong>the</strong> same effect as a KEEP phrase ending <strong>the</strong> <strong>PPL</strong>.<br />

MODIFY Class89<br />

[ IF ANY ( V(1) .ON. ) MISSING, DELETE ],<br />

TEMPLATE Class88,<br />

OUT Classes $<br />

Figure 3.2 shows an input file, a template file, a MODIFY command and the resulting output file. The output file contains all the variables in the template file, in the order of the template file. Because variable “b” in file Testfile is not one of the variables in the template, it is not moved to the output file. Because variable “d” is a template file variable that is not present in file Testfile, it is set to missing for all the cases in the output file.<br />

The IF test deletes any case that is missing on any variable in <strong>the</strong> input file. Therefore, <strong>the</strong> second case is not<br />

written <strong>to</strong> <strong>the</strong> output file even though <strong>the</strong> only missing value is on variable “b” which is not one of <strong>the</strong> variables<br />

in <strong>the</strong> template file.



__________________________________________________________________________<br />

Figure 3.2 Template Files<br />

FILE Testfile<br />

a b c<br />

1 1 1<br />

2 - 2<br />

3 3 3<br />

MAKE Temp, VARS a c d;<br />

$<br />

----------------MAKE completed----------------<br />

| P-STAT file temp has been created. |<br />

| It has 0 cases and 3 variables. |<br />

| |<br />

| Two delimiters were used: BLANK and COMMA. |<br />

----------------------------------------------<br />

MODIFY Testfile<br />

[ IF ANY ( V(1) .ON. ) MISSING, DELETE ],<br />

TEMPLATE Temp,<br />

OUT Newtest $<br />

FILE Newtest<br />

a c d<br />

1 1 -<br />

3 3 -<br />

__________________________________________________________________________<br />

If no existing file has the appropriate variable names, a null file (a file with only variable names and no cases) may be created to serve as a template. The MAKE of the template file in Figure 3.2 shows both the command and the report that MAKE produces.<br />
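The effect of a template in Figure 3.2 can be reproduced with a few lines of Python. This is a model of the behavior described in the text, not P-STAT code; the function name modify_with_template is hypothetical and None stands in for a missing value.

```python
MISSING = None  # stand-in for a P-STAT missing value


def modify_with_template(cases, template_vars):
    """Delete cases with any missing input value, then reshape to the template."""
    output = []
    for case in cases:
        # The IF test looks at every variable of the input case, including
        # variables that the template will later leave out.
        if any(value is MISSING for value in case.values()):
            continue
        # Template order is used; input variables not in the template are
        # dropped; template variables absent from the input become missing.
        output.append({name: case.get(name, MISSING) for name in template_vars})
    return output


testfile = [
    {"a": 1, "b": 1, "c": 1},
    {"a": 2, "b": MISSING, "c": 2},
    {"a": 3, "b": 3, "c": 3},
]
newtest = modify_with_template(testfile, ["a", "c", "d"])
print(newtest)
```

The second case is dropped because of its missing value on “b”, and “d” is missing in both remaining cases, which matches the listing of file Newtest.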

Sometimes an additional copy of a file is all that is desired. Here, <strong>the</strong>re are no modification clauses, so file B<br />

is an exact copy of A:<br />

MODIFY A, OUT B $<br />

The number and names of <strong>the</strong> variables, as well as <strong>the</strong> data, are <strong>the</strong> same in both files.<br />

3.6 On-<strong>the</strong>-Fly Concatenation of Files<br />

Multiple files may be read by any P-<strong>STAT</strong> command. The “+” opera<strong>to</strong>r concatenates <strong>the</strong> files “on-<strong>the</strong>-fly” — as<br />

<strong>the</strong>y are read by a command:<br />

MODIFY A + B + C + D, OUT ABCD $<br />

The data from files A, B, C and D are passed <strong>to</strong> <strong>the</strong> MODIFY command, one case after <strong>the</strong> o<strong>the</strong>r.



__________________________________________________________________________<br />

Figure 3.3 Renaming All <strong>the</strong> Variables in a File<br />

File F1<br />

VAR1 VAR2 VAR3 VAR4<br />

22 20100 40 f<br />

24 18400 36 f<br />

31 31000 40 m<br />

33 35000 49 f<br />

MAKE Names;<br />

VARS Age Income Hours Sex:c;<br />

$<br />

MODIFY Names + F1, OUT F1 $<br />

File F1<br />

Age Income Hours Sex<br />

22 20100 40 f<br />

24 18400 36 f<br />

31 31000 40 m<br />

33 35000 49 f<br />

__________________________________________________________________________<br />

On-<strong>the</strong>-fly concatenation of files requires that <strong>the</strong> number of variables be <strong>the</strong> same in all <strong>the</strong> input files, and<br />

that corresponding variables be of <strong>the</strong> same data type (numeric or character). The variables may have different<br />

names. Therefore, it is up <strong>to</strong> you <strong>to</strong> ensure that <strong>the</strong> contents are <strong>the</strong> same. The first file may be a template file,<br />

that is, a file with no cases used only <strong>to</strong> supply variable names for <strong>the</strong> output file. Figure 3.3 illustrates <strong>the</strong> use of<br />

a file with no data records <strong>to</strong> rename <strong>the</strong> variables in an existing P-<strong>STAT</strong> system file.<br />
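The two requirements and the renaming effect of Figure 3.3 can be modeled in Python as follows. This is a sketch only: representing a file as a (names, types, rows) tuple is an assumption of the example, and the rule that output names come from the first file is inferred from the figure.

```python
def concat_on_the_fly(*files):
    """Yield the cases of all files in turn, named after the first file.

    Each file is a (names, types, rows) tuple.
    """
    first_names, first_types, _ = files[0]
    for names, types, rows in files:
        if len(names) != len(first_names):
            raise ValueError("files differ in number of variables")
        if types != first_types:
            raise ValueError("corresponding variables differ in data type")
        for row in rows:
            yield dict(zip(first_names, row))


# A template with no cases, followed by the data file from Figure 3.3.
names_file = (["Age", "Income", "Hours", "Sex"], ["num", "num", "num", "char"], [])
f1 = (["VAR1", "VAR2", "VAR3", "VAR4"], ["num", "num", "num", "char"],
      [(22, 20100, 40, "f"), (24, 18400, 36, "f"),
       (31, 31000, 40, "m"), (33, 35000, 49, "f")])

renamed = list(concat_on_the_fly(names_file, f1))
print(renamed[0])
```

Nothing in this model checks that VAR2 really holds income; as the text notes, only the count and the data types can be verified mechanically.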

Often <strong>the</strong> modifications done <strong>to</strong> one file in on-<strong>the</strong>-fly concatenation are necessary for every file. The notation<br />

[ * ] is a shortcut specifying that <strong>the</strong> modifications done <strong>to</strong> <strong>the</strong> previous file be applied <strong>to</strong> <strong>the</strong> current file. This<br />

command:<br />

MODIFY A [ GENERATE Profit = Gross - Expenses ;<br />

KEEP Company TO Zip, Expenses ]<br />

+ B [ GENERATE Profit = Gross - Expenses ;<br />

KEEP Company TO Zip, Expenses ], OUT C $<br />

may be shortened <strong>to</strong> this equivalent one:<br />

MODIFY A [ GENERATE Profit = Gross - Expenses ;<br />

KEEP Company TO Zip, Expenses ]<br />

+ B [ * ], OUT C $<br />

In on-<strong>the</strong>-fly concatenation, when files are referenced without modification or with <strong>the</strong> [ * ] indicating exactly<br />

<strong>the</strong> same modifications, <strong>the</strong> files are treated as one single file. Thus, across-case functions, such as FIRST, LAST,<br />

SPLIT and COLLECT, operate across <strong>the</strong> files. If <strong>the</strong> files have different modifications, across-case functions



operate within only one file. For example, FIRST and LAST are reset as each file is processed. Across-case modification is discussed in detail in the chapter “PPL: Across-Case Modifications”.<br />

Sometimes it is useful <strong>to</strong> include <strong>the</strong> cases in a file two or more times. For example, this permits estimation<br />

of <strong>the</strong> time it takes <strong>to</strong> process a large file without actually creating it. On-<strong>the</strong>-fly concatenation of a file with itself<br />

accomplishes this. The file is reread and, in effect, joined to itself:<br />

MODIFY Y + Y + Y, OUT C $<br />

The MODIFY command receives every case of file Y, followed by every case of Y a second time, followed by<br />

every case of Y a third time.<br />

3.7 Repeating Cases<br />

Cases in a P-<strong>STAT</strong> system file may be read more than once using <strong>the</strong> REPEAT instruction. REPEAT is followed<br />

by an integer, a variable that has an integer value, or an expression that reduces <strong>to</strong> an integer. This instruction<br />

repeats each case in <strong>the</strong> file five times:<br />

[ REPEAT 5 ]<br />

A case is repeated at <strong>the</strong> point REPEAT is encountered in a sequence of <strong>PPL</strong> instructions. Thus, some instructions<br />

may precede <strong>the</strong> REPEAT and some may follow it. The system variable .N., which is <strong>the</strong> case number,<br />

is not changed by repetition. The system variable .HERE., which is <strong>the</strong> count of cases processed at a given point,<br />

is changed by repetition. Thus, it is possible <strong>to</strong> test both <strong>the</strong> input case number and <strong>the</strong> output case number. Figure<br />

3.4 illustrates this procedure with a LIST command.<br />

__________________________________________________________________________<br />

Figure 3.4 Repeating Cases<br />

LIST Subjects<br />

[ CASES 1 TO 3 ; REPEAT 2;<br />

GEN Input.Case = .N., GEN Output.Case = .HERE. ;<br />

IF MOD (Output.Case, 2) = 0,<br />

SET Test.1 = .M., SET Test.2 = .M., SET Test.3 = .M. ] $<br />

Test Test Test Input Output<br />

ID Age Sex .1 .2 .3 Case Case<br />

785001 1 1 94 89 97 1 1<br />

785001 1 1 - - - 1 2<br />

785002 2 1 78 82 85 2 3<br />

785002 2 1 - - - 2 4<br />

785006 1 1 71 70 75 3 5<br />

785006 1 1 - - - 3 6<br />

__________________________________________________________________________<br />

In Figure 3.4, <strong>the</strong> MOD function, which does modulo arithmetic, is used <strong>to</strong> test for an even number — that<br />

is, a second case. Test values are set <strong>to</strong> missing in <strong>the</strong>se cases. Second semester test results could be added <strong>to</strong><br />

<strong>the</strong>se cases as <strong>the</strong>y become available. (Functions and system variables are discussed fully in later chapters of this<br />

manual.) REPEAT may be used with any other PPL instructions and functions, with two exceptions: it may not follow an IF, and it may not appear in the same command as SPLIT, COLLECT, FIRST or LAST.<br />
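The relationship between the input case number and the output case number in Figure 3.4 can be traced with a short Python model. It is illustrative only: the counters n and here play the roles that the text gives to .N. and .HERE., only one of the test variables is carried, and None stands in for missing.

```python
def repeat_cases(cases, times):
    """Yield (input_case_number, output_case_number, case) per repetition."""
    here = 0                                   # count of cases passed on so far
    for n, case in enumerate(cases, start=1):  # n does not change on repetition
        for _ in range(times):
            here += 1
            yield n, here, dict(case)


subjects = [
    {"ID": 785001, "Test.1": 94},
    {"ID": 785002, "Test.1": 78},
    {"ID": 785006, "Test.1": 71},
]

listing = []
for n, here, case in repeat_cases(subjects, 2):
    if here % 2 == 0:            # even output case number: the second copy
        case["Test.1"] = None    # set to missing, as in Figure 3.4
    listing.append((case["ID"], case["Test.1"], n, here))

for row in listing:
    print(row)
```

Each input case number appears twice while the output case number runs from 1 to 6, which is the pattern shown in the last two columns of the figure.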

A file of random data may be generated by building a file with just one case and repeating it as many times<br />

as desired. Here, <strong>the</strong> single case in <strong>the</strong> file Random is repeated 100 times:



MOD Random<br />

[ REPEAT 100 ;<br />

SET Random.Number = ( RANNORM (0) * 2.8 ) + 24 ],<br />

OUT RandomX $<br />

Then, the single variable Random.Number is set equal to a random number generated by the RANNORM function. The random number is multiplied by 2.8 and then increased by 24, which changes the standard deviation from 1 to 2.8 and the mean from 0 to 24. (Random number functions are discussed in more detail later in this manual.)<br />
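The same scale-and-shift arithmetic can be tried in Python. This sketch uses Python's own standard normal generator in place of RANNORM, so the numbers will not match P-STAT output; the function name make_random_file and the fixed seed are assumptions added so that the example is repeatable.

```python
import random


def make_random_file(n_cases, mean=24.0, sd=2.8, seed=0):
    """Repeat one empty case n_cases times and fill in a scaled normal draw."""
    rng = random.Random(seed)  # fixed seed: an assumption, for repeatability
    return [{"Random.Number": rng.gauss(0.0, 1.0) * sd + mean}
            for _ in range(n_cases)]


random_x = make_random_file(100)
sample_mean = sum(c["Random.Number"] for c in random_x) / len(random_x)
print(len(random_x), round(sample_mean, 2))
```

With 100 cases the sample mean should fall close to 24, though not exactly on it, because the values are random draws.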

Simple integer weighting of cases may be done using REPEAT. These instructions weight younger respondents<br />

(cases) twice as much as older ones:<br />

[ GEN #Wt = 1 ;<br />

IF Age LT 21, SET #Wt = 2 ;<br />

REPEAT #Wt ]<br />

The scratch (temporary) variable #Wt is generated equal to 1. It is reset to 2 for younger respondents. A case is<br />

repeated as many times as <strong>the</strong> value of #Wt for that case. (Scratch variables are explained in more detail in <strong>the</strong><br />

discussion of across case modification later in this manual.)<br />

Weighting is often done when different population subgroups have been sampled and they are not represented in proportion to their real share of the population. Using REPEAT is not necessarily a good weighting technique, because only integer weighting of each case is possible. Non-integer weighting is done using the WEIGHT identifier and a weighting variable in commands such as COUNTS and SURVEY. The WEIGHT identifier permits appropriate fractional weights to be applied to each case of data as it is processed by a command. Also, using WEIGHT is faster than using REPEAT to weight cases. See the description of the BALANCE command, which computes weights for sample balancing, in the manual “P-STAT: The SURVEY, BALANCE and SAMPLE Commands”.<br />
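The difference between weighting by repetition and weighting by a weight variable can be seen in a small Python comparison. This is a conceptual sketch, not the WEIGHT identifier itself: the function names, the ages, and the fractional weights 1.5 and 0.75 are all invented for illustration.

```python
def count_with_repeat(cases):
    """Integer weighting: a case is counted once for every repetition."""
    total = 0
    for case in cases:
        weight = 2 if case["Age"] < 21 else 1  # the #Wt rule from the text
        for _ in range(weight):                # the case is passed on weight times
            total += 1
    return total


def count_with_weight(cases, weight_of):
    """Weighted count: each case is read once and adds its own weight."""
    return sum(weight_of(case) for case in cases)


respondents = [{"Age": 18}, {"Age": 20}, {"Age": 35}, {"Age": 50}]

by_repeat = count_with_repeat(respondents)
by_weight = count_with_weight(respondents, lambda c: 2 if c["Age"] < 21 else 1)
fractional = count_with_weight(respondents, lambda c: 1.5 if c["Age"] < 21 else 0.75)
print(by_repeat, by_weight, fractional)
```

For integer weights the two approaches agree, but the weighted version reads each case only once and also accepts fractional weights, which repetition cannot express.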

3.8 OTHER INSTRUCTIONS AFTER IF<br />

Of the instructions that specify the action to take as a consequence of an IF test, RETAIN, DELETE, SET, INCREASE and DECREASE are the most useful and common. (These are fully explained in the second of the PPL chapters.) However, there are additional instructions that may either follow an IF test as possible consequences or, in some situations, be used alone.<br />

3.9 GOTO To Process Modifications Selectively<br />

The GOTO instruction (GO TO, with a blank between <strong>the</strong> two words, is a synonym) permits modification clauses<br />

<strong>to</strong> be conditionally omitted or repeated. Clauses are generally omitted when control transfers <strong>to</strong> clauses after <strong>the</strong><br />

current one (downward), and repeated when control transfers <strong>to</strong> clauses before <strong>the</strong> current one (upward). The only<br />

constraint is that <strong>the</strong> omitted section cannot contain phrases such as KEEP, DROP, or GENERATE that change<br />

<strong>the</strong> number or order of <strong>the</strong> variables.<br />

GOTO is generally <strong>the</strong> consequence of an IF test:<br />

[ IF Sex EQ 1, GOTO Male;<br />

GOTO is followed by <strong>the</strong> label of <strong>the</strong> clause <strong>to</strong> which control is <strong>to</strong> pass. If <strong>the</strong> value of Sex is 1, control passes <strong>to</strong><br />

<strong>the</strong> modification clause beginning with <strong>the</strong> label “Male:”:<br />

Male: IF Live.Births GOOD, ... ;<br />

The label must be followed by a colon (:) and it must begin a modification clause. Figure 3.5 illustrates using<br />

GOTO <strong>to</strong> execute different modification clauses for males and females.<br />

Labels may be followed by an instruction or <strong>the</strong>y may be null labels (not followed by an instruction) as in “Next:<br />

]”, <strong>the</strong> last clause in Figure 3.5. Null labels often provide a destination. Labels may also be simply informative, as<br />

in “Female: ... ”, <strong>the</strong> third clause in Figure 3.5, and not a destination.



__________________________________________________________________________<br />

Figure 3.5 Using GOTO and PUT<br />

MODIFY People<br />

[ IF Region = 3, RETAIN;<br />

IF Sex = 1, GOTO Male ]<br />

[ Female: IF Occupation AMONG ( 43, 56 TO 59, 72, 78 ),<br />

PUT 'Check Occupation of ' Name '. Occupation: ' Occupation ;<br />

IF Military.Service = 4,<br />

PUT 'Check Service Record of ' Name '.' ;<br />

GOTO Next ]<br />

[ Male: IF Live.Births GOOD,<br />

PUT 'Invalid Live.Births of ' Live.Births ' for ' Name '.',<br />

SET Live.Births = .M3. ;<br />

SET Military.Service = NCOT ( Military.Service, 2 ) ]<br />

[ Next: ], OUT People2 $<br />

__________________________________________________________________________<br />

Using GOTO makes for clearer logic when there is a series of modifications to be performed as the result of<br />

a specific IF test. In Figure 3.5, left and right brackets have been used within <strong>the</strong> <strong>PPL</strong> <strong>to</strong> emphasize <strong>the</strong> structure<br />

of an IF, a section for cases coded as female, a section for cases coded as male, and a final section.<br />
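The branching in Figure 3.5 corresponds to an ordinary if/else structure, as this Python model shows. It is a sketch under assumptions: “56 TO 59” is taken to mean the whole numbers 56 through 59, the missing value .M3. is shown as None, messages are collected in a list instead of being printed, and the NCOT recode is left out because its behavior is not described in this section.

```python
OCCUPATIONS_TO_CHECK = {43, 56, 57, 58, 59, 72, 78}  # 43, 56 TO 59, 72, 78


def check_person(case, messages):
    """Return the modified case, or None when the case is not retained."""
    if case["Region"] != 3:                          # IF Region = 3, RETAIN
        return None
    out = dict(case)
    if out["Sex"] == 1:                              # IF Sex = 1, GOTO Male
        if out["Live.Births"] is not None:           # Male: IF Live.Births GOOD
            messages.append(
                f"Invalid Live.Births of {out['Live.Births']} for {out['Name']}.")
            out["Live.Births"] = None                # SET Live.Births = .M3.
        # The NCOT recode of Military.Service is omitted from this sketch.
    else:                                            # Female: section
        if out["Occupation"] in OCCUPATIONS_TO_CHECK:
            messages.append(
                f"Check Occupation of {out['Name']}. Occupation: {out['Occupation']}")
        if out["Military.Service"] == 4:
            messages.append(f"Check Service Record of {out['Name']}.")
    return out                                       # Next:


log = []
person = {"Name": "Sandy Sweet", "Region": 3, "Sex": 2, "Occupation": 57,
          "Military.Service": 4, "Live.Births": 1}
result = check_person(person, log)
print(log)
```

The female branch ends where “GOTO Next” appears in the figure, and the male branch is the part that the female cases skip.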

3.10 Cleaning Data With PUT<br />

The PUT instruction prints informative messages and <strong>the</strong> values of cited variables. This is useful when cleaning<br />

up data prior <strong>to</strong> an analysis or when constructing a report. There is a complete list of <strong>the</strong> PUT control words in<br />

<strong>the</strong> summary section at <strong>the</strong> end of this chapter.<br />

PUT prints text or error messages and <strong>the</strong> values of <strong>the</strong> variables specified:<br />

[ Female: IF Occupation AMONG ( 43, 56 TO 59, 72, 78 ),<br />

PUT &lt;Check occupation of &gt; Name &lt;. Occupation: &gt; Occupation ]<br />

If Occupation is equal <strong>to</strong> 57, for example, <strong>the</strong> text and variable values are printed. The message strings, with <strong>the</strong><br />

values of <strong>the</strong> variables Name and Occupation inserted, appear on <strong>the</strong> current output device.<br />

The text is supplied as character strings enclosed in angle brackets or in single or double quotes. It is usually<br />

easier <strong>to</strong> check <strong>the</strong> text for a proper beginning and end when angle brackets are used instead of <strong>the</strong> quotes. You<br />

can see it better and so can <strong>the</strong> scanning program. Use of <strong>the</strong> angle brackets reduces <strong>the</strong> chances for error and is,<br />

<strong>the</strong>refore, highly recommended.<br />

When variable names are used outside of a text string, <strong>the</strong> values are substituted at that point. For example:<br />

Check occupation of Sandy Sweet. Occupation: 57<br />

Using .ALL. causes all <strong>the</strong> variables of a case <strong>to</strong> be written, one after <strong>the</strong> o<strong>the</strong>r. Trailing blanks (on <strong>the</strong> right end)<br />

of <strong>the</strong> text are removed from character values. Note that variable names are not enclosed in quotes or angle brackets.<br />
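How text and variable values combine into one output line can be modeled in a few lines of Python. This is an illustration of the behavior described above, not the PUT instruction itself: the function name put and the ("text", ...) and ("var", ...) pairs are hypothetical, and items are assumed to be written one directly after the other.

```python
def put(*items):
    """Build one output line from text items and variable values."""
    parts = []
    for kind, value in items:
        if kind == "text":
            parts.append(value)              # text is written exactly as given
        elif isinstance(value, str):
            parts.append(value.rstrip())     # trailing blanks are removed
        else:
            parts.append(str(value))         # numeric values are written as is
    return "".join(parts)


line = put(("text", "Check occupation of "),
           ("var", "Sandy Sweet     "),      # a padded character value
           ("text", ". Occupation: "),
           ("var", 57))
print(line)
```

The padded name loses its trailing blanks, so the period follows the name directly, as in the sample message.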

When you wish to use an expression that is more complex than a variable name, enclose it in parentheses:<br />

[ PUT ( GNP / 1000. );<br />

Figure 3.5 illustrates using PUT with IF tests and GOTOs <strong>to</strong> locate a series of cases with miscodings and obtain<br />

a printed list of <strong>the</strong> errors. A new output file, containing corrections and recoded values, is also produced.<br />

PUT is commonly used after an IF test, although it may also be used by itself.


<strong>PPL</strong>: MODIFY, PROCESS and PUT 3.9<br />

See also <strong>the</strong> section later in this manual on using .PUT., <strong>the</strong> system variable which keeps a count of <strong>the</strong> puts.<br />

It is possible <strong>to</strong> create a new file containing only <strong>the</strong> cases with questionable values on <strong>the</strong> tested variables. The<br />

values in this error file may <strong>the</strong>n be corrected and <strong>the</strong> UPDATE command used <strong>to</strong> produce a corrected master file.<br />

It is also possible to print text without creating any output file. The PROCESS command works just like<br />

MODIFY, but produces no output file. It is used merely to process PPL instructions, such as those that produce text<br />

describing erroneous variable values.<br />
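As a minimal sketch of this use of PROCESS, the command below prints a message for each questionable case and writes no output file. The file name Survey is hypothetical; the variables and the message text follow the occupation example earlier in this section, with the text supplied in single quotes.<br />

```ppl
/* Sketch only: Survey is a hypothetical file containing Name and Occupation */
PROCESS Survey
  [ IF Occupation AMONG ( 43, 56 TO 59, 72, 78 ),
       PUT 'Check occupation of ' Name '. Occupation: ' Occupation ] $
```

Replacing PROCESS with MODIFY and adding an OUT identifier would run the same PPL while also writing an output file.<br />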

3.11 Report Writing Using PUT and PUTL<br />

The PUT instruction writes reports by putting text and variables at specified locations in <strong>the</strong> output line. PUT is<br />

used in <strong>PPL</strong> clauses; thus, a report may be produced any time a file is read by any command.<br />

Figure 3.6 illustrates <strong>the</strong> use of PUT for reports. PUT and any o<strong>the</strong>r <strong>PPL</strong> are enclosed in brackets which follow<br />

<strong>the</strong> filename. Since no output file is desired, <strong>the</strong> PROCESS command is used here <strong>to</strong> process <strong>the</strong> <strong>PPL</strong><br />

instructions.<br />

Text <strong>to</strong> be put in <strong>the</strong> output line is enclosed in paired angle brackets:<br />

.... PUT @5 ....<br />

although single or double quotes can also be used:<br />

.... PUT @5 'The claim for ' ....<br />

The column pointer, <strong>the</strong> “at sign” (@), specifies a column location. “@5” specifies that <strong>the</strong> next string or<br />

expression is <strong>to</strong> be placed starting in column 5. When a column location is not given, <strong>the</strong> text begins in column 1.<br />

Expressions must be placed within paren<strong>the</strong>ses. A variable name by itself or a scratch variable name (like<br />

#Count) is used without paren<strong>the</strong>ses:<br />

[ .... (First.Name /// Last.Name) .... ;<br />

[ .... 'A check for $' Amt.Due .... ;<br />

The value of First.Name is concatenated with <strong>the</strong> value of Last.Name and placed in <strong>the</strong> output line. (The concatenated<br />

names are one expression, and thus need paren<strong>the</strong>ses.) The value of Amt.Due is placed in <strong>the</strong> output line<br />

after <strong>the</strong> appropriate text.<br />

PUT may be an instruction by itself:<br />

PUT ' ' ;<br />

or it may follow an IF test:<br />

IF Claim.Num MISSING OR Claim.Amt MISSING,<br />

PUT @5 'The claim for ' (First.Name /// Last.Name )<br />

' is awaiting additional information.', GOTO End;<br />

The first PUT places a blank line in <strong>the</strong> output. The second PUT is done only when <strong>the</strong> IF test is true. If ei<strong>the</strong>r<br />

Claim.Num or Claim.Amt is missing, <strong>the</strong> specified text is put at column 5 in <strong>the</strong> output line, and control passes <strong>to</strong><br />

<strong>the</strong> <strong>PPL</strong> clause with <strong>the</strong> label “End:”. The GOTO is also a consequence of <strong>the</strong> IF because <strong>the</strong> PUT phrase ended<br />

with a comma ra<strong>the</strong>r than a semicolon.<br />

The text produced by PUT continues on<strong>to</strong> subsequent lines as needed. The current line is printed when a given<br />

PUT finishes, unless the PUT ended with an @. In this event, the next PUT continues on the same line:<br />

IF Deduct.Amt GT 0,<br />

T.PUT @5 'There is a deductible amount of $' Deduct.Amt @ ,<br />

F.PUT @5 'There is no deductible' @ ;<br />

PUT ' on Policy Number ' Policy.Num ', issued to '<br />

(First.Name /// Last.Name) @ ;<br />

Thus, subsequent text



__________________________________________________________________________<br />

Figure 3.6 Using PUT To Produce a Report<br />

File Insurance:<br />

First     Last        Policy    Deduct   Claim   Claim<br />

Name      Name        Num       Amt      Num     Amt<br />

Sharon    Wilson      8090564   250      6024    654.25<br />

Claire    Mc Donald   7035631   500      8122    -<br />

Neil      Haroldson   7469421   0        1005    490.56<br />

The Commands<br />

OUTPUT.WIDTH 70 $<br />

PROCESS Insurance<br />

[ GENERATE Amt.Due = Claim.Amt - Deduct.Amt ;<br />

PUT ' ' ;<br />

IF Claim.Num MISSING OR Claim.Amt MISSING,<br />

PUT @5 'The claim for ' ( First.Name /// Last.Name )<br />

' is awaiting additional information.', GOTO End ;<br />

IF Deduct.Amt GT 0,<br />

T.PUT @5 'There is a deductible amount of $' Deduct.Amt @ ,<br />

F.PUT @5 'There is no deductible' @ ;<br />

PUT ' on Policy Number ' Policy.Num ', issued to '<br />

(First.Name /// Last.Name) @ ;<br />

PUT '. A check for $' Amt.Due<br />

' is required in payment of Claim Number ' Claim.Num<br />

' for $' Claim.Amt '.' ;<br />

End: ] $<br />

The Report<br />

There is a deductible amount of $250 on Policy Number 8090564,<br />

issued to Sharon Wilson. A check for $404.25 is required in payment<br />

of Claim Number 6024 for $654.25.<br />

The claim for Claire Mc Donald is awaiting additional<br />

information.<br />

There is no deductible on Policy Number 7469421, issued to Neil<br />

Haroldson. A check for $490.56 is required in payment of Claim<br />

Number 1005 for $490.56.<br />

__________________________________________________________________________



follows directly after this text with no intervening spaces. Column references may position <strong>the</strong> column pointer<br />

both backwards and forwards on <strong>the</strong> line. Thus, <strong>the</strong> column reference @40 may be followed by @1, and text will<br />

be placed in column 40 and <strong>the</strong>n in column 1 of <strong>the</strong> same line. The reference @NEXT positions text at <strong>the</strong> beginning<br />

of <strong>the</strong> next line.<br />
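A short sketch of these column references, assuming the Insurance file of Figure 3.6: the policy number is placed starting at column 40, the pointer then moves back to column 1 for the last name on the same line, and @NEXT starts a second line. The label text 'Claim number: ' is an illustrative choice.<br />

```ppl
/* Sketch only: uses the Insurance file from Figure 3.6 */
PROCESS Insurance
  [ PUT @40 Policy.Num @1 Last.Name
        @NEXT @5 'Claim number: ' Claim.Num ] $
```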

System variables may be referenced with PUT:<br />

PUT 'Dated: ' .DATE. ;<br />

System variables do not need <strong>to</strong> be enclosed in paren<strong>the</strong>ses. The current value of <strong>the</strong> system variable .DATE. is<br />

output in <strong>the</strong> report. Using an IF test for <strong>the</strong> first case results in <strong>the</strong> date being output only once, ra<strong>the</strong>r than for<br />

each case. Character values (o<strong>the</strong>r than quoted text) au<strong>to</strong>matically have blanks trimmed from <strong>the</strong> right end.<br />

PUTL puts <strong>the</strong> variable name, as well as <strong>the</strong> variable value, in <strong>the</strong> output line:<br />

PUTL Policy.Num Last.Name Claim.Num ;<br />

The text is on one line:<br />

Policy.Num = 8090564 Last.Name = Wilson Claim.Num = 6024<br />

unless it extends <strong>to</strong> subsequent lines. Placement of <strong>the</strong> variables on separate lines, centered about <strong>the</strong> equal-sign<br />

in column 22:<br />

          Policy.Num = 8090564<br />

           Last.Name = Wilson<br />

           Claim.Num = 6024<br />

may be requested using @EQUAL22:<br />

PUTL @EQUAL22 Policy.Num Last.Name Claim.Num ;<br />

The column location of <strong>the</strong> equal-sign follows directly after <strong>the</strong> @EQUAL. Use of @EQUAL22:50 places two<br />

labeled values per line.<br />

The following puts all values of <strong>the</strong> case, in variable name = value format, three per line, with <strong>the</strong> equals<br />

placed in positions 22, 44, and 66:<br />

PUTL @EQUAL22:44:66 .ALL.;<br />

__________________________________________________________________________<br />

Figure 3.7 Accessing <strong>the</strong> Variable Name Within a Report<br />

PROCESS Rawdata[ CASE 1;<br />

PUT .file.<br />

@SKIP @3 ( VARNAME (1) ) @20 V(1)<br />

@NEXT @3 ( VARNAME (2) ) @20 V(2)<br />

@NEXT @3 ( VARNAME (3) ) @20 v(3)<br />

@NEXT @3 ( VARNAME (4) ) @20 V(4) @SKIP ]$<br />

Values from Rawdata<br />

Age = 13<br />

<strong>Inc</strong>ome = 1350<br />

Hours = -<br />

Sex = m<br />

__________________________________________________________________________<br />

PUTL refers <strong>to</strong> variables (including scratch variables) by name. However, <strong>the</strong> VARNAME function may be<br />

used <strong>to</strong> label values, printed by PUT, that are referenced by position. Figure 3.7 illustrates <strong>the</strong> VARNAME function<br />

in a PUT using <strong>the</strong> PROCESS command <strong>to</strong> access <strong>the</strong> first case of data. When <strong>the</strong>re are only 4 variables <strong>the</strong>



brute force usage in Figure 3.7 is possibly acceptable. However, <strong>the</strong> chapter “<strong>PPL</strong>: DO LOOPS and IF-THEN-<br />

ELSE Blocks” provides a much easier way when <strong>the</strong>re are many repetitions <strong>to</strong> be done.<br />

3.12 STANDALONE <strong>PPL</strong> COMMANDS AND PROCESS<br />

Standalone <strong>PPL</strong> commands, which have nei<strong>the</strong>r an input nor an output file, are used <strong>to</strong> work with scratch variables,<br />

<strong>the</strong> P vec<strong>to</strong>r or user-defined arrays. The PROCESS command is used when you need information from a<br />

P-<strong>STAT</strong> system file but do not need an output file. This can also be done by using MODIFY with no output files,<br />

but MODIFY provides a brief report of its activity and PROCESS is silent.<br />

3.13 Scratch Variables and Standalone <strong>PPL</strong><br />

The variable #Wt in <strong>the</strong> previous section was created as a “scratch” variable. This is a variable that does not exist<br />

in a file. Since it is independent of the file, it can be set and then used within a case, across cases or, in the case<br />

of a permanent scratch variable, across commands.<br />

                 Scratch Variable       Permanent Scratch Variable<br />

Rules for name   #scratch               ##scratch<br />

Exists           for a single command   across commands<br />

The scratch variable can be either character or numeric:<br />

GEN #Wt = SQRT ( Age/10 + Sex );<br />

GEN ##Study:C23 = "Study 1034: August 1994"<br />

If <strong>the</strong> scratch variable is created for use in later commands, it must have <strong>the</strong> double ## as a prefix. Variables of<br />

this type are often used <strong>to</strong> move information between commands. A scratch variable can be moved in<strong>to</strong> a file as<br />

a regular variable by including it in a KEEP. The initial # or ## is removed <strong>to</strong> create a legal name which must not<br />

conflict with <strong>the</strong> names of o<strong>the</strong>r variables in <strong>the</strong> file.<br />
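The two sketches below illustrate these rules under stated assumptions. The file names People and Weights and the variable ID are hypothetical, Age and Sex are assumed to be numeric, and CASE 1 is used as in Figure 3.7 to limit processing to the first case. The first pair of commands creates a permanent scratch variable and prints it from a later command; the second command keeps a single-# scratch variable so that it is written to the output file as the regular variable Wt.<br />

```ppl
/* ##Study is created by one command and is still available in the next */
GENERATE ##Study:C23 = "Study 1034: August 1994" $
PROCESS People [ CASE 1; PUT 'Report for ' ##Study ] $

/* #Wt exists only within this command; including it in KEEP writes it out as Wt */
MODIFY People
  [ GENERATE #Wt = SQRT ( Age/10 + Sex );
    KEEP ID Age Sex #Wt ],
  OUT Weights $
```

The second command would fail if People already contained a variable named Wt, since the name left after removing the # must not conflict with an existing variable.<br />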

The MODIFY command is designed <strong>to</strong> take an input file, modify it in some way, and produce an output file<br />

which reflects <strong>the</strong> modifications. The PROCESS command is designed <strong>to</strong> take an input file and use <strong>the</strong> values in<br />

<strong>the</strong> cases <strong>to</strong> create scratch variables for use in subsequent commands or as a vehicle for <strong>the</strong> PUT and PUTL commands<br />

<strong>to</strong> create a report. Standalone <strong>PPL</strong> commands are used <strong>to</strong> manipulate elements such as scratch variables<br />

and system variables that are not associated with a file. In the following example, #Num works because both<br />

instructions are part of a single standalone PPL command:<br />

GEN #Num = 154362;<br />

PUT #Num > ( SQRT(#Num)) $<br />

The <strong>PPL</strong> keywords that can be used as standalone <strong>PPL</strong> commands are:<br />

1. IF<br />

2. SET, INCREASE and DECREASE<br />

3. GENERATE<br />

4. PUT and PUTL<br />

5. BRANCH<br />

6. DIALOG<br />

7. IF-THEN-ELSE-ENDIF.<br />

8. DO LOOPS<br />

The last three are covered in the next chapters. The following are examples of standalone PPL commands:<br />

PUT SQRT ( 13562 ) $<br />

PUT .DATE. $<br />

GENERATE ##COUNTER = 0 $



3.14 The PROCESS Command and More PUT Information<br />

The PROCESS command is often used to accumulate summary information about the input file. This information is<br />

s<strong>to</strong>red in scratch variables or <strong>the</strong> permanent vec<strong>to</strong>r and is <strong>the</strong>n available for subsequent commands.<br />

Figure 3.8 shows <strong>the</strong> use of <strong>the</strong> PROCESS command <strong>to</strong> count <strong>the</strong> <strong>to</strong>tal number of cases in <strong>the</strong> file as well as<br />

the number of cases with non-missing data. Scratch variables ##cases and ##good are first created with GENERATE<br />

used as a stand-alone command. Both of these variables must be created as permanent scratch variables with<br />

the double pound (##) sign so that they will exist across commands.<br />

__________________________________________________________________________<br />

Figure 3.8 PROCESS: Counting Cases<br />

File Testfile<br />

a b c<br />

1 - 1<br />

2 2 2<br />

3 3 3<br />

GEN ##cases = 0, GEN ##GOOD = 0$<br />

. PROCESS Testfile<br />

[ INCREASE ##cases;<br />

IF ALL ( V(1) .ON. ) GOOD, INCREASE ##good; ]<br />

$<br />

PUT 'File Testfile has ' ##CASES ' cases.'<br />

@NEXT 'There are '<br />

##good ' cases with no missing data.' $<br />

File Testfile has 3 cases.<br />

There are 2 cases with no missing data.<br />

__________________________________________________________________________<br />

The PROCESS command increases ##cases as each row is read. ##good is only increased when <strong>the</strong> IF test<br />

is true. When <strong>the</strong> PROCESS command is complete, PUT is used <strong>to</strong> write <strong>the</strong> results. Each time a PUT is executed<br />

it starts on a new line unless <strong>the</strong> “@” sign was used <strong>to</strong> end a previous PUT. Each PUT usually continues across<br />

lines until it is complete unless <strong>the</strong> @NEXT instruction is used <strong>to</strong> cause a line change. @SKIP may be used <strong>to</strong><br />

cause a blank line. @PAGE may be used <strong>to</strong> cause a page change. Many of <strong>the</strong> controls that can be used with <strong>the</strong><br />

TEXTWRITER command can also be used with <strong>the</strong> PUT instruction. See <strong>the</strong> chapter TEXTWRITER for more<br />

details.



3.15 COMMENTS<br />

Comments may be included ei<strong>the</strong>r between commands or as phrases within <strong>the</strong> <strong>PPL</strong>. A comment begins with /*<br />

and ends with */ . For example:<br />

/* Feb. 16, 2010. Clean <strong>the</strong> data<br />

and generate new variables.<br />

*/<br />

MODIFY Myfile [<br />

/* Age recoded in<strong>to</strong> 3 age groups */<br />

GEN Coded.Age = 1;<br />

IF Age GT 20, SET Coded.Age = 2;<br />

IF Age GT 30, SET Coded.Age = 3;<br />

/* Note: This assumes Age is never missing */<br />

], OUT Myfile $<br />

The first of <strong>the</strong> three comments in this example occurs between commands. The o<strong>the</strong>r two comments are in <strong>the</strong><br />

<strong>PPL</strong> of <strong>the</strong> MODIFY command. Once <strong>the</strong> /* is found <strong>the</strong> P-<strong>STAT</strong> executive routines look for <strong>the</strong> terminating */<br />

and <strong>the</strong>n blank out <strong>the</strong> entire area including <strong>the</strong> /* and <strong>the</strong> */. Comments can extend across lines as in <strong>the</strong> example<br />

above or <strong>the</strong>y can be part of a line. For example:<br />

/* List <strong>the</strong> output file */ LIST Myfile $<br />

MODIFY Myfile [<br />

GEN Coded.Age; /* 10 year age groups */ Gen Coded.<strong>Inc</strong>ome; ]<br />

are both legal uses of comments.<br />

Because <strong>the</strong> comments are blanked out when a command is executed, <strong>the</strong>y must be entered in<strong>to</strong> a command<br />

stream using an external edi<strong>to</strong>r. If <strong>the</strong>y are entered interactively <strong>the</strong>y disappear when <strong>the</strong> command is executed.<br />

However, because they can be inserted freely both within the PPL and between commands, they provide an excellent<br />

way to document a run. Anything except the terminating characters can be entered in the comment:<br />

/* The following group of commands might better<br />

be packaged as a macro and executed by using<br />

RUN Mymacro $<br />

with /* style comments <strong>to</strong> document <strong>the</strong> macro<br />

parameters.<br />

*/<br />

3.16 QUITTING A PROCESS<br />

There are three instructions that cause <strong>the</strong> processing of data <strong>to</strong> s<strong>to</strong>p:<br />

1. QUITFILE requests that processing of <strong>the</strong> current file s<strong>to</strong>p<br />

2. QUITCOMMAND requests that processing of <strong>the</strong> current command s<strong>to</strong>p<br />

3. QUITRUN requests that <strong>the</strong> entire P-<strong>STAT</strong> run s<strong>to</strong>p.<br />

The QUIT instructions are typically used after an IF test, although <strong>the</strong>y may be used in a DO loop or alone.<br />

QUITFILE causes processing of a file <strong>to</strong> s<strong>to</strong>p. Only cases prior <strong>to</strong> this point are passed <strong>to</strong> <strong>the</strong> current command.<br />

QUITFILE causes processing <strong>to</strong> s<strong>to</strong>p if <strong>the</strong> result of <strong>the</strong> IF test is true:<br />

LIST Bonded.Personnel<br />

[ IF Bonded EQ 'Yes' and Prison.Record GT 0, QUITFILE ] $<br />

Cases are listed until a bonded employee with a value greater than 0 on Prison.Record is found. That case and<br />

all cases after it are not listed; only employees prior to that case are listed.



QUITCOMMAND causes a command <strong>to</strong> be aborted. If QUITCOMMAND was used in <strong>the</strong> prior example,<br />

no listing would be produced if any employee had a value greater than 0 on Prison.Record. The LIST command<br />

would s<strong>to</strong>p without receiving any cases.<br />

QUITRUN causes an entire P-<strong>STAT</strong> run <strong>to</strong> s<strong>to</strong>p. This is most useful when many commands are executed in<br />

succession, possibly from a transfer file or a macro, or in batch mode. Quitting <strong>the</strong> entire run, ra<strong>the</strong>r than continuing<br />

processing, may save resources if a grave error is encountered.
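The sketches below show how the other two QUIT instructions might be written, reusing the Bonded.Personnel file from the QUITFILE example. The message text is illustrative, and chaining PUT and QUITRUN as consequences of one IF follows the comma convention shown earlier for PUT and GOTO; treat the exact behavior as an assumption to verify.<br />

```ppl
/* Produce no listing at all if any bonded employee has a prison record */
LIST Bonded.Personnel
  [ IF Bonded EQ 'Yes' AND Prison.Record GT 0, QUITCOMMAND ] $

/* Print a message and stop the whole run if a required value is missing */
PROCESS Bonded.Personnel
  [ IF Prison.Record MISSING,
       PUT 'Prison.Record is missing; run stopped.', QUITRUN ] $
```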



<strong>PPL</strong><br />

SUMMARY<br />

Data selection and modification may be done <strong>to</strong> any file as it is read by any P-<strong>STAT</strong> command. They<br />

may not be done <strong>to</strong> output files. When any P-<strong>STAT</strong> command is executed, each case of <strong>the</strong> input file is<br />

read and optionally modified before it is passed <strong>to</strong> <strong>the</strong> current command. The current command operates<br />

on the modified data while the input file remains unchanged. Thus, the modifications are temporary “on-the-fly”<br />

modifications.<br />

MODIFY<br />

Permanent modifications are usually done using the MODIFY command but may be accomplished with<br />

any command that produces an output file incorporating the modifications. MODIFY does no particular<br />

statistical or file maintenance procedures, but it produces an output file of the data after all modifications<br />

and selections have been completed.<br />

MODIFY File<br />

[ KEEP ID Age Score ;<br />

GENERATE Coded.Age =<br />

RECODE ( Age, 1 TO 17 = 1, 18 TO 100 = 2 ) ],<br />

OUT New.File $<br />

MODIFY Males [ KEEP Test.ID Time Dexterity ]<br />

+ Females [ * ], OUT Students $<br />

Multiple files may be read by MODIFY, as well as by other commands, using the “+” operator. This produces<br />

“on-the-fly concatenation” of the files. The files should have the same number of variables with<br />

corresponding data types the same. If the names of the variables differ, the variable names in the first<br />

input file are used. Different PPL modification phrases may follow each of the input files. If the same<br />

modifications are desired, as in the second example above, an asterisk in brackets should follow the additional<br />

file or files.<br />

Required:<br />

MODIFY fn<br />

supplies the name of the required input file. MODIFY is described in more detail in the chapter<br />

“PPL:MODIFY and COMPARE”.<br />

Optional Identifiers:<br />

OUT fn<br />

provides a name for <strong>the</strong> requested output file. The output file will reflect <strong>the</strong> input file after all selections<br />

and modifications are performed.<br />

TEMPLATE fn<br />

specifies an input file which indicates <strong>the</strong> variables <strong>to</strong> be selected for <strong>the</strong> output file. Additional variables<br />

are ignored:<br />

fn=file name vn=variable name exp=expression



MODIFY Diet2 [ ROWS 50 .ON. ],<br />

TEMPLATE Diet1, OUT Diet3 $<br />

If a file does not already exist with appropriate variable names, a null file, a file with only variable names<br />

and no cases, may be created <strong>to</strong> serve as a template.<br />

STANDALONE <strong>PPL</strong> COMMANDS<br />

PUT @PAGE (CVAL(27)) 'G' 'Bold On' $<br />

When <strong>PPL</strong> instructions do not require any information from a P-<strong>STAT</strong> file <strong>the</strong>y can be used as standalone<br />

commands. No input file is required and no output file is produced. These commands are typically used<br />

<strong>to</strong> pass instructions <strong>to</strong> a printer or <strong>to</strong> set values in <strong>the</strong> permanent vec<strong>to</strong>r, scratch variable or user-defined<br />

arrays. These are often tasks that do not require a P-<strong>STAT</strong> file. In <strong>the</strong> example above <strong>the</strong> decimal value<br />

27 is an ASCII ESCAPE character. On some printers <strong>the</strong> combination of ESCAPE and <strong>the</strong> letter “G” is<br />

a signal <strong>to</strong> use a BOLD font. CVAL, a function which converts a number <strong>to</strong> its character equivalent is<br />

described in <strong>the</strong> chapter on character functions.<br />

The PPL instructions that can be used as commands are: IF, SET, INCREASE, DECREASE, GENERATE,<br />

PUT, PUTL, IF-THEN-ELSE, DIALOG, BRANCH and DO loops.<br />

PROCESS<br />

PROCESS Hist123a<br />

[ IF Term.Paper MISSING,<br />

PUT Last.Name ><br />

Paper.Due.Date ] $<br />

The PROCESS command processes PPL instructions. No output file is produced. It is typically used<br />

when the objective is printed text giving information about the values of the variables in the input file.<br />

Required:<br />

PROCESS fn<br />

specifies the name of the required input file.<br />

<strong>PPL</strong> Instructions<br />

The <strong>PPL</strong> instructions DECREASE, DELETE, DROP, GENERATE, IF, INCREASE, KEEP, RETAIN,<br />

ROWS and SET are explained in <strong>the</strong> second <strong>PPL</strong> chapter. The additional instructions GOTO, PUT,<br />

PUTL, QUITFILE, QUITCOMMAND, QUITRUN and REPEAT are summarized below. The list of instructions<br />

which may follow after an IF test includes:<br />

CONTINUE FOR INCREASE QUITFILE SET<br />

DECREASE GENERATE PUT QUITRUN<br />

DELETE GOTO PUTL QUITCOMMAND<br />

GOTO label<br />

directs that <strong>the</strong> <strong>PPL</strong> processor go ei<strong>the</strong>r up or down <strong>to</strong> wherever <strong>the</strong> <strong>PPL</strong> clause with <strong>the</strong> specified label<br />

is located. The label must be at <strong>the</strong> beginning of a <strong>PPL</strong> clause, and it must be followed by a colon (:) .<br />

The bypassed phrases cannot change the number, order, or names of the variables (i.e., KEEP, DROP,<br />

GENERATE), nor can GOTO bypass REPEAT, SPLIT or COLLECT. GOTO may be used following an IF<br />

test:<br />

IF Sex EQ 1, GOTO Male;<br />

The label may be followed by an instruction or it may be a “null” label.<br />

[ GENERATE Factor;<br />

IF Treatment.Group EQ 'placebo', GOTO Not.Drug;<br />

SET Drug = RECODE ( Drug, 1 TO 3 = 1, G = 2 );<br />

SET Factor = SUM ( Test1 TO Test2 );<br />

GOTO Next.Test;<br />

Not.Drug: SET Drug = 0, SET Factor = 0 ;<br />

Next.Test: ; ..... ]<br />

QUITFILE<br />

specifies that processing of the current file stop. Only cases prior to this point are passed to the command<br />

processor. QUITFILE is commonly used following an IF test.<br />

QUITCOMMAND<br />

specifies that processing of the current command stop. The command is aborted at that point.<br />

QUITCOMMAND is commonly used following an IF test.<br />

QUITRUN<br />

specifies that the P-STAT run stop. QUITRUN is commonly used following an IF test. The entire run<br />

ends at this point.<br />

REPEAT exp<br />

requests that each case be repeated <strong>the</strong> specified number of times. The argument for repeat should be an<br />

expression (constant, variable, function or combination of <strong>the</strong>se) that reduces <strong>to</strong> an integer. A case is repeated<br />

at that point in <strong>the</strong> <strong>PPL</strong> in which <strong>the</strong> REPEAT instruction is encountered. REPEAT is useful in<br />

generating a set of random data:<br />

MOD Random<br />

[ REPEAT 100;<br />

SET Random.Number = ( RANNORM (0) * 2.8 ) + 24 ],<br />

OUT RandomX $<br />

An initial file with one case (Random) is built and <strong>the</strong>n it is modified <strong>to</strong> generate an output file with 100<br />

cases. The variable Random.Number is set equal <strong>to</strong> random numbers with mean 24 and standard deviation<br />

2.8. REPEAT may not be used as a consequent of an IF, within a DO loop, or within an IF-THEN-<br />

ELSE block. It also cannot be used in conjunction with SPLIT, COLLECT, FIRST or LAST.<br />
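Because the REPEAT argument may be a variable as well as a constant, the repetition count can differ from case to case. The sketch below assumes a hypothetical file Orders with an integer variable Quantity, and writes each case to the hypothetical output file Items that many times.<br />

```ppl
/* Sketch only: Orders, Items and Quantity are hypothetical names */
MODIFY Orders
  [ REPEAT Quantity ],
  OUT Items $
```

REPEAT is used here as an instruction by itself, which respects the restriction that it may not be a consequent of an IF.<br />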

PUT AND PUTL CONTROL ELEMENTS<br />

The following printing elements can follow a PUT or PUTL:<br />

1. Age . The name of a variable. Its value will be printed. PUTL will also label it (Age = 22)<br />

2. #name or ##name. A scratch variable, also labelled by PUTL<br />

3. V(3). A variable reference with a constant subscript, also labelled by PUTL<br />

4. .ALL. All <strong>the</strong> values of a case, also labelled by PUTL<br />

5. V(#j+2). PUTL does not label this<br />


6. P(3) or P(#J+2) or (expression) of any complexity<br />

7. 'string' or "string" or a string in paired angle brackets.<br />

These control elements can follow PUT or PUTL:<br />

@NEXT            move to the next line.<br />

@SKIP=3          write the current line and then three blank lines.<br />

@PARA            write the current line, write a blank line, and indent three positions in the new line.<br />

@20              move the pointer (which is where the next value will be written) to position 20.<br />

@PLUS=(5)        move the pointer forward that far. This can be an expression.<br />

@MINUS=(3)       move the pointer back that far.<br />

@                can be used as the last element in a PUT or PUTL. The line is not flushed, so the<br />

                 next PUT or PUTL statement adds to it instead of starting a new line.<br />

@BEFORE=40       causes the next value to be placed so it ends at position 40. The string or value must<br />

                 be the next PUT element.<br />

@PLACES=3        causes succeeding numeric values to print with 3 places.<br />

@NOPLACES        returns to the default mode, where integers print without places and fractional values<br />

                 get some number of places depending on the actual value.<br />

@COMMAS          inserts commas into the integer part of numbers.<br />

@NOCOMMAS        turns commas off. Default is off.<br />

@LABEL           turns PUTL mode on. (@NAME is a synonym.)<br />

@NOLABEL         turns PUTL mode off. PUT default is off. PUTL default is on.<br />

@TRIM            default. Trims blanks from the right end of a character value.<br />

@NOTRIM          prints the entire character value, including trailing blanks.<br />

@EQUAL=20        when a labelled value (like Age = 40) is about to be written, place the = at position<br />

                 20. @EQUAL=20:40 prints 2 values per line with equal signs at positions 20 and 40.<br />

@NOEQUAL         turns @EQUAL positioning back off.<br />

@MISS='string'   use the string instead of -, --, or --- to represent missing values.<br />

@NOMISS          resets to -, --, or ---.<br />

PUT<br />

positions text and variables at specified column locations in the output line. Text strings are enclosed in<br />

quotes and variables are simply cited. Paired angle brackets may also be used as string<br />

delimiters in PUT statements.<br />

PUT @3 'The client is ' Name '.' @ ;<br />

Locations are specified with @:<br />

@3 at column 3<br />

@NEXT at <strong>the</strong> start of <strong>the</strong> next line<br />

@SKIP write a blank line and move <strong>to</strong> <strong>the</strong> start of <strong>the</strong> next line<br />

@PAGE issue a page change and move <strong>to</strong> <strong>the</strong> start of <strong>the</strong> first line<br />

A final @, at <strong>the</strong> end of a PUT, holds <strong>the</strong> text output location, so that subsequent text may follow directly<br />

after.<br />

exp=expression fn=file name vn=variable name



PUT is often used after an IF test checking for erroneous data values. PUT specifies error messages to<br />

print:<br />

[ IF ID MISSING, PUT Last.Name SS.Num ]<br />

PUTL<br />

positions variable names as well as variable values in the output line. If @EQUAL22 is used:<br />

[ PUTL @EQUAL22 Name SS.Num ]<br />

<strong>the</strong> variable names and values are listed, one per line, centered on <strong>the</strong> equal-sign in column 22 (or any<br />

o<strong>the</strong>r specified column location). @EQUAL22:52 positions both variable names and values on one line,<br />

<strong>the</strong> first centered on <strong>the</strong> equal sign in column 22 and <strong>the</strong> second centered on <strong>the</strong> equal-sign in column 52.<br />

COMMENTS<br />

/* comments can be inserted in <strong>the</strong> command stream<br />

wherever a command can be found. The initial characters<br />

are <strong>the</strong> /*. The terminating characters are <strong>the</strong> asterisk<br />

followed by <strong>the</strong> slash<br />

*/<br />

LIST Myfile $<br />

Comments can also be used in <strong>the</strong> <strong>PPL</strong> of a command as long as each comment is positioned as a <strong>PPL</strong><br />

phrase and not inserted in <strong>the</strong> middle of such a phrase.<br />

MODIFY Myfile [ /* generate coded variables */ GEN Coded.Age;<br />

GEN Coded.Income;<br />

/* Age will be recoded into 10 year groups */<br />

SET Age = .... ]<br />



4 PPL: NCOT and RECODE<br />

The previous <strong>PPL</strong> chapters covered <strong>the</strong> basics of data modification using <strong>the</strong> P-<strong>STAT</strong> <strong>Programming</strong> <strong>Language</strong><br />

(<strong>PPL</strong>). This chapter provides information about <strong>the</strong> RECODE and NCOT commands. These commands often<br />

provide <strong>the</strong> easiest way <strong>to</strong> do complex recodes.<br />

Values may be changed <strong>to</strong> different values using ei<strong>the</strong>r <strong>the</strong> RECODE or NCOT functions. Both numeric and<br />

character values may be recoded using RECODE; only numeric values may be changed with NCOT. RECODE<br />

permits any arbitrary changes, including <strong>the</strong> recoding of individual values, ranges of values, missing values, character<br />

strings and extra (“left over”) values. XRECODE permits case sensitive recodes of character data. NCOT<br />

recodes ranges of values, specified with “cutting points”, <strong>to</strong> consecutive constants.<br />

The RECODE function is usually used <strong>to</strong> test a single argument, which may be a variable, a constant or a more<br />

complex expression. However, it can also be used <strong>to</strong> test multiple arguments creating a result which is based on<br />

several different arguments such as setting:<br />

Group=1 when Age lt 30 and Sex eq male and Income lt 20000<br />

Group=2 when Age ge 30 and Sex eq male and Income lt 20000<br />

Group=3 when Age lt 30 and Sex eq female and Income lt 20000, etc.<br />

This multi-argument use of RECODE often replaces a lengthy series of complex IF statements with a single statement that<br />

is easier both to read and to understand.<br />

4.1 The NCOT Function<br />

NCOT recodes numeric variable values <strong>to</strong> numeric constants. It does an N-way dicho<strong>to</strong>mization or division of <strong>the</strong><br />

values <strong>to</strong> be recoded, using cutting points supplied in <strong>the</strong> NCOT instructions. The cutting points divide <strong>the</strong> values<br />

in<strong>to</strong> groups or ranges of values. The ranges are recoded <strong>to</strong> consecutive integers.<br />

Because NCOT is a function, it begins with a left paren<strong>the</strong>sis and ends with a right paren<strong>the</strong>sis. The first element<br />

following <strong>the</strong> left paren<strong>the</strong>sis is <strong>the</strong> NCOT argument which must be a variable name or an expression. This<br />

is followed by additional arguments giving cutting points for <strong>the</strong> values. Each NCOT argument is separated from<br />

<strong>the</strong> next by a comma. NCOT is designed for use when a numeric variable is <strong>to</strong> be divided in<strong>to</strong> groups based on a<br />

series of ascending values or cutting points. It does not work with character values and it cannot be used for complex<br />

recoding.<br />

The cutting points for NCOT can be fractional values. The one restriction is that <strong>the</strong> cutting points must go<br />

in ascending order, from low (which may be negative) <strong>to</strong> high. Given:<br />

[ SET Hours = NCOT ( Hours, 20, 25, 30, 35, 40, 45, 50 ) ]<br />

everything less than or equal <strong>to</strong> <strong>the</strong> first value (20) becomes a “1”, everything above <strong>the</strong> first value, but not above<br />

<strong>the</strong> second value (25) becomes a “2”, and so on. The final value includes all <strong>the</strong> numbers greater than <strong>the</strong> final<br />

cutting point. Thus, the number of possible values is always one more than the number of cutting points.<br />
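
The grouping rule just described can be sketched in Python (not PPL). This is only an illustration under stated assumptions: `ncot` is a hypothetical helper name, `None` stands in for a missing value, and passing a missing value through unchanged is inferred from Figure 4.1 rather than from a stated rule.

```python
from bisect import bisect_left

def ncot(value, cuts):
    """Return the 1-based group of value for ascending cutting points.

    Hypothetical helper: None stands in for a missing value and is
    passed through unchanged (an assumption, see the lead-in).
    """
    if value is None:
        return None
    if list(cuts) != sorted(cuts):
        raise ValueError("cutting points must be in ascending order")
    # bisect_left counts the cutting points strictly below value, so a
    # value equal to a cutting point stays in the lower group.
    return bisect_left(cuts, value) + 1

cuts = [20, 25, 30, 35, 40, 45, 50]
print([ncot(h, cuts) for h in (12, 20, 21, 25, 50, 51)])  # [1, 1, 2, 2, 7, 8]
```

Seven cutting points give eight possible codes, which matches the "one more than the number of cutting points" rule.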

The NCOT function instructions can be abbreviated fur<strong>the</strong>r by providing a step size:<br />

[ SET Hours = NCOT ( Hours, 20, 50/5 ) ]<br />

The 20 is <strong>the</strong> first cutting point, 50 is <strong>the</strong> last cutting point and 5 is <strong>the</strong> step size. Thus, <strong>the</strong> cutting points are 20,<br />

25, 30, 35, 40, 45 and 50. The instructions:



[ SET Hours =<br />

NCOT ( Hours, 20, 50/5, 100/10 ) ]<br />

create cutting points at 20, 25, 30, 35, 40, 45 and 50 (steps of 5), and also at 60, 70, 80, 90 and 100 (steps of 10).<br />

A value of 33, which is between <strong>the</strong> 3rd and 4th cutting point, becomes a “4” and a value of 85, which is between<br />

<strong>the</strong> 10th and 11th cutting point becomes an “11”.<br />
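
The step-size shorthand can be checked with a small Python sketch. The names `expand_cuts` and `code_for` are hypothetical, and each `(last, step)` pair is only a stand-in for the manual's `50/5` and `100/10` notation; integer steps are assumed, since repeated addition of fractional steps could drift in floating point.

```python
def expand_cuts(first, *steps):
    """Expand 'first, last/step, ...' into an explicit list of cutting points.

    Each entry of steps is a (last, step) pair (hypothetical representation).
    """
    cuts = [first]
    for last, step in steps:
        # Keep stepping from the most recent cutting point up to `last`.
        while cuts[-1] + step <= last:
            cuts.append(cuts[-1] + step)
    return cuts

def code_for(value, cuts):
    # One more than the number of cutting points strictly below the value.
    return sum(1 for c in cuts if c < value) + 1

cuts = expand_cuts(20, (50, 5), (100, 10))
print(cuts)                # [20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100]
print(code_for(33, cuts))  # 4
print(code_for(85, cuts))  # 11
```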

__________________________________________________________________________<br />

Figure 4.1 NCOT: Numeric Recodes<br />

File RawData<br />

Age Income Hours Sex<br />

13 1350 - m<br />

22 20100 40 f<br />

24 18400 36 f<br />

31 31000 40 m<br />

33 35000 49 f<br />

37 27000 38 m<br />

42 20000 40 f<br />

49 45000 40 m<br />

50 61000 55 m<br />

55 31000 30 m<br />

62 24000 24 f<br />

73 16000 20 m<br />

MODIFY RawData [ GEN Coded.Age, GEN Coded.Income, GEN Coded.Hours;<br />

SET Coded.Age = NCOT ( Age, 25, 40, 55 );<br />

SET Coded.Income = NCOT ( Income, 10000, 100000 / 10000 );<br />

SET Coded.Hours = NCOT ( Hours, 20, 50/5, 100/10 ) ],<br />

OUT NewData $<br />

File NewData<br />

Age Income Hours Sex Coded.Age Coded.Income Coded.Hours<br />

13 1350 - m 1 1 -<br />

22 20100 40 f 1 3 5<br />

24 18400 36 f 1 2 5<br />

31 31000 40 m 2 4 5<br />

33 35000 49 f 2 4 7<br />

37 27000 38 m 2 3 5<br />

42 20000 40 f 3 2 5<br />

49 45000 40 m 3 5 5<br />

50 61000 55 m 3 7 8<br />

55 31000 30 m 3 4 3<br />

62 24000 24 f 4 3 2<br />

73 16000 20 m 4 2 1<br />

__________________________________________________________________________<br />

Figure 4.1 illustrates NCOT with three different patterns. The NCOT of Age provides 3 cutting points and<br />

results in 4 values. Values on Age less than or equal <strong>to</strong> 25 are a 1 in Coded.Age. Values greater than 25 and less



than or equal <strong>to</strong> 40 are a 2 in Coded.Age. Values greater than 40 and less than or equal <strong>to</strong> 55 are a 3 in Coded.Age.<br />

And finally any value on Age that is greater than 55 is a 4 in Coded.Age.<br />

Coded.Income is variable Income in groups of 10,000. Coded.Hours is a more complex pattern with cutting<br />

points at 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90 and 100. With 12 cutting points, the codes can range from<br />

1 to 13.<br />

NCOT is a concise and clear way to recode numeric values when cutting points, either arbitrary or<br />

patterned, are required. When the recodes to be done are not in ascending order, RECODE is the function<br />

to use.<br />

4.2 The RECODE Function: Single Argument Usage<br />

RECODE changes (recodes) numeric or character variables. XRECODE, for eXact recodes, respects <strong>the</strong> case of<br />

characters in recoding <strong>the</strong>m <strong>to</strong> o<strong>the</strong>r characters or <strong>to</strong> numbers. This section describes simple recodes, ones with a<br />

single argument. Multiple-argument recodes are described later.<br />

Because RECODE is a function, it begins with a left paren<strong>the</strong>sis and ends with a right paren<strong>the</strong>sis. The first<br />

element following <strong>the</strong> left paren<strong>the</strong>sis is <strong>the</strong> RECODE argument which must be a variable name or an expression.<br />

This is followed by a series of recoding tests, separated by commas.<br />

[ SET Age =<br />

RECODE ( ROUND (Age), 0 TO 20 = 1, 21 TO 100 = 2 ) ]<br />

The format of <strong>the</strong> single argument RECODE function is:<br />

RECODE ( Argument, test, test, test, ..... )<br />

The argument may be a variable name or a complex expression. Each recode test is composed of a list of one or<br />

more values followed by an “=” sign and <strong>the</strong> new value that replaces <strong>the</strong> values in <strong>the</strong> list.<br />

value.list = new.value, value.list = new.value, ....<br />

The list may be a single value such as “2”, a range of values such as “3 TO 5”, or a combination of single<br />

values and ranges such as “12 TO 15 33 M2”. The list is followed by an equal sign "=" and <strong>the</strong> new value <strong>to</strong> be<br />

used. Each recode test is separated from <strong>the</strong> next by a comma. After <strong>the</strong> recoding, <strong>the</strong> new values must all be of<br />

one type, ei<strong>the</strong>r numeric or character, and <strong>the</strong>y must be <strong>the</strong> same data type as <strong>the</strong> variable being set or generated.<br />

Single values may be recoded <strong>to</strong> new values:<br />

[ GEN Gender:c;<br />

SET Sex = RECODE ( Sex, 0 = 1, 1 = 2 ) ;<br />

SET Gender = RECODE ( Sex, 1 = 'male', 2 = 'female') ;<br />

SET Gender =<br />

RECODE ( Gender, 'male' = 'boy', 'female' = 'girl' ) ]<br />

The first recode changes <strong>the</strong> values of <strong>the</strong> numeric variable Sex; “zeros” become “ones” and “ones” become<br />

“twos”. The second recode provides a value for a new character variable, Gender. Its values are recodes of <strong>the</strong><br />

numeric values of <strong>the</strong> existing variable Sex. “Ones” become “male” and “twos” become “female”. (Notice that<br />

character strings in <strong>the</strong> recode tests are enclosed in quotes.) The third recode changes <strong>the</strong> values of <strong>the</strong> character<br />

variable Gender; “male” becomes “boy” and “female” becomes “girl”.<br />

RECODE tests may recode values in any of <strong>the</strong>se combinations: numeric <strong>to</strong> numeric, character <strong>to</strong> character,<br />

numeric <strong>to</strong> character, or character <strong>to</strong> numeric. However, <strong>the</strong> resultant values must all be one data type, and that<br />

type must correspond with that of <strong>the</strong> variable being recoded or generated.<br />
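
As a rough Python analogue of the three single-value recodes above, the sketch below applies each list of tests in order and stops at the first match. The helper name `recode` and the tuple layout are hypothetical. Leaving unmatched values unchanged follows what the manual describes for a same-type recode without G=.

```python
def recode(value, tests):
    """Apply (old_values, new_value) tests in order; the first match wins.

    Hypothetical helper: a value that matches no test is returned unchanged.
    """
    for old_values, new_value in tests:
        if value in old_values:
            return new_value
    return value

sex_tests = [((0,), 1), ((1,), 2)]                 # 0 = 1, 1 = 2
gender_tests = [((1,), "male"), ((2,), "female")]  # 1 = 'male', 2 = 'female'

for sex in (0, 1):
    recoded = recode(sex, sex_tests)
    print(sex, recoded, recode(recoded, gender_tests))
# 0 1 male
# 1 2 female
```

Because testing stops at the first successful test, an original 0 becomes 1 and is not then turned into 2 by the second test.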

Ranges of values may be recoded <strong>to</strong> new values:<br />

[ SET Income = RECODE ( INT (Income),<br />

0 TO 50000 = 1, 50001 TO 100000 = 2 ) ]



This example recodes the integer portion of the values of Income; values from zero through 50,000 become<br />

“ones”, and values from 50,001 through 100,000 become “twos”.<br />

Ranges can also be used with character variables:<br />

[ GEN Session = RECODE ( Last.Name,<br />

'A' TO 'MZZZ' = 1, 'N' TO 'ZZZZ' = 2 ) ]<br />

This example generates <strong>the</strong> numeric variable Session, whose values are based on those of <strong>the</strong> character variable<br />

Last.Name. Cases with last names from “A” through “MZZZ” have “ones” for <strong>the</strong> value of Session, and cases<br />

with names from “N” through “ZZZZ” have “twos”. In RECODE, case does not matter. Therefore, 'a' TO 'MZZZ'<br />

and 'A' TO 'mzzz' are equivalent.<br />
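
A case-insensitive character range can be imitated in Python as follows. This is a sketch under assumptions: comparing upper-cased strings with Python's default ordering only approximates whatever collation P-STAT uses, and returning `None` for an unmatched name is a placeholder, not RECODE behavior.

```python
def in_char_range(name, low, high):
    """Case-insensitive range test modelled on 'A' TO 'MZZZ'."""
    return low.upper() <= name.upper() <= high.upper()

def session(last_name):
    if in_char_range(last_name, "A", "MZZZ"):
        return 1
    if in_char_range(last_name, "N", "ZZZZ"):
        return 2
    return None  # placeholder: no test matched

print(session("Baker"), session("miller"), session("Nguyen"))  # 1 1 2
```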

Any non-missing values left over after the recoding is complete may be recoded using G (for Good). There<br />

may be only one G in a RECODE instruction:<br />

[ GENERATE Test.Score =<br />

RECODE ( ROUND ( (Correct / Total) * 100 ),<br />

65 TO 74 = 1, 75 TO 84 = 2, 85 TO 94 = 3,<br />

95 TO 100 = 4, G = 0 ) ]<br />

In this example, <strong>the</strong> RECODE argument is a complex expression — <strong>the</strong> number of correct items (Correct) divided<br />

by <strong>the</strong> <strong>to</strong>tal number of items (Total) and multiplied by 100. That value is rounded <strong>to</strong> a whole number (ROUND)<br />

and recoded <strong>to</strong> <strong>the</strong> specified values. If a non-missing value is not included in <strong>the</strong> recoding tests ( G ), Test.Score<br />

is zero. Thus, a value of 42 yields a Test.Score value of 0. The recode testing is done in a strict left <strong>to</strong> right order.<br />

Therefore, you should put any G= recode AFTER all recode tests that expect a good value.<br />
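
The Test.Score recode can be traced with a short Python sketch. `score_group` is a hypothetical name, and Python's `round` (which rounds exact halves to the nearest even integer) is only assumed to be close to PPL's ROUND.

```python
def score_group(correct, total):
    """Sketch of the Test.Score recode; ranges are tested left to right."""
    pct = round(correct / total * 100)
    ranges = [((65, 74), 1), ((75, 84), 2), ((85, 94), 3), ((95, 100), 4)]
    for (low, high), code in ranges:
        if low <= pct <= high:
            return code
    # G = 0: any other non-missing value. It comes last because a G test
    # placed first would succeed for every good value.
    return 0

print(score_group(42, 100), score_group(70, 100), score_group(19, 20))  # 0 1 4
```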

A common mistake in using RECODE with ranges of values is <strong>to</strong> specify <strong>the</strong> recode tests using integers, without<br />

provision for values which fall between <strong>the</strong> ranges. For example, <strong>the</strong> value 50.5 falls between 50 and 51:<br />

[ GENERATE Score = RECODE ( XA, 1 TO 50 = 1, 51 TO 100 = 2 ) ]<br />

Any such numbers are not recoded. To solve this problem, <strong>the</strong> recode tests may be specified as any of <strong>the</strong><br />

following:<br />

[ GEN Score =<br />

RECODE ( XA, 1 TO 50 = 1, 51 TO 100 = 2, G = M3 ) ]<br />

[ GEN Score =<br />

RECODE ( XA, 1 TO 50 = 1, 50 TO 100 = 2, G = M ) ]<br />

[ GEN Score =<br />

RECODE ( XA, .01 TO 50 = 1, 50.0001 TO 100 = 2, G = 3 ) ]<br />

The first example uses G to detect non-missing values that fall between the ranges in the recode tests: 50.5<br />

yields a value of MISSING3 for Score. The second example uses overlapping ranges to avoid gaps in the<br />

ranges: values of 50 are recoded to 1, the first test in which 50 appears. The third example specifies all-inclusive<br />

ranges.<br />
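
The gap problem and two of the remedies can be reproduced in Python. The helper `recode_ranges` and the string `"M3"` (a stand-in for MISSING3) are assumptions made for this illustration only.

```python
def recode_ranges(value, ranges, good=None):
    """First matching (low, high, new) range wins; `good` plays the role of G=.

    Hypothetical helper. With good=None an unmatched value is returned
    unchanged, as happens in a same-type recode without G=.
    """
    for low, high, new in ranges:
        if low <= value <= high:
            return new
    return value if good is None else good

gapped = [(1, 50, 1), (51, 100, 2)]
overlapping = [(1, 50, 1), (50, 100, 2)]

print(recode_ranges(50.5, gapped))             # 50.5 (falls in the gap)
print(recode_ranges(50.5, gapped, good="M3"))  # M3
print(recode_ranges(50, overlapping))          # 1 (first test wins)
print(recode_ranges(50.5, overlapping))        # 2
```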

It is a good idea <strong>to</strong> also use G in <strong>the</strong> recode tests <strong>to</strong> include any non-missing values that may have been overlooked.<br />

For example, if G is used <strong>to</strong> set any overlooked values <strong>to</strong> MISSING3 (M3), <strong>the</strong> user need only search for<br />

MISSING3 in <strong>the</strong> output <strong>to</strong> locate any values that have not been recoded. When <strong>the</strong> recoding transforms character<br />

<strong>to</strong> numeric variables or numeric <strong>to</strong> character variables, ei<strong>the</strong>r all possible values must be recoded or G must be<br />

used <strong>to</strong> avoid an error situation.<br />

Missing values are not recoded unless <strong>the</strong> recode instructions specify how <strong>the</strong>y should be recoded. (G refers<br />

<strong>to</strong> only non-missing extra values.) The three different types of missing values can be explicitly referenced using<br />

M1, M2 and M3:



LIST Kittens<br />

[ SET Sex = RECODE ( Sex,<br />

'm' = 'male', 'f' = 'female',<br />

G = M1, M2 = 'neuter' ) ] $<br />

In this listing, values of Sex that are MISSING2 are recoded <strong>to</strong> “neuter”, and extra values are recoded <strong>to</strong><br />

MISSING1. Any values of MISSING1 or MISSING3 remain <strong>the</strong> same.<br />

M by itself recodes any of MISSING1, MISSING2 or MISSING3 in <strong>the</strong> original value <strong>to</strong> a single new value.<br />

When M by itself is used as a new value, it is assumed <strong>to</strong> be MISSING1:<br />

LIST File1<br />

[ GENERATE New = RECODE ( Old, M = 0, 99 = M ) ] $<br />

Old New<br />

- 0<br />

-- 0<br />

--- 0<br />

99 -<br />

77 77<br />

All types of missing are recoded as “zeros”. Because a value can be only one type of missing at a time,<br />

99=M is treated as 99=M1, so the number “99” is recoded as M1.<br />

Since no test is given for 77 and since G= was not used, the 77 is not changed.<br />

(Note that <strong>the</strong> RECODE function references <strong>the</strong> system variables for <strong>the</strong> different types of missing data using a<br />

simplified notation. The regular notation for system variables may also be used — .M., .M1., .M2. and .M3.)<br />
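
The File1 listing can be mirrored in Python. The strings `"M1"`, `"M2"` and `"M3"` are stand-ins for the three missing types, and `recode_old` is a hypothetical name.

```python
M1, M2, M3 = "M1", "M2", "M3"   # stand-ins for MISSING1, MISSING2, MISSING3
MISSING = {M1, M2, M3}

def recode_old(old):
    """Sketch of RECODE ( Old, M = 0, 99 = M )."""
    if old in MISSING:   # M as a test value matches any type of missing
        return 0
    if old == 99:        # M as a new value means MISSING1
        return M1
    return old           # no G= test, so other values are unchanged

print([recode_old(v) for v in (M1, M2, M3, 99, 77)])  # [0, 0, 0, 'M1', 77]
```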

XRECODE is an eXact recode — it works just like RECODE except that <strong>the</strong> case (upper, lower or mixed)<br />

of <strong>the</strong> recoding instructions is respected:<br />

LIST File2<br />

[GEN Num = XRECODE ( Char, 'a' = 1, 'A' = 9 ) ] $<br />

Char Num<br />

a 1<br />

A 9<br />

In o<strong>the</strong>r aspects, XRECODE operates in <strong>the</strong> same manner that RECODE does.<br />

In summary, <strong>the</strong>re are RECODE instructions for:<br />

• Numbers;<br />

• Character strings;<br />

• Missing values (M, M1, M2, M3);<br />

• Any good (non-missing) value left over (G).<br />

The data type of all <strong>the</strong> recoded values for a given variable must be <strong>the</strong> same, and it must agree with that of <strong>the</strong><br />

modified or newly generated variable.<br />

Note: On an ASCII character set computer like a PC, you cannot use an XRECODE test of 'a' <strong>to</strong> 'Z' because<br />

'a' is 97 and 'Z' is 90, and <strong>the</strong> test is backwards. That would, however, be legal in EBCDIC on an IBM mainframe,<br />

where 'a' is 129 and 'Z' is 233.
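
The code points in this note can be confirmed from Python. The `cp037` codec is one common EBCDIC code page and is used here as an assumption, since the manual does not name a specific code page. `range_is_forward` is a hypothetical helper.

```python
# ASCII code points come from ord(); EBCDIC ones from the cp037 codec.
for ch in ("a", "Z"):
    print(ch, ord(ch), ch.encode("cp037")[0])
# a 97 129
# Z 90 233

def range_is_forward(low, high, encoding):
    """True when low sorts at or before high in a single-byte encoding."""
    return low.encode(encoding) <= high.encode(encoding)

print(range_is_forward("a", "Z", "ascii"))  # False
print(range_is_forward("a", "Z", "cp037"))  # True
```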



4.3 Complex Recodes<br />

The multiple argument usage of RECODE is exactly like <strong>the</strong> single argument usage in <strong>the</strong> way that <strong>the</strong> arguments<br />

are organized and in <strong>the</strong> values that can be supplied and tested. The RECODE syntax is:<br />

1. RECODE (<br />

2. <strong>the</strong> arguments <strong>to</strong> be used in <strong>the</strong> recode. If <strong>the</strong>re are multiple arguments <strong>the</strong>y are separated by a vertical<br />

bar, "|". A comma follows <strong>the</strong> final argument.<br />

3. optional definitions for a set of values. The definitions provide a label for a set of values that can<br />

<strong>the</strong>n be referenced by that label in <strong>the</strong> recode tests that follow. These definitions are enclosed in paren<strong>the</strong>ses<br />

and are described below.<br />

4. one or more recoding tests which are executed in left <strong>to</strong> right order. If <strong>the</strong>re are multiple arguments<br />

<strong>the</strong> sections of <strong>the</strong> test are separated by a vertical bar.<br />

5. a right paren<strong>the</strong>sis, ")"<br />

4.4 RECODE: The Arguments<br />

The composition of <strong>the</strong> arguments in a complex recode is exactly <strong>the</strong> same as in <strong>the</strong> single argument situation. An<br />

argument is often simply the name of a variable, but it can be a complex expression. Given:<br />

RECODE ( Age,<br />

all tests are made using the single variable Age. When there are two arguments:<br />

RECODE ( Age | Income,<br />

test values must be supplied for both variables Age and Income. A three argument example:<br />

RECODE ( Age | Husband.Income + Wife.Income | Region,<br />

requires 3 values for each test. The first is a value for Age. The second is a value for the sum of variables Husband.Income<br />

and Wife.Income. The third value is for variable Region. The arguments and the first test for this<br />

RECODE might look like:<br />

RECODE ( Age | Husband.Income + Wife.Income | Region,<br />

le 30 | le 35000 | 'East' = 1, ...]<br />

The arguments can be numeric or character or a mixture of <strong>the</strong> two. Each value in a test must be <strong>the</strong> same<br />

data type as <strong>the</strong> corresponding argument. In <strong>the</strong> previous RECODE “30” and “35000” are appropriate values for<br />

<strong>the</strong> two numeric arguments and “East” is an appropriate value for <strong>the</strong> third argument, <strong>the</strong> character variable<br />

Region.<br />

4.5 The RECODE Tests<br />

There are two general tests which do not use a test segment for each argument. One is for missing arguments, <strong>the</strong><br />

o<strong>the</strong>r is for good arguments.<br />

M = result<br />

is successful when ANY of <strong>the</strong> recode arguments is missing.<br />

G = result<br />

is successful when ALL of <strong>the</strong> recode arguments are good (non-missing).<br />

M=, when used, is usually placed at either the beginning or end of the tests. G=, when used, is usually placed<br />

at the end of the tests. Since tests are processed in a strict left to right order, placing both of them at the beginning<br />

causes all <strong>the</strong> rest of <strong>the</strong> tests <strong>to</strong> be ignored. Processing of a recode s<strong>to</strong>ps as soon as <strong>the</strong>re is a successful test. Any



set of arguments is ei<strong>the</strong>r all good (G=) or has some missing values (M=). When both of <strong>the</strong>m are placed at <strong>the</strong><br />

beginning of <strong>the</strong> tests one of <strong>the</strong>m will be successful and <strong>the</strong> remaining tests will never be processed.<br />

[ GEN Group:c = RECODE ( Age | Region,<br />

M = M3,<br />

LT 30 | 'east' = 'one',<br />

GE 30 | NE 'east' = 'two',<br />

G = 'three' ) ]<br />

Each test, except for M= and G=, is composed of as many test segments as <strong>the</strong>re are arguments. The vertical<br />

bar is used <strong>to</strong> separate <strong>the</strong> test segments within each test. In <strong>the</strong> example above <strong>the</strong>re are 4 tests. The first test<br />

“M=” is not segmented. It returns MISSING3 when either Age or Region is missing. The next two tests have 2<br />

segments, one for each of the two arguments Age and Region. Finally “G=”, an unsegmented test, assigns<br />

'three' to Group for any remaining case with non-missing values on both Age and Region.<br />
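
The four tests of the Group example can be walked through in Python. The sketch assumes that a segmented test succeeds only when every one of its segments succeeds. `None` stands in for a missing value, and `"M3"` for MISSING3.

```python
def group(age, region):
    """Sketch of the four tests, evaluated in left to right order."""
    # M = M3: succeeds when ANY argument is missing.
    if age is None or region is None:
        return "M3"
    east = region.lower() == "east"   # RECODE ignores case
    # LT 30 | 'east' = 'one'
    if age < 30 and east:
        return "one"
    # GE 30 | NE 'east' = 'two'
    if age >= 30 and not east:
        return "two"
    # G = 'three': all arguments are good and no earlier test succeeded.
    return "three"

cases = [(25, "east"), (40, "west"), (40, "east"), (25, "west"), (None, "east")]
print([group(a, r) for a, r in cases])
# ['one', 'two', 'three', 'three', 'M3']
```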

A test segment consists of one or more comparisons. A comparison consists of:<br />

1. An optional logical operator such as<br />

LT (less than), LE (less than or equal),<br />

EQ (equal), NE (not equal),<br />

GE (greater than or equal), GT (greater than).<br />

EQ is assumed when no operator is given. LT, LE, GT and GE can only be used with a single numeric or character constant.<br />

2. The values <strong>to</strong> be tested. These are usually one or more constants but can also be a definition. Definitions<br />

are discussed below.<br />

The things that can be tested:<br />

1. numeric constant such as 12.7, 13 or 55<br />

2. numeric range such as 1 <strong>to</strong> 8<br />

3. character constant such as 'east'<br />

4. character range such as 'a' <strong>to</strong> 'e'<br />

5. G - any good value<br />

6. M - any missing value<br />

7. M1, M2, or M3 for MISSING1, MISSING2 or MISSING3<br />

8. (nnn) provides <strong>the</strong> number, for example “(123)”, of a definition containing <strong>the</strong> values <strong>to</strong> be tested.<br />

Definitions are described below.<br />

There can be several comparisons in a test segment. The test segment for an argument is successful when any<br />

one of <strong>the</strong> comparisons is successful. An example of a multiple comparison:<br />

LT 20, GT 30 = 11<br />

The segment is successful for arguments less than 20 or greater than 30. This is <strong>the</strong> same as:<br />

NE 20 TO 30 = 11<br />
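
The equivalence of the two forms can be spot-checked in Python. The function names are hypothetical, and the range is taken to include both end points, as in the other TO examples.

```python
def lt20_or_gt30(x):
    # LT 20, GT 30: the segment succeeds when ANY comparison succeeds.
    return x < 20 or x > 30

def ne_20_to_30(x):
    # NE 20 TO 30: the value lies outside the inclusive range 20..30.
    return not (20 <= x <= 30)

# The two forms agree on every probe value, including the end points.
probes = [0, 19.9, 20, 25, 30, 30.1, 99]
print(all(lt20_or_gt30(x) == ne_20_to_30(x) for x in probes))  # True
```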

In a long series of tests, it is often necessary <strong>to</strong> repeat a test segment several times. This repetition can be<br />

minimized by making use of <strong>the</strong> fact that a null segment au<strong>to</strong>matically repeats <strong>the</strong> previous test for that segment.<br />

[ SET Income.Groups = RECODE<br />

( Area.code | Income,<br />

609 908 201 215 | LE 30000 = 1,<br />

| GT 30000 = 2,<br />

NE 609 908 201 215 | LE 30000 = 3,<br />

| GT 30000 = 4 ) ]



is <strong>the</strong> same as<br />

[ SET Income.Groups = RECODE<br />

( Area.code | Income,<br />

609 908 201 215 | LE 30000 = 1,<br />

609 908 201 215 | GT 30000 = 2,<br />

NE 609 908 201 215 | LE 30000 = 3,<br />

NE 609 908 201 215 | GT 30000 = 4 ) ]<br />

Note: “NE 609 908 201 215” is true only when the area code value is not equal to any of the listed values.<br />
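
The null-segment rule amounts to a simple expansion, sketched here in Python. The list-of-segments representation and the name `expand_null_segments` are hypothetical. `None` marks a null segment.

```python
def expand_null_segments(tests):
    """Fill null segments (None) with the previous test's segment.

    Each test is a (segments, result) pair (hypothetical representation).
    """
    expanded, previous = [], None
    for segments, result in tests:
        if previous is not None:
            segments = [prev if seg is None else seg
                        for seg, prev in zip(segments, previous)]
        expanded.append((segments, result))
        previous = segments
    return expanded

short = [
    (["609 908 201 215", "LE 30000"], 1),
    ([None, "GT 30000"], 2),
    (["NE 609 908 201 215", "LE 30000"], 3),
    ([None, "GT 30000"], 4),
]
for segments, result in expand_null_segments(short):
    print(" | ".join(segments), "=", result)
```

The printed tests match the fully written-out form of the recode, one test per line.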

4.6 Defining a Set of Constants<br />

When a set of constants is used repeatedly, <strong>the</strong>y can be defined as a group, given an integer label (from 1 <strong>to</strong><br />

999999), and referred <strong>to</strong> by using that label.<br />

[ SET Income.Groups = RECODE (<br />

Area.code | Income,<br />

( DEFINE 101 = 609 908 201 215 ),<br />

(101) | LE 30000 = 1,<br />

| GT 30000 = 2,<br />

NE (101) | LE 30000 = 3,<br />

| GT 30000 = 4 ) ]<br />

There can be many such definitions within a single RECODE. They must follow <strong>the</strong> list of arguments and<br />

precede <strong>the</strong> tests. The format is ei<strong>the</strong>r “DEFINE” or “DEF” followed by a numeric label, an equal sign (=) and a<br />

list of values. The entire definition is in paren<strong>the</strong>ses and is followed by a comma. The numeric labels in <strong>the</strong> definitions<br />

must be unique. The following definitions cause an error because both use <strong>the</strong> label “1”:<br />

( DEFINE 1 = 609 908 201 ),<br />

( DEFINE 1 = 215 412 610 717 814 ),<br />

A given definition set can be referenced many times, and can be used for any of <strong>the</strong> arguments. A given test<br />

segment can reference several definition sets and use additional values as well. Figure 4.2 contains both the command<br />

with PPL for a multiple-variable RECODE and the resulting output file. Because the recode action<br />

progresses from left to right, it is easy to flag as errors the cases which have conflicting postal zip codes and telephone<br />

area codes. Definitions 1 and 101 represent <strong>the</strong> area codes and zip codes for New Jersey. Definitions 2 and<br />

102 represent <strong>the</strong> area codes and zip codes for Pennsylvania.<br />

( DEFINE 1 = 201 609 908 ),<br />

( DEFINE 2 = 215 412 610 717 814 ),<br />

( DEFINE 101 = '07000' TO '07900' '08001' TO '08990' ),<br />

( DEFINE 102 = '15201' TO '19980' ),<br />

Given these definitions, these two tests are the same:<br />

(1) M1 | (101) = 'New Jersey'<br />

201 609 908 M1 | '07000' TO '07900' '08001' TO '08990' = 'New Jersey'<br />

Any case with a value on Area.code that is ei<strong>the</strong>r included in definition 1 or is missing, and with a value on variable<br />

Zip that is included in definition 101 is given a value of “New Jersey” on variable State:<br />

(1) M1 | (101) = 'New Jersey',<br />

A case is also coded as “New Jersey” if it has a value on Area.code that is one of the definition 1 values and a value on Zip that is<br />

either missing or among the values for definition 101:<br />

(1) | (101) M1 = 'New Jersey',



__________________________________________________________________________<br />

Figure 4.2 Multi-Variable RECODE With Definitions<br />

MODIFY States [<br />

GEN State:c = RECODE ( Area.code | Zip,<br />

( DEFINE 1 = 201 609 908 ),<br />

( DEFINE 2 = 215 412 610 717 814 ),<br />

( DEFINE 101 = '07000' TO '07900' '08001' TO '08990' ),<br />

( DEFINE 102 = '15201' TO '19980' ),<br />

(1) M | (101) = 'New Jersey',<br />

(1) | (101) M = 'New Jersey',<br />

(2) M | (102) = 'Pennsylvania',<br />

(2) | (102) M = 'Pennsylvania',<br />

(1) (2) | (101) (102) = 'ERROR',<br />

M='Undefined', G='Other' ) ], OUT States $<br />

File States<br />

Area.code Zip State<br />

201 - New Jersey<br />

313 30225 Other<br />

609 08525 New Jersey<br />

215 08525 ERROR<br />

412 16030 Pennsylvania<br />

- 19340 Pennsylvania<br />

215 - Pennsylvania<br />

__________________________________________________________________________<br />

The same procedure is used for <strong>the</strong> area codes and zip codes used <strong>to</strong> set State <strong>to</strong> “Pennsylvania”. Any case<br />

that has not passed one of <strong>the</strong>se 4 tests, and has a non-missing value for area code that is among <strong>the</strong> values in ei<strong>the</strong>r<br />

of definitions 1 or 2 with a zip code that is in ei<strong>the</strong>r of <strong>the</strong> definitions 101 or 102 has a coding problem: ei<strong>the</strong>r a<br />

New Jersey area code and a Pennsylvania zip code or a Pennsylvania area code and a New Jersey zip code. The<br />

value for State on <strong>the</strong>se cases is set <strong>to</strong> “ERROR”.<br />

Any case that has good values on both variables that are not among any of the lists in the definitions is caught by the 'G=' test and is set to “Other”. Any case which has an undefined good value on one of the variables and a missing value on the other variable is caught by the 'M=' test and is set to “Undefined”, as are cases that are missing on both variables.
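The test order in Figure 4.2 can be traced with a small model. The following is a Python sketch, not PPL: the names `state`, `area_in` and `zip_in` are hypothetical, `None` stands in for a missing value, and reading "(1) M" as "in definition 1 or missing" is an interpretation of the figure.

```python
MISSING = None  # stand-in for a P-STAT missing value (assumption of this sketch)

AREA_DEFS = {1: {201, 609, 908}, 2: {215, 412, 610, 717, 814}}
ZIP_DEFS = {
    101: [("07000", "07900"), ("08001", "08990")],
    102: [("15201", "19980")],
}

def area_in(area, *defs):
    """True when a non-missing area code is listed in any of the definitions."""
    return area is not MISSING and any(area in AREA_DEFS[d] for d in defs)

def zip_in(zipcode, *defs):
    """True when a non-missing zip code falls in a range of any definition.
    Zip codes are character values, so the ranges are compared as strings."""
    return zipcode is not MISSING and any(
        lo <= zipcode <= hi for d in defs for lo, hi in ZIP_DEFS[d]
    )

def state(area, zipcode):
    a_miss = area is MISSING
    z_miss = zipcode is MISSING
    # Tests are tried in order; the first success supplies the result.
    if (area_in(area, 1) or a_miss) and zip_in(zipcode, 101):
        return "New Jersey"
    if area_in(area, 1) and (zip_in(zipcode, 101) or z_miss):
        return "New Jersey"
    if (area_in(area, 2) or a_miss) and zip_in(zipcode, 102):
        return "Pennsylvania"
    if area_in(area, 2) and (zip_in(zipcode, 102) or z_miss):
        return "Pennsylvania"
    if area_in(area, 1, 2) and zip_in(zipcode, 101, 102):
        return "ERROR"
    if a_miss or z_miss:   # M= : any argument missing
        return "Undefined"
    return "Other"         # G= : remaining good values
```

Calling `state(215, "08525")` follows the path of the fourth case in the figure: none of the four state tests succeeds, so the mixed-definition test sets the result to ERROR.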

4.7 The Result Values<br />

The result values can be any of the following:

1. M1 or M missing 1<br />

2. M2 missing 2<br />

3. M3 missing 3



4. nn a numeric constant such as 22 or 1.543<br />

5. 'ccccc' a character constant<br />

6. #tt a temporary scratch variable<br />

7. ##tt a permanent scratch variable<br />

8. ARGn the value of the cited argument. If the recode has 2 arguments, they are referred to as ARG1 and ARG2.

All the result values in a given recode must be the same type. In other words, you cannot use a numeric constant and a character scratch value as results in the same recode.

Using scratch variables allows a recode to access other variables in a case.

[ GENERATE #n = d;<br />

SET XYX = RECODE ( a|b|c, 1|2|3 = #n, etc.<br />

When none of the tests is successful and G= and M= are not used, the result depends on the number and type of the arguments. If the recode has one argument:

1. If the argument and result types are the same, the argument is used as the result. This is useful when some values are to be changed, but the rest should remain the same.

2. If the argument and result types differ, and the argument is missing, the result is set to the same kind of missing.

3. If the argument and result types differ, and the argument is not missing, an error occurs.

If the recode has more than one argument:

4. If any argument is missing, the result is set to the same kind of missing.

5. If all arguments are non-missing, an error occurs.

In all situations except (possibly) the first, it is good practice to use M= and G=, so that the recode is fully defined.
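The five fall-through rules can be summarized in a few lines. This is a Python sketch under stated assumptions, not PPL: `Missing`, `kind_of` and `fallthrough` are hypothetical names, and when several arguments are missing the sketch simply returns the first one, a choice the text does not specify.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Missing:
    kind: int  # 1, 2 or 3 for M1, M2, M3

def kind_of(value):
    """Classify a good value as 'character' or 'numeric'."""
    return "character" if isinstance(value, str) else "numeric"

def fallthrough(args, result_type):
    """Result when no test matched and neither G= nor M= was supplied."""
    missing = [a for a in args if isinstance(a, Missing)]
    if len(args) == 1:
        arg = args[0]
        if missing:
            return arg            # rules 1 and 2: same kind of missing
        if kind_of(arg) == result_type:
            return arg            # rule 1: value passes through unchanged
        raise ValueError("rule 3: unmatched good value of a different type")
    if missing:
        return missing[0]         # rule 4 (first missing argument: an assumption)
    raise ValueError("rule 5: unmatched good values with several arguments")
```

The two error branches are the reason the text recommends supplying M= and G= in every recode except, possibly, a same-type single-argument one.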

4.8 RECODE or IF/SET<br />

Using RECODE is usually clearer and faster than using a series of IFs and SETs to do the same thing. For example, file AAA has variables AGE and REGION. We need a new variable named SECTOR to be created from the values on age and region. We want SECTOR to be:

• M1 if either age or region is missing

• 1 if age LT 30 and region EQ 'east'<br />

• 2 if age LT 30 and region EQ 'central'<br />

• 3 if age LT 30 and region EQ 'west'<br />

• 4 if age GE 30 and region EQ 'east'

• 5 if age GE 30 and region EQ 'central'<br />

• 6 if age GE 30 and region EQ 'west'<br />

• M2 if age and region have GOOD values, but have not matched a previous test.<br />

Figure 4.3 contains the PPL statements to do this recode, first using IF and SET and then using a multi-argument RECODE.

If file AAA had age and region as shown, either of the MODIFY commands in Figure 4.3 would produce the following results:



__________________________________________________________________________<br />

Figure 4.3 RECODE or IF/SET<br />

Using IF/SET:

MODIFY aaa
[ GENERATE sector = .m1. ;
  IF age good and region good, SET sector = .m2.;
  IF age lt 30 and region EQ 'east' SET sector = 1;
  IF age lt 30 and region EQ 'central' SET sector = 2;
  IF age lt 30 and region EQ 'west' SET sector = 3;
  IF age ge 30 and region EQ 'east' SET sector = 4;
  IF age ge 30 and region EQ 'central' SET sector = 5;
  IF age ge 30 and region EQ 'west' SET sector = 6;
], out bbb $

Using RECODE:

MODIFY aaa
[ GENERATE sector = RECODE ( age|region,
      M = m1,
      lt 30| 'east' = 1,
           | 'central' = 2,
           | 'west' = 3,
      ge 30| 'east' = 4,
           | 'central' = 5,
           | 'west' = 6,
      G = m2 ) ], out bbb$

__________________________________________________________________________<br />

Age  Region   Sector
22   --       -
23   central  2
44   west     6
19   south    --
30   east     4
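The equivalence of the two approaches can be checked with a small model. This is a Python sketch, not PPL: `sector_if_set` and `sector_recode` are hypothetical names, `None` stands in for a missing value, and the strings "M1" and "M2" stand in for the two kinds of missing result.

```python
MISSING = None
M1, M2 = "M1", "M2"   # stand-ins for missing type 1 and missing type 2
REGIONS = ("east", "central", "west")

def sector_if_set(age, region):
    """IF/SET style: start from a default and let later statements overwrite it."""
    sector = M1
    if age is not MISSING and region is not MISSING:
        sector = M2
        for offset, name in enumerate(REGIONS, start=1):
            if age < 30 and region == name:
                sector = offset
            if age >= 30 and region == name:
                sector = offset + 3
    return sector

def sector_recode(age, region):
    """RECODE style: ordered tests, the first success supplies the result."""
    if age is MISSING or region is MISSING:   # M = m1
        return M1
    tests = [(age < 30, name, i) for i, name in enumerate(REGIONS, start=1)]
    tests += [(age >= 30, name, i + 3) for i, name in enumerate(REGIONS, start=1)]
    for age_ok, name, result in tests:
        if age_ok and region == name:
            return result
    return M2                                 # G = m2
```

Both functions return the same value for every row of the results table, for example 4 for an age of 30 in region east.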

4.9 RECODE Pointers<br />

If you are doing a very complex recode with many variables and values, there may be some combinations that are far more likely than others. You can improve the speed of the command by arranging your recodes so that the most common results are among the early tests. Suppose you are recoding 60 country names into integers. One approach would be to organize the tests alphabetically, so that 'albania'=22 precedes 'china'=12. If, however, half of the cases come from just five countries, the recode will be faster if those five tests are placed before the fifty-five others.

In the same manner, putting M=m1 or such at the beginning of the tests will be faster when many of the cases have a missing value on the recode argument, but will be slightly slower when no cases have missing values on the recode argument.

When you are dealing with missing values, it should be noted that M= and M|M|M= are different. Consider a three value recode:



M= is successful when ANY argument is missing,<br />

M|M|M= is successful when ALL arguments are missing.<br />
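The difference between the two forms is the difference between "any" and "all". A minimal Python sketch (not PPL; `None` stands in for a missing value and the function names are hypothetical):

```python
MISSING = None

def m_test(args):
    """M= succeeds when ANY argument is missing."""
    return any(a is MISSING for a in args)

def m_m_m_test(args):
    """M|M|M= succeeds when ALL arguments are missing."""
    return all(a is MISSING for a in args)
```

For the arguments `(22, None, 'east')` the first test succeeds and the second does not.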

When you are using EQ (equal) and NE (not equal), the phrase:

EQ 2 5 to 9 11

should be thought of as

EQ 2, OR EQ 5 to 9, OR EQ 11.

The phrase:

NE 2 5 to 9 11

should be thought of as

NE 2, AND NE 5 to 9, AND NE 11.
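The OR and AND readings can be written out directly. In this Python sketch (not PPL) a test list is a hypothetical list of single values and `(low, high)` pairs, so `2 5 to 9 11` becomes `[2, (5, 9), 11]`; only good (non-missing) arguments are considered.

```python
def matches(value, item):
    """One list element: a single value, or an inclusive (low, high) range."""
    if isinstance(item, tuple):
        low, high = item
        return low <= value <= high
    return value == item

def eq_list(value, items):
    """EQ over a list: EQ item1, OR EQ item2, OR ..."""
    return any(matches(value, item) for item in items)

def ne_list(value, items):
    """NE over a list: NE item1, AND NE item2, AND ..."""
    return all(not matches(value, item) for item in items)

ITEMS = [2, (5, 9), 11]   # the list "2 5 to 9 11"
```

With these definitions a value of 7 passes the EQ test and fails the NE test, while a value of 3 does the reverse.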

Figure 4.4 shows the successful EQ and NE comparisons for arguments of 1, 2, M1, M2 and M3 when compared to test constants of 1, 2, M1, M2, M3, M and G. S means a successful comparison.

__________________________________________________________________________<br />

Figure 4.4 EQ and NE Comparisons<br />

---EQ comparisons with---

argument   1   2   M1  M2  M3  M   G
1          S   .   .   .   .   .   S
2          .   S   .   .   .   .   S
M1         .   .   S   .   .   S   .
M2         .   .   .   S   .   S   .
M3         .   .   .   .   S   S   .

---NE comparisons with---

argument   1   2   M1  M2  M3  M   G
1          .   S   S   S   S   S   .
2          S   .   S   S   S   S   .
M1         .   .   .   S   S   .   S
M2         .   .   S   .   S   .   S
M3         .   .   S   S   .   .   S

__________________________________________________________________________<br />
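The two tables in Figure 4.4 follow a simple pattern, which this Python sketch (not PPL) reproduces. The rule that a missing argument never satisfies NE against a good constant is read off the figure rather than stated in the text, and the names `eq`, `ne` and `row` are hypothetical.

```python
MISSING_KINDS = ("M1", "M2", "M3")
CONSTANTS = (1, 2, "M1", "M2", "M3", "M", "G")

def is_missing(value):
    return value in MISSING_KINDS

def eq(arg, const):
    if const == "M":              # M matches any kind of missing
        return is_missing(arg)
    if const == "G":              # G matches any good value
        return not is_missing(arg)
    return arg == const

def ne(arg, const):
    good_constant = const not in MISSING_KINDS + ("M", "G")
    if is_missing(arg) and good_constant:
        return False              # as shown in the figure
    return not eq(arg, const)

def row(test, arg):
    """One table row: S for a successful comparison, a dot otherwise."""
    return " ".join("S" if test(arg, c) else "." for c in CONSTANTS)
```

For example, `row(ne, "M1")` yields the same pattern as the M1 line of the NE table.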

4.10 XRECODE<br />

The X in Xrecode means eXact comparisons. Consider:<br />

RECODE( 'aBc', 'ABC'=1, 'aBc'=2, etc.<br />

XRECODE( 'aBc', 'ABC'=1, 'aBc'=2, etc.<br />

The RECODE returns 1, because recode ignores upper/lower case differences. Therefore, the argument value of aBc is matched by ABC. The XRECODE does not match aBc with ABC because the cases differ, and proceeds to the aBc test which succeeds, and returns 2.

Suppose, however, you want to do a recode using two character arguments, REGION with case-independent comparisons (RECODE) and CODE with case specific comparisons (XRECODE). This can be done using XRECODE by:



1. converting values of REGION to upper case as the recode begins, and then

2. using uppercase constants in its test segments.

For example:<br />

XRECODE( UPPER(region)| code, 'EAST' | 'aBc' = 1, etc.
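The two matching rules, and the upper-casing trick for mixing them, can be modelled as follows. This is a Python sketch, not PPL; `recode`, `xrecode` and `xrecode_pair` are hypothetical names and only character arguments are handled.

```python
def recode(value, tests):
    """Case-independent matching; the first successful test wins."""
    for constant, result in tests:
        if value.upper() == constant.upper():
            return result
    return None

def xrecode(value, tests):
    """Exact matching: upper and lower case must agree."""
    for constant, result in tests:
        if value == constant:
            return result
    return None

def xrecode_pair(region, code, tests):
    """Exact matching on (UPPER(region), code) pairs, as in the example above.
    The test constants for region must be written in upper case."""
    key = (region.upper(), code)
    for constants, result in tests:
        if key == constants:
            return result
    return None

TESTS = [("ABC", 1), ("aBc", 2)]
```

Here `recode("aBc", TESTS)` stops at the first test, while `xrecode("aBc", TESTS)` skips it and succeeds on the second.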



SUMMARY

PPL Functions:

NCOT (exp, ncot instructions)

does N-way dichotomizations (divisions) of numeric values and recodes those values according to instructions given in the second argument. The arguments for NCOT must be enclosed in parentheses.

The first argument is an expression which may be a simple variable name or a complex expression. This is followed by cutting points and possibly a step size. All values less than or equal to the first cutting point become a “1”. All values greater than the first cutting point and less than or equal to the second cutting point become a “2”.

MODIFY File1<br />

[ SET Age = NCOT ( Age, 14 ) ;<br />

GENERATE NN = NCOT ( FRAC (T1), .3, .6, .9 ) ;<br />

SET ZZ = NCOT ( ZZ, 20, 50/5, 90/10 ) ],<br />

OUT File2 $<br />

The final value includes all the numbers greater than the final cutting point. Thus, there will always be one more possible output value than there are cutting points. The instruction “20, 50/5” defines cutting points from 20 through 50 in steps of 5. This is a shorthand way of providing the cutting points 25, 30, 35, and so on. In the example above, cutting points for the variable ZZ will occur at 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, and 90. The resulting values will be from 1 to 12.
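The expansion of the step shorthand and the resulting categories can be sketched in Python (not PPL). The names `expand_cuts` and `ncot` are hypothetical; a `limit/step` instruction is written as a `(limit, step)` pair, the list is assumed to begin with a plain cutting point, and only whole-number steps and good values are handled.

```python
from bisect import bisect_left

def expand_cuts(spec):
    """Turn [20, (50, 5), (90, 10)] into the full list of cutting points."""
    cuts = []
    for item in spec:
        if isinstance(item, tuple):
            limit, step = item
            point = cuts[-1] + step      # continue from the previous cutting point
            while point <= limit:
                cuts.append(point)
                point += step
        else:
            cuts.append(item)
    return cuts

def ncot(value, cuts):
    """1 for values <= the first cutting point, 2 for the next interval, and so on.
    Values above the last cutting point fall in category len(cuts) + 1."""
    return bisect_left(cuts, value) + 1

ZZ_CUTS = expand_cuts([20, (50, 5), (90, 10)])
```

With these eleven cutting points a value of 20 is coded 1, a value of 21 is coded 2, and any value above 90 is coded 12.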

RECODE (exp, recode instructions)<br />

recodes the numeric or character variable specified by the expression according to the instructions given in the second argument:

MODIFY Nursery<br />

[ GENERATE Coded.Age =<br />

RECODE ( ROUND (Age), LE 4 = 1, 5 = 2, GE 6 = 3 ) ;<br />

SET Race =<br />

RECODE ( Race, 0 3 = 2, M3 = 1 ) ;<br />

GENERATE Gender:C =<br />

RECODE ( Sex, 1 = 'Boy', 2 = 'Girl', G = '?' ) ],<br />

OUT Nursery $<br />

RECODE is a function and its arguments must be enclosed within parentheses. The first argument following the RECODE may be a variable name or a complicated expression.

Recoding may be applied to numeric values:

1 TO 5 = 1 one through five become one<br />

6 = 'F' sixes become F<br />

7 TO 9 14 = 3 seven, eight, nine, and fourteen<br />

become three



Recoding may be applied to character values:

'male' = 1 values of male become 1<br />

'male' = 'm' values of male become m<br />

'A' TO 'DZ' = 3 A through DZ become three<br />

After the recode, the new values must be all of one data type; that is, they must be either all numeric or all character.

Recoding may be applied to missing values:

M = 3 missing values become threes<br />

M1 = 'DK' missing one becomes DK<br />

Recoding can be applied to what is left over after the other recodes are completed:

G = 4 unrecoded good values become 4<br />

G = '?' unrecoded good values become ?<br />

When the recoding transforms character to numeric variables or numeric to character variables, either all possible values must be recoded or G must be used to avoid an error situation. There may be only one G= in a RECODE.

RECODE multiple argument<br />

recode values are based on several variables or expressions.<br />

[ GEN Group:c = RECODE ( Age | Region,<br />

M = M3,<br />

LT 30 | 'east' = 'one',<br />

GE 30 | NE 'east' = 'two',<br />

G = 'three' ) ]<br />

In this example the recode is based on the combined values of Age and Region. There are four possible results. Variable Group = 'one' when Age is less than 30 and Region = 'east'. Variable Group = 'two' when Age is greater than or equal to 30 and Region does not equal 'east'. Any case that is missing on either variable is set to missing type 3 on variable Group. Any case with good values on both Age and Region that was not matched by the previous tests has a value of 'three' on variable Group.

XRECODE(exp, recode instructions)<br />

recodes character strings eXactly, that is, respecting the specified case (lower, upper or mixed) of the string:

[ GEN Symptom = XRECODE (Note, 'a' = 1, 'b' = 2, 'A' = 0) ;

XRECODE works like RECODE with regard to other aspects. Character strings may be exactly recoded to numbers or other character strings. XRECODE can be used for both simple and complex recodes.


5 PPL: DO LOOPS and IF-THEN-ELSE Blocks

The first section of this chapter documents DO loops. DO loops enable you to do repetitive operations easily. The second section covers the use of DO loops to generate or rename groups of variables. The last section covers the use of IF-THEN-ELSE blocks to handle complex logic. (The use of a simple IF was covered in the previous chapters.)

5.1 DO LOOPS<br />

DO loops specify repetitive instructions. They are useful when it is necessary to do the same modification on a number of different variables. Repetitive actions can of course be done one at a time, repeating the modification clauses and changing the variable names or positions as many times as necessary:

LIST F1 [ SET V(1) = RECODE ( V(1), 6 TO 9 = 5 ) ;<br />

SET V(2) = RECODE ( V(2), 6 TO 9 = 5 ) ;<br />

SET V(3) = RECODE ( V(3), 6 TO 9 = 5 ) ;<br />

SET V(5) = RECODE ( V(5), 6 TO 9 = 5 ) ] $<br />

However, this is tedious and may be done more easily using a DO loop:<br />

LIST F1 [<br />

DO #J USING V(1) TO V(3) V(5);<br />

SET V(#J) = RECODE ( V(#J), 6 TO 9 = 5 );<br />

ENDDO ] $<br />

The DO statement above has five components:

1. DO                  which is followed by
2. #J                  a temporary or permanent numeric scratch variable. The value of this changes each time the loop is traversed
3. USING               which indicates that a list of variables follows
4. V(1) TO V(3) V(5)   a list of variables associated with the loop
5. ;                   ends the list of variables and therefore ends the DO statement.

The USING list has four variables; therefore the statements up to the ENDDO will be done four times. The scratch variable #J is set to the POSITION of the next variable in the list each time the loop repeats. Thus, in the four iterations, it takes on the values 1, 2, 3, and 5.

The V vector, as always, holds the data of the current case. In the SET statement, variable V(#J) is recoded. Therefore we recode variables 1, 2, 3, and 5 in the four iterations. Note that in V(#J) usage, the subscript expression (here, just the #J) must result in an integer that is within the range of variables in the file. In other words, fractional or negative subscripts like V(#J+.6) and V(-#J) would be errors.

The DO loop always ends with the ENDDO instruction.



The USING loop is one form of DO loop. The other form is a range loop. In this form:

DO #J = 5, 13, 1;<br />

the DO scratch variable takes its value from the DO range, which begins with the first number (5) and increases through the second number (13) in steps of the third number (1). Here #J begins with 5, becomes 6, then 7, etc. The final time through the loop #J has the value 13 and the loop has been executed 9 times. When the stepsize is 1, it may be omitted.

There is no limit to the number of PPL instructions that may be done between the DO statement and the ENDDO statement. The DO can contain other DOs or IF/THEN/ELSE blocks (described later in this chapter). Figure 5.1 contains the input, the LIST command and the resulting printout for a simple DO loop with a list of variables.

__________________________________________________________________________<br />

Figure 5.1 Simple DO Loop with a List of Variables<br />

FILE F1<br />

VAR1 VAR2 VAR3 VAR4 VAR5 VAR6<br />

1 9 2 7 6 8<br />

6 4 5 2 7 3<br />

LIST F1 [<br />

DO #J USING V(1) TO V(3) V(5);<br />

SET V(#J) = RECODE ( V(#J), 6 TO 9 = 5 );<br />

ENDDO ] $<br />

VAR1 VAR2 VAR3 VAR4 VAR5 VAR6<br />

1 5 2 7 5 8<br />

5 4 5 2 5 3<br />

__________________________________________________________________________<br />
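The loop in Figure 5.1 can be traced with a short model. This is a Python sketch, not PPL: a case is a plain list, `do_using` and `recode_6_to_9` are hypothetical names, and the USING list `V(1) TO V(3) V(5)` is written as the 1-based positions 1, 2, 3 and 5.

```python
def recode_6_to_9(value):
    """RECODE ( V(#J), 6 TO 9 = 5 ): other values pass through unchanged."""
    return 5 if 6 <= value <= 9 else value

def do_using(case, positions):
    """Apply the recode to each listed position of one case."""
    v = list(case)                 # copy so the input case is left untouched
    for j in positions:            # #J takes each 1-based position in turn
        v[j - 1] = recode_6_to_9(v[j - 1])
    return v

USING = [1, 2, 3, 5]               # V(1) TO V(3) V(5)
```

Applying `do_using` to the two input cases of the figure gives the two output rows shown; VAR4 and VAR6 are not in the list and keep their values.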

5.2 DO USING a Variable List<br />

DO USING specifies a list of variables or values to which the subsequent instructions are to be applied. Both variable names and positions may be used in the list of variables. A user-supplied scratch variable must be provided. This scratch variable is then available for general use within the loop.

[ DO #J USING V(1) TO V(3);

The scratch variable is “#J” in this example, but it may be any legal temporary or permanent numeric scratch variable name. In this example the range of #J is from 1 through 3, the positions of variables V(1), V(2) and V(3). The subsequent modification instructions:

SET V(#J) = RECODE ( V(#J), 6 TO 9 = 5 )<br />

are done three times, first when #J has the value 1,

SET V(1) = RECODE ( V(1), 6 TO 9 = 5 )<br />

and then when it has the values 2 and 3:

SET V(2) = RECODE ( V(2), 6 TO 9 = 5 )<br />

SET V(3) = RECODE ( V(3), 6 TO 9 = 5 )<br />

In effect, a “loop” is set up: the first value of #J is used in the instruction, then the next, and so on, until the last value of #J is used. The loop stops when all the instructions have been processed for the last value of #J.



Variable names and wildcards may be used in the DO list. If names are used in a DO instruction, for example:

DO #J USING Math.Test TO English.Test;<br />

the positions of the variables in the file define the range of the scratch variable. If Math.Test is the first variable in the file and English.Test is the third, #J has the values 1 through 3. If Math.Test is the sixth variable in the file and English.Test is the ninth, #J has the values 6 through 9. In either case, the appropriate variables are referenced in the instruction following the DO:

[ DO #J USING Math.Test TO English.Test ;<br />

SET V(#J) = RECODE ( V(#J), 6 TO 9 = 5 );<br />

ENDDO ]<br />

This DO phrase may be interpreted: for the scratch variable #J, which is initially the position of Math.Test and subsequently the positions of the other variables in the list, set each variable in turn to the RECODE function of itself, setting any value from 6 through 9 to 5. As the DO loop is processed, the variable position represented by V(#J) changes. Initially, the variable position is that of the first variable in the list. With each new loop the scratch variable takes as its value the position of the next variable in the USING list.

DO #J USING *;<br />

requests the re-use of the USING list from the most recent DO USING loop.

__________________________________________________________________________<br />

Figure 5.2 DO With Two Scratch Variables<br />

File Cold

                         Stuffy
Date    Headache  Fever  Nose    Cough
011593  1         0      0       0
012293  0         1      1       1
020993  0         1      0       1
021093  1         0      0       0

MODIFY Cold [ DO #J #N USING headache TO Cough;
   IF V(#J) EQ 1, SET V(#J) = #N,
      F.SET V(#J) = .M1.;
   ENDDO ],
   OUT Cold $

LIST Cold $

                         Stuffy
Date    Headache  Fever  Nose    Cough
011593  1         -      -       -
012293  -         2      3       4
020993  -         2      -       4
021093  1         -      -       -

__________________________________________________________________________<br />

Either form of DO loop may have a second scratch variable, which holds the number of the current pass through the loop. Figure 5.2 illustrates this usage. The file has a series of dummy (0/1) variables. The purpose of the MODIFY command is to convert the zeros to missing and the ones to the position of the variable in the DO list. This is an easy way to convert a series of multiple response questions which are coded as dummy variables into the 1 through n type of code which the SURVEY command expects for multiple response banner (column) variables.

MODIFY Cold [ DO #j #n USING headache TO Cough;

The first scratch variable, #j, takes on the positions (2-5) of the 4 variables in the USING list. The second scratch variable takes on the value 1 the first time through the loop, 2 the second time through the loop, 3 the third time through, and 4 in the final loop.
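The roles of the two scratch variables in Figure 5.2 can be sketched in Python (not PPL). Here `j` plays the part of #J (the 1-based position in the file) and `n` the part of #N (the pass counter); `convert_dummies` is a hypothetical name and `None` stands in for missing.

```python
MISSING = None

def convert_dummies(case, first, last):
    """Turn 0/1 dummies in positions first..last into pass numbers or missing."""
    v = list(case)
    for n, j in enumerate(range(first, last + 1), start=1):
        # IF V(#J) EQ 1, SET V(#J) = #N, F.SET V(#J) = .M1.
        v[j - 1] = n if v[j - 1] == 1 else MISSING
    return v
```

For the file Cold, Headache through Cough occupy positions 2 through 5, so `convert_dummies(["012293", 0, 1, 1, 1], 2, 5)` returns the second row of the listing: missing, 2, 3 and 4.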

5.3 DO Stepping Through a Range<br />

The second form of the DO uses a range of numeric constants or expressions.

DO #K = 15, 24;<br />

The range of the scratch variable #K is 15 through 24. Because there is no third argument, a stepsize of 1 is assumed and the values of #K are 15, 16, 17, etc.

DO #K = 15, 24, 2;<br />

Here the stepsize is 2 and #K takes on the value 15, 17, 19, etc. The constants and the stepsize can be any numeric value or expression that is available at that moment. They can, in other words, use values which change from case to case. The values can be real numbers with a fractional part. The only exception to this is when the DO is used to generate or rename a list of variables. The values in a GENERATE or RENAME loop must be available at the beginning of the command and must have integer values.

__________________________________________________________________________<br />

Figure 5.3 DO: Range and Stepsize<br />

File Tests<br />

pre post pre post pre post<br />

.1 .1 .2 .2 .3 .3<br />

68 75 92 94 89 88<br />

73 73 84 93 85 89<br />

78 79 72 80 73 75<br />

MODIFY Tests<br />

[ DO #j = 2, 6, 2;<br />

SET V(#j) = V(#j) - V(#j-1);<br />

ENDDO ],<br />

OUT Test2 $<br />

File Test2<br />

pre post pre post pre post<br />

.1 .1 .2 .2 .3 .3<br />

68 7 92 2 89 -1<br />

73 0 84 9 85 4<br />

78 1 72 8 73 2<br />

__________________________________________________________________________



Figure 5.3 contains a small data set with 3 sets of values, each consisting of a pre score and a post score. The MODIFY command is used to change each post value to the difference between it and the corresponding pre value (post minus pre).

DO #J = 2, 6, 2;<br />

has a value for #J that is 2 the first time through the loop. Because the stepsize is 2, the second time through the loop the value of #J is 4. The final time through the loop the value of #J is 6. Because the subscripts for the V vector can be expressions, V(#j-1) points to each of the pre variables in turn as the subscript takes on the values 1, 3, and 5.
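The subscript arithmetic in Figure 5.3 can be followed in a short model. This is a Python sketch, not PPL; a case is a plain list in file order (pre.1, post.1, pre.2, post.2, pre.3, post.3) and `post_minus_pre` is a hypothetical name.

```python
def post_minus_pre(case):
    """DO #j = 2, 6, 2;  SET V(#j) = V(#j) - V(#j-1);  ENDDO"""
    v = list(case)
    for j in range(2, 7, 2):            # #j is 2, 4, 6 (1-based positions)
        v[j - 1] = v[j - 1] - v[j - 2]  # V(#j) minus V(#j-1)
    return v
```

The first case, `[68, 75, 92, 94, 89, 88]`, becomes `[68, 7, 92, 2, 89, -1]`, which matches the first row of file Test2.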

The DO numbers can be fractional values.<br />

DO #J = .5, .8, .1;<br />

This loop will have 4 iterations with #J as .5, .6, .7, and .8. The range can go backwards with either a supplied negative stepsize or a default of -1.

DO #J = 3, -3, -2;<br />
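The values a range loop visits can be listed with a small helper. This is a Python sketch, not PPL: `do_range` is a hypothetical name, decimal arithmetic is used so that fractional steps such as .1 stay exact, and it is an assumption of the sketch that the loop stops as soon as the next value would pass the end of the range.

```python
from decimal import Decimal

def do_range(start, stop, step=None):
    """Values taken by the DO scratch variable for DO #J = start, stop, step;"""
    start, stop = Decimal(str(start)), Decimal(str(stop))
    if step is None:
        step = 1 if stop >= start else -1   # default stepsize, forwards or backwards
    step = Decimal(str(step))
    if step == 0:
        raise ValueError("stepsize must not be zero")
    span = (stop - start) / step
    if span < 0:
        return []                           # the step points away from the end value
    count = int(span) + 1                   # int() drops any partial final step
    return [start + i * step for i in range(count)]
```

`do_range(5, 13, 1)` has 9 values, `do_range(0.5, 0.8, 0.1)` has 4, and `do_range(3, -3, -2)` visits 3, 1, -1 and -3.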

The arguments for the DO can be expressions. If you wish to loop with a step argument through a list of variables and you know the variable names but not the locations, you can do the following:

DO #J = loc(pre.1) to loc(pre.3), 2;

Figure 5.4 illustrates the difference between the types of DO loops. In the first command #J takes on the positions of the variables in the USING list. In the second command #J begins with 2, the value of VAR1, and ends with 6, the value of VAR3.

__________________________________________________________________________<br />

Figure 5.4 DO Loops: An Example of Each Type<br />

File XX

VAR1 VAR2 VAR3
2    4    6

The Commands                             The Output

PROCESS XX [ DO #J USING var1 TO var3;   #J= 1    positions
PUT #J; ENDDO ] $                        #J= 2    of USING
                                         #J= 3    variables

PROCESS XX [ DO #J = var1, var3;         #J= 2    value var1
PUT #J;                                  #J= 3
ENDDO ] $                                #J= 4
                                         #J= 5
                                         #J= 6    value var3

PROCESS XX [ DO #J =                              positions
(loc)var1, (loc)var3, 2;                 #J= 1    var1
PUT #J;                                  #J= 3    var3
ENDDO ] $

__________________________________________________________________________



5.4 DO Loops: O<strong>the</strong>r Features<br />

Figure 5.5 illustrates the optional features of the DO. A DO can reference a label. For example:

DO testloop #J ....<br />

This label is then used as a statement label on the ENDDO statement:

testloop: ENDDO;<br />

__________________________________________________________________________<br />

Figure 5.5 Labelled DO, EXITDO and NEXTDO<br />

File Myfile<br />

QA1 QA2 QA3 QB1 QB2 QB3<br />

2 3 4 - 1 1<br />

3 1 8 2 1 1<br />

7 3 6 1 1 1<br />

TEXT;<br />

First RECODE variables QA1 through QA3 into the values 0-4. Then
compute the average of QA1, QA2 and QA3. However, if the QB variable
that corresponds to the QA variable is missing, exit the DO (EXITDO)
and ignore the remaining values. If the QB variable that corresponds
to the QA variable is a 2, move immediately (NEXTDO) to the next
element in the loop and do not include the current value.

$<br />

LIST Myfile [ GEN #Total = 0, GEN N = 0, GEN Average = .M.;<br />

...............<br />

DO testloop #j USING QA1 TO QA3;<br />

SET V(#j) = RECODE

( V(#J), 0=M, 1 2=1, 3 TO 5=2, 6 8 9=3, G=4 );

IF V(#J+3) MISSING, EXITDO;<br />

IF V(#J+3) EQ 2, NEXTDO;<br />

INCREASE #Total BY V(#J), INCREASE N;<br />

testloop: ENDDO;<br />

SET Average = #Total / N ] $<br />

QA1 QA2 QA3 QB1 QB2 QB3 N Average<br />

1 3 4 - 1 1 0 -<br />

2 1 3 2 1 1 2 2<br />

4 2 3 1 1 1 3 3<br />

__________________________________________________________________________<br />

Any statement that has a label can be used as the target of a GOTO. GOTO, which is discussed later in this chapter, provides a way to selectively execute the PPL. The label in a DO is also useful when there are nested DOs.



DO loop1 #J ....;
   PPL here;
   DO loop2 #K ....;
      More PPL here;
   loop2: ENDDO;
   And yet more PPL;
loop1: ENDDO;

You can exit from a DO loop at any time by using the EXITDO PPL statement. EXITDO has the effect of a branch to the PPL (if any) after its ENDDO. PPL processing continues there. NEXTDO, on the other hand, is a branch to the ENDDO statement where the DO counters are incremented. In Figure 5.5 the last 5 lines of the LIST command could have been written as:

IF V(#J+3) MISSING, GOTO NEXT;<br />

IF V(#J+3) EQ 2, GOTO testloop;<br />

INCREASE #Total BY V(#J), INCREASE N;<br />

testloop: ENDDO;<br />

NEXT: SET Average = #Total / N ] $<br />

In Figure 5.5, QA2 and QA3 for <strong>the</strong> first case are not recoded. This is because QB1 is missing. As soon as<br />

QA1 is recoded <strong>the</strong> statement<br />

IF V(#J+3) MISSING, EXITDO;<br />

is executed. Since QB1 is missing, <strong>the</strong> loop is exited without processing <strong>the</strong> remaining variables for that case.<br />

Case 2 has its average calculated on just the last two values. This is because QB1 for that case is a 2. The statement:<br />

IF V(#J+3) EQ 2, NEXTDO;<br />

causes a branch <strong>to</strong> <strong>the</strong> ENDDO without including QA1 in <strong>the</strong> <strong>to</strong>tals.<br />
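The control flow of Figure 5.5 can be sketched outside P-STAT. The following is a minimal Python model, not PPL: it assumes the QA values have already been recoded (it uses the values shown in the listing), uses None for a missing value, and maps EXITDO to break and NEXTDO to continue. The function name summarize is hypothetical.<br />

```python
def summarize(case):
    """Model the DO loop of Figure 5.5 for one case of already-recoded values.

    `case` holds QA1..QA3 followed by QB1..QB3; None stands for a missing value.
    Returns (N, Average); Average is None when nothing was counted.
    """
    total, n = 0, 0
    for j in range(3):            # DO testloop #j USING QA1 TO QA3
        qb = case[j + 3]          # V(#J+3): the QB variable paired with this QA
        if qb is None:
            break                 # EXITDO: leave the loop, ignore remaining values
        if qb == 2:
            continue              # NEXTDO: move on without counting this value
        total += case[j]          # INCREASE #Total BY V(#J)
        n += 1                    # INCREASE N
    # Assumption: 0 / 0 is treated as missing, as the "-" in the listing suggests.
    average = total / n if n else None
    return n, average


# The three cases as they appear in the Figure 5.5 listing.
cases = [
    (1, 3, 4, None, 1, 1),
    (2, 1, 3, 2, 1, 1),
    (4, 2, 3, 1, 1, 1),
]
results = [summarize(c) for c in cases]
print(results)
```

The N and Average columns of the listing correspond to the pairs in results.<br />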

The DO scratch variable or variables are still defined when a DO exits. They remain set <strong>to</strong> whatever values<br />

<strong>the</strong>y had in <strong>the</strong> final DO iteration that was done.<br />

EXITDO and NEXTDO can be used in phrases like:<br />

IF Age GT 14, T.NEXTDO, F.EXITDO;<br />

EXITDO and NEXTDO can be followed by a DO statement label. Here, we exit all three loops from <strong>the</strong> innermost<br />

loop:<br />

DO aaa #J = 1, 2;<br />

DO bbb #K = 3, 4;<br />

DO ccc #M = 5, 6;<br />

EXITDO aaa;<br />

ccc: ENDDO;<br />

bbb: ENDDO;<br />

aaa: ENDDO;<br />

Even though we are out of <strong>the</strong> loops, #J and #K and #M can be used; <strong>the</strong>y have <strong>the</strong> values 1, 3, and 5. If <strong>the</strong> above<br />

EXITDO had no label, it would have exited only <strong>the</strong> DO ccc loop.<br />
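As an illustration only, the labeled and unlabeled forms can be modeled in Python. Python has no labeled exit, so this sketch assumes an exception that carries the label; the names ExitDo, run_labeled and run_unlabeled are hypothetical and are not part of PPL.<br />

```python
class ExitDo(Exception):
    """Carries the label of the DO loop that should be exited."""

    def __init__(self, label):
        super().__init__(label)
        self.label = label


def run_labeled():
    """EXITDO aaa: leave all three loops from the innermost one."""
    j = k = m = None
    try:
        for j in (1, 2):                  # DO aaa #J = 1, 2
            for k in (3, 4):              # DO bbb #K = 3, 4
                for m in (5, 6):          # DO ccc #M = 5, 6
                    raise ExitDo("aaa")   # EXITDO aaa
    except ExitDo as stop:
        if stop.label != "aaa":
            raise
    return j, k, m


def run_unlabeled():
    """EXITDO without a label: only the innermost loop is exited each time."""
    j = k = m = None
    for j in (1, 2):
        for k in (3, 4):
            for m in (5, 6):
                break                     # EXITDO
    return j, k, m


labeled = run_labeled()
unlabeled = run_unlabeled()
print(labeled, unlabeled)
```

In this model the loop variables keep the values of the last iteration that ran, which mirrors the statement that the DO scratch variables remain defined after the loop exits.<br />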

5.5 GENERATE AND RENAME<br />

GENERATE and RENAME use <strong>the</strong> same conventions in creating variable names. When a single variable is involved<br />

<strong>the</strong>re is no need for a complex mask:



[ GENERATE Family.<strong>Inc</strong>ome;<br />

RENAME Test1 TO Math121;<br />

RENAME V(2) TO Chem34 ]<br />

RENAME requires the existing name, TO, and the new name, which must be a unique name in the file.<br />

GENERATE requires only the variable name.<br />

If a list of variables is to be generated or renamed, a DO loop may be used. A DO which contains a<br />

GENERATE or RENAME may not contain other PPL statements. Also, if the DO uses a range (as in DO #J = 1, 5),<br />

<strong>the</strong> control values must be integer constants or integer scratch variables whose values are known when <strong>the</strong> command<br />

begins. It is necessary <strong>to</strong> know <strong>the</strong> range of values before any cases are read in order <strong>to</strong> properly set up <strong>the</strong><br />

renames or generates.<br />

5.6 Using GENERATE in DO Loops<br />

Typically, when GENERATE is used <strong>to</strong> create a new variable, a name for <strong>the</strong> variable is provided by <strong>the</strong> user or,<br />

if <strong>the</strong> “?” has been used, P-<strong>STAT</strong> generates a name. When GENERATE is used in a DO loop, multiple variables<br />

are created and unique names need <strong>to</strong> be provided or generated for <strong>the</strong>m. The format of a GENERATE within a<br />

DO loop is one of <strong>the</strong> following:<br />

GENERATE ? = value;<br />

GENERATE ? (mask) = value;<br />

GENERATE V(#J) (mask) = value;<br />

GENERATE V(##K) (mask) = value;<br />

If <strong>the</strong> variables are character, <strong>the</strong> :C or :C20 or such directly follows <strong>the</strong> mask or, if <strong>the</strong>re is no mask, <strong>the</strong> “?”.<br />

Masks are described below. The “= value” is optional; if not supplied, <strong>the</strong> variable is set <strong>to</strong> missing.<br />

When <strong>the</strong> “?” is used:<br />

[ DO #K USING Q3 TO Q5; GENERATE ? = SQRT ( V(#K) );<br />

ENDDO ]<br />

names for <strong>the</strong> three new variables are generated by P-<strong>STAT</strong>. If <strong>the</strong>re are ten variables in <strong>the</strong> file, <strong>the</strong> new variables<br />

are VAR11 (<strong>the</strong> square root of <strong>the</strong> variable named Q3), VAR12 (<strong>the</strong> square root of <strong>the</strong> variable named Q4) and<br />

VAR13 (<strong>the</strong> square root of <strong>the</strong> variable named Q5). The same format is used <strong>to</strong> generate a list of character<br />

variables:<br />

[ DO #K USING Q3 TO Q5;<br />

GENERATE ?:C = CHARACTER ( V(#K) ); ENDDO ]<br />

The ? is followed by “:C”. The length may be specified:<br />

GENERATE ?:C32<br />

A mask containing a prefix or suffix may be provided for <strong>the</strong> names being generated. The mask is enclosed<br />

in paren<strong>the</strong>ses and an ampersand (&) is used <strong>to</strong> represent <strong>the</strong> name of <strong>the</strong> current DO loop variable:<br />

[ DO #K USING Q3 TO Q5;<br />

GENERATE V(#K) ( 'Sqrt.' & ) = SQRT ( V(#K) ); ENDDO ]<br />

The new variable names are composed of <strong>the</strong> prefix “Sqrt.” followed by one of <strong>the</strong> names of <strong>the</strong> variables in <strong>the</strong><br />

DO list — <strong>the</strong> variable currently in <strong>the</strong> DO loop. Since <strong>the</strong> names of <strong>the</strong> variables in <strong>the</strong> DO list are Q3, Q4 and<br />

Q5, <strong>the</strong> names for <strong>the</strong> new variables are “Sqrt.Q3”, “Sqrt.Q4” and “Sqrt.Q5”. A suffix is created by moving <strong>the</strong><br />

“&” in <strong>the</strong> mask so that it precedes <strong>the</strong> string.<br />

[ DO #K USING Q3 TO Q5;<br />

GENERATE V(#K) ( & '.Sqrt' ) = SQRT ( V(#K) );<br />

ENDDO ]<br />

This creates new names “Q3.Sqrt”, “Q4.Sqrt” and “Q5.Sqrt”.



If <strong>the</strong> new name is longer than 16 characters, <strong>the</strong> prefix or suffix is left intact and <strong>the</strong> current variable name<br />

is truncated. This may cause an error due <strong>to</strong> a repeated name.<br />
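The truncation rule, and the way it can produce a repeated name, can be sketched in Python. This is a model of the rule as stated above, not P-STAT code; the helper names with_prefix and with_suffix and the example variable names are hypothetical.<br />

```python
MAX_NAME = 16  # maximum name length stated in the text


def with_prefix(prefix, name, limit=MAX_NAME):
    """The prefix is kept intact; the variable name is cut to fit."""
    room = max(limit - len(prefix), 0)
    return prefix + name[:room]


def with_suffix(name, suffix, limit=MAX_NAME):
    """The suffix is kept intact; the variable name is cut to fit."""
    room = max(limit - len(suffix), 0)
    return name[:room] + suffix


short = with_prefix("Sqrt.", "Q3")
# Two distinct 15-character names that differ only in the last character.
first = with_prefix("Sqrt.", "Household.Inc.1")
second = with_prefix("Sqrt.", "Household.Inc.2")
tail = with_suffix("Household.Inc.1", ".Sqrt")
print(short, first, second, tail)
```

Because both long names lose their final characters, first and second come out identical, which is the repeated-name situation the text warns about.<br />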

When GENERATE is used in a DO loop, <strong>the</strong> loop can have no o<strong>the</strong>r statements; i.e., it can have only DO,<br />

GENERATE and ENDDO.<br />

5.7 Using RENAME in DO Loops<br />

In <strong>the</strong> simplest form of RENAME, like <strong>the</strong> GENERATE illustrated above, all <strong>the</strong> renamed variables have <strong>the</strong><br />

specified prefix (or suffix) in <strong>the</strong>ir names:<br />

[ DO #J USING Item1 TO Item10; RENAME V(#J) ( 'Test.' & ); ENDDO]<br />

Here, variables Item1 through Item10 are renamed by prefixing <strong>the</strong>ir names with “Test.”. The variable previously<br />

named “Item1” is renamed “Test.Item1”, “Item2” is renamed “Test.Item2”, and so on. If <strong>the</strong> prefix plus <strong>the</strong> original<br />

name contains more than 16 characters, <strong>the</strong> entire prefix is used and characters are removed from <strong>the</strong> end of<br />

<strong>the</strong> original name until a 16-character name results.<br />

The format for RENAME within a DO loop is <strong>the</strong> following:<br />

1. RENAME<br />

2. a V(#J) usage. This identifies <strong>the</strong> variable <strong>to</strong> be renamed. It also provides its current name <strong>to</strong> <strong>the</strong><br />

mask.<br />

3. a mask in paren<strong>the</strong>ses which contains strings in quotes <strong>to</strong> be used exactly as entered. It also contains<br />

special characters such as <strong>the</strong> “&” which are used <strong>to</strong> select or omit letters from <strong>the</strong> input label and <strong>to</strong><br />

supply numbers using <strong>the</strong> DO loop scratch variable.<br />

4. a semicolon, ending <strong>the</strong> statement.<br />

This is an example of a simple mask:<br />

[ DO #j=21,35; RENAME V(#j) (XOOXX); ENDDO ]<br />

Here, a mask of (XOOXX) is supplied. The initial X says use <strong>the</strong> first input character, <strong>the</strong> OO says omit <strong>the</strong> next<br />

two characters, and <strong>the</strong> XX says use <strong>the</strong> next two (characters 4 and 5). This mask would rename VAR31 in<strong>to</strong><br />

V31.<br />

[ DO #n USING pre? ;<br />

RENAME V(#n) ( 'test' OOO & );<br />

ENDDO]<br />

In this loop, each name that starts with 'pre' is renamed. Each new name begins with 'test'. The first 3 characters<br />

of each old name, which are known <strong>to</strong> be 'pre', are bypassed (indicated by ooo), and <strong>the</strong> rest of <strong>the</strong> old name (indicated<br />

by &) is copied in<strong>to</strong> <strong>the</strong> new name area after 'test'. We are replacing 3 characters with 4. The & opera<strong>to</strong>r<br />

truncates if needed, so if a name started with 16 characters, it would get 'test' followed by characters 4 <strong>to</strong> 15. The<br />

OOO opera<strong>to</strong>r caused <strong>the</strong> & opera<strong>to</strong>r <strong>to</strong> start with character 4.<br />

When RENAME is used in a DO loop, the loop can have no other statements; i.e., it can have only DO,<br />

RENAME and ENDDO.<br />

5.8 Masks for RENAME and GENERATE<br />

A mask is used <strong>to</strong> create a name for a variable, ei<strong>the</strong>r by modifying <strong>the</strong> ? or V(#J) name preceding it or by<br />

creating a <strong>to</strong>tally different name. The mask activity begins with a pointer on <strong>the</strong> initial character of <strong>the</strong> input name.<br />

The pointer is moved on<strong>to</strong> <strong>the</strong> next character after each usage of X, O, c or C. Fur<strong>the</strong>r use of X-O-c-C is ignored<br />

when <strong>the</strong> pointer is beyond <strong>the</strong> final input character.



__________________________________________________________________________<br />

Figure 5.6 Rename Examples<br />

FILE myfile<br />

Id Item1 Item2 Item3 Item4 Item5<br />

1 1 2 3 4 5<br />

Select the 1st and 5th characters:<br />

LIST Myfile [ DO #j USING Item2 TO Item5;<br />

RENAME v(#j) ( XOOOX ) ; ENDDO ] $<br />

Id Item1 I2 I3 I4 I5<br />

1 1 2 3 4 5<br />

Provide a prefix and use the DO loop number (d):<br />

LIST Myfile [ DO #J USING Item2 TO Item5;<br />

RENAME V(#j) ( 'Q.' d ); ENDDO; ] $<br />

Id Item1 Q.3 Q.4 Q.5 Q.6<br />

1 1 2 3 4 5<br />

Provide a prefix and use the DO loop counter (n):<br />

LIST Myfile [ DO #J USING Item2 TO Item5;<br />

RENAME V(#j) ( 'Question.' n ); ENDDO; ] $<br />

Id Item1 Question.1 Question.2 Question.3 Question.4<br />

1 1 2 3 4 5<br />

Use the original name and the DO loop number:<br />

LIST Myfile [ DO #J USING Item2 TO Item5;<br />

RENAME V(#j) ( & '.' d ); ENDDO; ] $<br />

Id Item1 Item2.3 Item3.4 Item4.5 Item5.6<br />

1 1 2 3 4 5<br />

Use 'Q.', the original name, '.' and the DO loop counter:<br />

LIST Myfile [ DO #J USING Item2 TO Item5;<br />

RENAME V(#j) ( 'Q.' & '.' n ); ENDDO; ] $<br />

Id Item1 Q.Item2.1 Q.Item3.2 Q.Item4.3 Q.Item5.4<br />

1 1 2 3 4 5<br />

__________________________________________________________________________



1. x or X takes <strong>the</strong> current input character, if usable.<br />

2. o or O omits <strong>the</strong> current input character. NOTE: <strong>the</strong> digit '0' is also usable.<br />

3. c takes <strong>the</strong> current input character, if usable, and, if it is a letter, puts it in<strong>to</strong> lower case. C takes <strong>the</strong><br />

current input character, if usable, and, if it is a letter, puts it in<strong>to</strong> upper case.<br />

4. & takes all remaining usable input characters that can fit, starting at <strong>the</strong> current location of <strong>the</strong><br />

pointer.<br />

5. @4 places <strong>the</strong> pointer on<strong>to</strong> <strong>the</strong> 4th input character. (@4 xxx) and (ooo xxx) are identical.<br />

6. @-5 places <strong>the</strong> pointer on <strong>the</strong> 5th character from <strong>the</strong> right hand end.<br />

A character that has been <strong>the</strong> subject of any of <strong>the</strong> X-O-c-C-& opera<strong>to</strong>rs is no longer usable by any subsequent<br />

opera<strong>to</strong>r. As a result, ( @-4 OOOO @1 & 'post' ) could be used <strong>to</strong> inactivate <strong>the</strong> rightmost 4 characters, take <strong>the</strong><br />

rest, and add 'post' <strong>to</strong> it. In o<strong>the</strong>r words, <strong>the</strong> mask has replaced an existing 4-character suffix with a new one.<br />

Blanks are ignored in masks, which can markedly improve readability. For example:<br />

(XXXXXXXXX) and<br />

(XXX XXX XXX) are identical.<br />

O<strong>the</strong>r features of <strong>the</strong> mask are:<br />

1. 'ab.cde' moves <strong>the</strong> string contents in<strong>to</strong> <strong>the</strong> new name.<br />

2. D or d inserts <strong>the</strong> V subscript. This is based on <strong>the</strong> current value of <strong>the</strong> DO scratch variable. If<br />

V(#j) is used, 17 is inserted when #j=17. If V(#j+10) is used, 27 is inserted when #j=17. DD is like<br />

D, but forces 2 characters; 07 is used instead of 7. If DDD is used, three numbers are inserted in <strong>the</strong><br />

new name and a 7 bonds <strong>to</strong> <strong>the</strong> new label as 007.<br />

3. N or n inserts <strong>the</strong> current iteration count of <strong>the</strong> DO loop. If this is <strong>the</strong> third trip through <strong>the</strong> loop, 3<br />

is inserted. NN provides 2 digits, NNN 3 digits. You do not have <strong>to</strong> use a counter scratch variable<br />

in <strong>the</strong> DO statement in order <strong>to</strong> use 'N'.<br />

The following are some examples of a name, a mask and the resulting labels:<br />

current name mask result<br />

abcdefg (xx @-4 xoxx ) abdfg<br />

abcdeF (Cccc @-4 cccc ) Abcdef<br />

abc12345def ('Var' @4 xxxxx) Var12345<br />

abc12345def ('Var' @4 & ) Var12345def<br />

abcd (@-6 xxxxxx ) abcd<br />

Suppose we have:<br />

[ DO #j=11,13; RENAME v(#j) (a mask); ENDDO ]<br />

When #j is 12, meaning it is the second iteration, and the name of v(12) is “abcdef”, the masks behave as follows:<br />

abcdef ('item.' NN ) item.02<br />

abcdef ('PreTest.' DDD) PreTest.012<br />

abcdef ( xxx '.' D ) abc.12<br />
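The mask rules listed above can be checked against these examples with a small interpreter. The following Python sketch is a model built only from the rules in this section, not P-STAT's implementation. It makes several assumptions: the mask is passed without its enclosing parentheses, a pointer moved before the first character is clamped to the first character, and the 16-character name limit is not modeled. The function name apply_mask and its parameters are hypothetical.<br />

```python
import re

# One token per match: blanks, a quoted string, @n or @-n, a run of D or N,
# or a single X / O / 0 / c / C / & operator.
TOKEN = re.compile(
    r"\s+|'(?P<lit>[^']*)'|@(?P<pos>-?\d+)|(?P<d>[dD]+)|(?P<n>[nN]+)|(?P<op>[xXoO0cC&])"
)


def apply_mask(mask, name, subscript=0, iteration=0):
    """Build a new name from `name` using `mask`.

    subscript: the V subscript inserted by D; iteration: the loop count inserted by N.
    """
    used = [False] * len(name)   # a character touched by X-O-c-C-& is no longer usable
    ptr = 0                      # pointer starts on the initial input character
    out = []
    i = 0
    while i < len(mask):
        tok = TOKEN.match(mask, i)
        if tok is None:
            raise ValueError(f"unrecognized mask text: {mask[i:]!r}")
        i = tok.end()
        if tok.group("lit") is not None:          # 'ab.cde' is copied as entered
            out.append(tok.group("lit"))
        elif tok.group("pos") is not None:        # @4 or @-5 moves the pointer
            n = int(tok.group("pos"))
            ptr = n - 1 if n > 0 else len(name) + n
            ptr = max(ptr, 0)                     # assumption: clamp to the start
        elif tok.group("d"):                      # D, DD, DDD: zero-padded subscript
            out.append(str(subscript).zfill(len(tok.group("d"))))
        elif tok.group("n"):                      # N, NN, NNN: zero-padded count
            out.append(str(iteration).zfill(len(tok.group("n"))))
        elif tok.group("op"):
            op = tok.group("op")
            if op == "&":                         # all remaining usable characters
                while ptr < len(name):
                    if not used[ptr]:
                        out.append(name[ptr])
                        used[ptr] = True
                    ptr += 1
            elif ptr < len(name):                 # ignored once past the last character
                ch, usable = name[ptr], not used[ptr]
                used[ptr] = True
                ptr += 1
                if usable and op in "xX":
                    out.append(ch)
                elif usable and op == "c":
                    out.append(ch.lower())
                elif usable and op == "C":
                    out.append(ch.upper())
                # o, O and 0 omit the character
        # blanks match no group and are skipped
    return "".join(out)


print(apply_mask("xx @-4 xoxx", "abcdefg"))
print(apply_mask("xxx '.' D", "abcdef", subscript=12, iteration=2))
```

Under these assumptions the sketch reproduces each row of the two tables above, as well as the suffix replacement mask ( @-4 OOOO @1 & 'post' ); here it is applied to a hypothetical name such as mathtest.<br />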

Figure 5.6 contains 5 examples of RENAME masks and illustrates the use of 'X' and 'O', text strings, the original<br />

variable name and both of the DO loop numbers: the V subscript (d) and the iteration counter (n).<br />

Figure 5.7 illustrates <strong>the</strong> difference between <strong>the</strong> use of ? and V(#scratch) in <strong>the</strong> DO LOOP GENERATE. File<br />

work has four variables. The ? uses <strong>the</strong> generated labels as <strong>the</strong> labels on which <strong>to</strong> base any changes. In <strong>the</strong> first<br />

example in Figure 5.7 <strong>the</strong>se labels are VAR5 through VAR8. When a scratch variable is used, <strong>the</strong> labels provided<br />

<strong>to</strong> <strong>the</strong> mask are <strong>the</strong> current DO loop variables. In <strong>the</strong> second example in Figure 5.7 <strong>the</strong>se variables are V1 through



V4. The only reason for ever using <strong>the</strong> V vec<strong>to</strong>r and a scratch variable in a RENAME or GENERATE is <strong>to</strong> use<br />

some of <strong>the</strong> characters found in <strong>the</strong> original variable name.<br />

__________________________________________________________________________<br />

Figure 5.7 GENERATE: Generated Versus Original<br />

LIST work [ DO #j = 1, 4;<br />

GEN ? ( 'New.' & );<br />

ENDDO ] $<br />

V1 V2 V3 V4 New.VAR5 New.VAR6 New.VAR7 New.VAR8<br />

1 2 3 4 - - - -<br />

LIST work [ DO #j = 1, 4;<br />

GEN v(#j) ( 'New.' & );<br />

ENDDO ] $<br />

V1 V2 V3 V4 New.V1 New.V2 New.V3 New.V4<br />

1 2 3 4 - - - -<br />

__________________________________________________________________________<br />

When variables have a common prefix, the combination of the DO and a dynamic vector created using a wildcard<br />

can be a powerful tool. A dynamic vector is created any time that a wildcard is used in a variable name list.<br />

Age Sex Pre.1 Post.1 Pre.2 Post.2 Pre.3 Post.3 Aptitude<br />

Given these variable names, the use of<br />

Pre?<br />

creates a dynamic vector containing the three variables in the list which begin with the characters “pre”. These<br />

variables can now be referenced in the same way that the variables in the V vector are referenced:<br />

DO #J #N USING Pre?;<br />

SET PRE?(#N) = ..... ;<br />

ENDDO;<br />

If there are 5 variables beginning with “pre”, the loop will be exercised 5 times and the scratch variable #N will<br />

take on the values 1, 2, 3, 4, and 5. Here, using V(#J) is the same as using PRE?(#N).<br />

__________________________________________________________________________<br />

Figure 5.8 Dynamic Array, Wildcard, Prefix and GENERATE<br />

LIST Tests<br />

[ DO #P #N USING pre?;<br />

GEN ? ( 'Diff.' n ) = post?(#N) - pre?(#N) ;<br />

ENDDO ] $<br />

pre.1 post.1 pre.2 post.2 pre.3 post.3 Diff.1 Diff.2 Diff.3<br />

68 75 92 94 89 88 7 2 -1<br />

73 73 84 93 85 89 0 9 4<br />

78 79 72 80 73 75 1 8 2<br />

__________________________________________________________________________<br />

__________________________________________________________________________<br />

Figure 5.9 Complex MASK: Generate Variable Names<br />

Given a file with variables “Variable01”, “Variable02” and “Variable03”:<br />

DO #j = 2, 3;<br />

GEN v(#j)<br />

( C D @3c 'mMm' @-4C NN o & 'zz' dd ) = v(#j);<br />

ENDDO;<br />

The variable name created from input variable “Variable02” is:<br />

V2rmMmL0102zz02<br />

The mask:<br />

C produces V: upper-case letter from input<br />

D produces 2: the value of #J in the DO loop<br />

@3c produces r: 3rd character of input in lower case<br />

'mMm' adds mMm: strings enclosed in quotes are added<br />

@-4C produces L: 4th character from the end of the input variable name, upper case<br />

NN adds 01: iteration count in the DO to 2 places<br />

o omits the next letter in the input variable name: skips the “e”<br />

& includes 02: uses the rest of the input variable name<br />

'zz' adds zz: a mask can have multiple strings<br />

dd adds 02: same as DD, the value of #J to 2 places<br />

Spaces are used in masks to make them easier to follow. They are not<br />

required:<br />

(CD@3c'mMm'@-4CNNo&'zz'dd)<br />

The characters in the new name associated with scratch variable #J are:<br />

D = 2, NN = 01, dd = 02<br />

The characters in the new name taken from the original name are:<br />

C = V, @3c = r, @-4C = L, o& = 02<br />

Character strings added to the new name:<br />

'mMm' = mMm, 'zz' = zz<br />

___________________________________________________________________________



Figure 5.8 contains <strong>the</strong> command and <strong>the</strong> resulting output file for a DO/GENERATE using dynamic arrays.<br />

The mask for the variable names ( 'Diff.' n ) asks for names beginning with the string “Diff.” followed by the DO<br />

loop counter “n”.<br />
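The arithmetic in Figure 5.8 can be reproduced with a short Python model. This is a sketch of the idea of a dynamic vector, not PPL: it assumes that a wildcard such as pre? selects, in file order, every name starting with that prefix, and that matching ignores case. The helper name wildcard is hypothetical.<br />

```python
# Variable names and data rows from the Figure 5.8 listing.
names = ["pre.1", "post.1", "pre.2", "post.2", "pre.3", "post.3"]
rows = [
    [68, 75, 92, 94, 89, 88],
    [73, 73, 84, 93, 85, 89],
    [78, 79, 72, 80, 73, 75],
]


def wildcard(prefix, names):
    """Positions of the names that start with `prefix`, in file order."""
    return [i for i, name in enumerate(names) if name.lower().startswith(prefix.lower())]


pre = wildcard("pre", names)     # models the dynamic vector pre?
post = wildcard("post", names)   # models the dynamic vector post?

# The mask ( 'Diff.' n ) builds each name from the string and the loop counter.
new_names = [f"Diff.{n}" for n in range(1, len(pre) + 1)]

# GEN ? = post?(#N) - pre?(#N), evaluated for every case.
diffs = [[row[post[n]] - row[pre[n]] for n in range(len(pre))] for row in rows]
print(new_names)
print(diffs)
```

Each inner list in diffs corresponds to the three Diff columns of one case in the figure.<br />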

Figure 5.9 illustrates a complex mask which uses nearly all the possible mask codes. The variable named<br />

“Variable02” in the input file is used in the creation of a new variable “V2rmMmL0102zz02”. The mask used<br />

in this example is: “(CD@3c'mMm'@-4CNNo&'zz'dd)”<br />

5.9 IF-THEN-ELSE BLOCKS<br />

Figure 5.10 contains an example of a series of IF/SET statements, followed by the same logic written as an<br />

IF-THEN-ELSE block.<br />

The IF-THEN-ELSE block makes <strong>the</strong> logic easier <strong>to</strong> follow when <strong>the</strong>re is more than a single condition. The IF-<br />

THEN-ELSE block has additional advantages because <strong>the</strong>re can be any number of <strong>PPL</strong> statements and actions in<br />

<strong>the</strong> block, including nested IF-THEN-ELSE blocks and DO loops.<br />

__________________________________________________________________________<br />

Figure 5.10 IF or IF-THEN-ELSE<br />

MODIFY aaa<br />

[ GENERATE sec<strong>to</strong>r = .m1. ;<br />

IF age good and region good, SET sec<strong>to</strong>r = .m2.;<br />

IF age lt 30 and region EQ 'east' SET sec<strong>to</strong>r = 1;<br />

IF age lt 30 and region EQ 'central' SET sec<strong>to</strong>r = 2;<br />

IF age lt 30 and region EQ 'west' SET sec<strong>to</strong>r = 3;<br />

IF age ge 30 and region EQ 'east' SET sec<strong>to</strong>r = 4;<br />

IF age ge 30 and region EQ 'central' SET sec<strong>to</strong>r = 5;<br />

IF age ge 30 and region EQ 'west' SET sec<strong>to</strong>r = 6;<br />

], OUT bbb $<br />

MODIFY aaa<br />

[ GENERATE sec<strong>to</strong>r = .m1. ;<br />

IF age good and region good, SET sec<strong>to</strong>r = .m2.;<br />

IF age lt 30 THEN;<br />

IF region EQ 'east' SET sec<strong>to</strong>r = 1;<br />

IF region EQ 'central' SET sec<strong>to</strong>r = 2;<br />

IF region EQ 'west' SET sec<strong>to</strong>r = 3;<br />

F.ELSE;<br />

IF region EQ 'east' SET sec<strong>to</strong>r = 4;<br />

IF region EQ 'central' SET sec<strong>to</strong>r = 5;<br />

IF region EQ 'west' SET sec<strong>to</strong>r = 6;<br />

ENDIF ], out bbb $<br />

__________________________________________________________________________<br />

IF-THEN-ELSE-ENDIF blocks can be nested 9 deep. They can occur within a DO loop, as long as the block<br />

is ENTIRELY within the DO loop. The block begins with an IF statement. The IF statement begins with IF. It<br />

can also have OR and AND. A THEN ends the statement. The THEN, just like a consequence in a simple IF<br />

statement, can be preceded with FMT qualification, like M.THEN. For example:<br />

LIST X [ GEN Newvar = 99;<br />

IF Age GE 21 OR age EQ 19,<br />

THEN;<br />

SET oldvar = 22 ;<br />

SET abcdef = 33 ;<br />

ELSE;<br />

SET xyz = 44;<br />

ENDIF ] $<br />

The ELSE section is executed whenever <strong>the</strong> section before <strong>the</strong> ELSE is not executed. I.e., given THEN, <strong>the</strong><br />

statements before ELSE are executed when <strong>the</strong> IF is true, and <strong>the</strong> statements after <strong>the</strong> ELSE are executed when<br />

<strong>the</strong> IF is false or missing. Using FM.THEN would reverse this and <strong>the</strong> statements following <strong>the</strong> FM.THEN will<br />

be executed when <strong>the</strong> result of <strong>the</strong> IF was ei<strong>the</strong>r false or missing while <strong>the</strong> statements following <strong>the</strong> ELSE will be<br />

executed whenever the result of the IF is true. In other words, the previous example is exactly the same as:<br />

LIST X [ GEN Newvar = 99;<br />

IF Age GE 21 OR age EQ 19,<br />

FM.THEN;<br />

SET xyz = 44;<br />

ELSE;<br />

SET oldvar = 22 ;<br />

SET abcdef = 33 ;<br />

ENDIF ] $<br />
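The equivalence of the two blocks can be checked with a small Python model of the three possible IF results. This is only a sketch: it takes the result of the IF as given (True, False, or None for missing) and does not model how P-STAT evaluates OR with missing operands. The function names branch, plain_block and reversed_block are hypothetical.<br />

```python
def branch(result, then_on="T"):
    """Return 'THEN' or 'ELSE' for an IF result under a THEN qualifier.

    result: True, False, or None (missing).
    then_on: the outcomes that run the THEN section; plain THEN is modeled
             as 'T' and FM.THEN as 'FM'.
    """
    code = {True: "T", False: "F", None: "M"}[result]
    return "THEN" if code in then_on else "ELSE"


def plain_block(result):
    """IF ..., THEN; SET oldvar, SET abcdef; ELSE; SET xyz; ENDIF"""
    if branch(result, "T") == "THEN":
        return ["SET oldvar = 22", "SET abcdef = 33"]
    return ["SET xyz = 44"]


def reversed_block(result):
    """IF ..., FM.THEN; SET xyz; ELSE; SET oldvar, SET abcdef; ENDIF"""
    if branch(result, "FM") == "THEN":
        return ["SET xyz = 44"]
    return ["SET oldvar = 22", "SET abcdef = 33"]


same = all(plain_block(r) == reversed_block(r) for r in (True, False, None))
print(same)
```

For each of the three outcomes both blocks execute the same statements, which is why the two forms are interchangeable.<br />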

5.10 IF-THEN-ELSE: O<strong>the</strong>r Features.<br />

F.ELSE and M.ELSE sections can be used <strong>to</strong> provide greater control. These allow a true 3-way logic in <strong>the</strong><br />

IF-THEN blocks. T.ELSE is also allowed; however it is only useful when M.THEN or F.THEN begins <strong>the</strong> block.<br />

Alternate names TELSE, FELSE and MELSE are recognized.<br />

There are some restrictions. GENERATE cannot be used within an IF block. The ELSE section, if used, must<br />

be <strong>the</strong> last section.<br />

Figure 5.11 illustrates <strong>the</strong> use of an IF-THEN block with F.ELSE and M.ELSE. The example illustrates a<br />

way of estimating a missing value from <strong>the</strong> mean of previous values. This is sometimes referred <strong>to</strong> as a hot deck<br />

approach. The results change with <strong>the</strong> data. For purposes of this example, it was decided <strong>to</strong> use <strong>the</strong> average of<br />

<strong>the</strong> previous 10 non-missing values as <strong>the</strong> substitute value. These values are s<strong>to</strong>red in <strong>the</strong> first 10 locations of <strong>the</strong><br />

P vec<strong>to</strong>r.<br />

Because we have chosen <strong>to</strong> use previous values, <strong>the</strong>re is a problem if any of <strong>the</strong> first ten cases has a missing<br />

value. Once ten non-missing cases have been read, <strong>the</strong> problem disappears. In this example we have decided <strong>to</strong><br />

use whatever information is available. Given <strong>the</strong> following data values for variable Age:<br />

33 9 15 20 73 - 44 23 18 54 62 29 - 50 82 19 - 29 39<br />

we use the 4 values of 10 or more preceding the first missing value (the 9 is rejected as too young) to produce a<br />

result value of 35. If there were no good values available, the result would be set to missing type 3.<br />

When <strong>the</strong> first case in <strong>the</strong> file is processed we set <strong>the</strong> 10 locations in <strong>the</strong> permanent vec<strong>to</strong>r that we are going<br />

<strong>to</strong> use <strong>to</strong> -1. This permits us <strong>to</strong> test for positive values when we calculate <strong>the</strong> substitute value.<br />

MODIFY Ages [ IF FIRST ( .FILE. )<br />

THEN;<br />

DO #j = 1, 10 ;



SET P(#j) = -1;<br />

ENDDO;<br />

ENDIF;<br />

The 3-way logic is determined in <strong>the</strong> Age LT 10 test:<br />

1. True if Age is non-missing and less than 10. This is considered an error and a simple<br />

report with <strong>the</strong> case number is written.<br />

IF Age LT 10<br />

THEN;<br />

PUT .N. >;<br />

GO TO NEXT;<br />

2. False if Age is non-missing and greater or equal <strong>to</strong> 10. The next location in <strong>the</strong> P vec<strong>to</strong>r is calculated<br />

and that value is replaced by <strong>the</strong> current good value on Age. Thus <strong>the</strong> contents of <strong>the</strong> P vec<strong>to</strong>r continually<br />

change as good values are processed. When 10 good values have been found, <strong>the</strong>re are no<br />

longer any negative numbers (-1) remaining.<br />

F.ELSE;<br />

IF ##Ploc EQ 10, SET ##Ploc = 0;<br />

INCREASE ##Ploc;<br />

SET P(##Ploc) = Age;<br />

GO TO NEXT;<br />

__________________________________________________________________________<br />

Figure 5.11 IF-THEN with F.ELSE and M.ELSE in a Simple Hot Deck Example<br />

GEN ##Ploc = 10, GEN ##Total = 0, GEN ##N=0 $<br />

MODIFY Ages [ IF FIRST ( .FILE. )<br />

THEN;<br />

DO #j = 1, 10 ;<br />

SET P(#j) = -1;<br />

ENDDO;<br />

ENDIF;<br />

IF Age LT 10<br />

THEN;<br />

PUT .N. >;<br />

GO TO NEXT;<br />

F.ELSE;<br />

IF ##Ploc EQ 10, SET ##Ploc = 0;<br />

INCREASE ##Ploc;<br />

SET P(##Ploc) = Age;<br />

GO TO NEXT;<br />

M.ELSE;<br />

SET ##Total = 0, SET ##N = 0;<br />

DO #J = 1, 10;<br />

IF ( P(#j) LT 0 ) EXITDO;<br />

/* increase count of good P values */<br />

INCREASE ##N;<br />

/* increase totals of good P values */<br />

INCREASE ##Total BY P(#j);<br />

ENDDO;<br />

IF ##N GT 0 THEN;<br />

SET Age = INT ( ##Total / ##N ) ;<br />

PUT .N. > Age<br />

>;<br />

ELSE;<br />

SET Age = .M3.;<br />

ENDIF;<br />

ENDIF;<br />

NEXT: ], OUT Ages $<br />

__________________________________________________________________________<br />

3. Missing if Age is unknown. The current contents of <strong>the</strong> P vec<strong>to</strong>r are <strong>to</strong>taled and <strong>the</strong> average is calculated.<br />

This average is based on <strong>the</strong> number of good values currently available in <strong>the</strong> P vec<strong>to</strong>r. This<br />

average is substituted for Age in <strong>the</strong> file and a report is printed giving <strong>the</strong> case number and <strong>the</strong> new<br />

value.<br />

M.ELSE;<br />

SET ##Total = 0, SET ##N = 0;<br />

A DO loop is used <strong>to</strong> examine <strong>the</strong> 10 values currently in <strong>the</strong> P vec<strong>to</strong>r. If a negative number is found<br />

<strong>the</strong> P vec<strong>to</strong>r is not yet s<strong>to</strong>cked with <strong>the</strong> full complement of 10 values and we can exit from <strong>the</strong> DO<br />

loop with ##N set to the current number of good values and ##Total to the sum of those values.<br />

DO #J = 1, 10;<br />

IF ( P(#j) LT 0 ) EXITDO;<br />

INCREASE ##N;<br />

INCREASE ##Total BY P(#j);<br />

ENDDO;<br />

If <strong>the</strong>re is at least 1 good P value we can now calculate an average, set Age <strong>to</strong> that value and write<br />

<strong>the</strong> appropriate information in <strong>the</strong> report.<br />

IF ##N GT 0 THEN;<br />

SET Age = INT ( ##Total / ##N ) ;<br />

PUT .N. > Age<br />

>;<br />

If no good values are available when this case is processed, Age is set to missing 3.<br />

ELSE;<br />

SET Age = .M3.;<br />

Given <strong>the</strong> following 19 values for variable Age:<br />

33 9 15 20 73 - 44 23 18 54 62 29 - 50 82 19 - 29 39<br />

<strong>the</strong> output file contains:<br />

33 9 15 20 73 35 44 23 18 54 62 29 37 50 82 19 45 29 39



And <strong>the</strong> report is:<br />

Case 2 is <strong>to</strong>o young.<br />

Case 6 given 35 as <strong>the</strong> current hot deck value of Age.<br />

Case 13 given 37 as <strong>the</strong> current hot deck value of Age.<br />

Case 17 given 45 as <strong>the</strong> current hot deck value of Age.<br />
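The hot deck logic can be cross-checked outside P-STAT. The following is a minimal Python sketch, not PPL. It assumes that the P vector behaves as a 10-slot circular buffer whose unused slots hold negative values, and that "too young" means an age below 14. The names `hot_deck`, `size` and `min_age` are hypothetical, and the cutoff is a guess, because the figure's own age test is not repeated in this section.

```python
def hot_deck(ages, size=10, min_age=14):
    """Replace missing ages (None) with the integer mean of the current deck."""
    deck = [-1] * size   # negative marks a slot that is not yet stocked
    ploc = 0             # next slot to overwrite (the role played by ##Ploc)
    out, report = [], []
    for case, age in enumerate(ages, start=1):
        if age is None:
            good = []
            for value in deck:
                if value < 0:        # EXITDO on the first unstocked slot
                    break
                good.append(value)
            if good:
                age = sum(good) // len(good)   # INT ( ##Total / ##N )
                report.append(
                    f"Case {case} given {age} as the current hot deck value of Age."
                )
            # with no good values the age stays None (missing 3 in the figure)
        elif age < min_age:          # assumed cutoff, see the note above
            report.append(f"Case {case} is too young.")
        else:
            deck[ploc] = age
            ploc = (ploc + 1) % size  # assumed wrap-around once the deck is full
        out.append(age)
    return out, report


ages = [33, 9, 15, 20, 73, None, 44, 23, 18, 54,
        62, 29, None, 50, 82, 19, None, 29, 39]
imputed, report = hot_deck(ages)
```

Under these assumptions the sketch reproduces the output file and the four report lines shown above. Imputed values are not added to the deck, which is why case 13 receives 37 rather than a value influenced by the 35 given to case 6.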

In both Figures 5.11 and 5.13, <strong>the</strong> scratch variables that are needed are generated before <strong>the</strong> command<br />

in which <strong>the</strong>y are used. If all <strong>the</strong> scratch variables are predefined, <strong>the</strong>re is no need <strong>to</strong> worry about <strong>the</strong> restrictions<br />

on generating variables in ei<strong>the</strong>r an IF-THEN-ELSE block or a DO.<br />

5.11 IF-THEN-ELSE: Ano<strong>the</strong>r Example<br />

Figure 5.12 contains a small data set and <strong>the</strong> resulting report. Figure 5.13 contains <strong>the</strong> commands which produced<br />

<strong>the</strong> report. The commands contain IF-THEN-ELSE blocks within IF-THEN-ELSE blocks as well as a DO loop<br />

inside <strong>the</strong> blocks. The data set mimics a survey in which <strong>the</strong> respondents were asked about <strong>the</strong>ir computer hardware<br />

and software. The software questions Appl.1 through Appl.3 are coded 1 for an editor or report writer, 2 for a<br />

database and 3 for an analysis or statistics program. The character variables Wappl.1, Wappl.2 and Wappl.3<br />

contain the name of the program associated with the usage in the corresponding Appl? question.<br />

__________________________________________________________________________<br />

Figure 5.12 IF-THEN-ELSE: The Data and <strong>the</strong> Report<br />

File Compute<br />

OS Chip Appl.1 Wappl.1 Appl.2 Wappl.2 Appl.3 Wappl.3<br />

DOS 386 1 Word Perfect 2 Dbase III -<br />

MVS 386 2 Excel 3 P-<strong>STAT</strong> 1 Kedit<br />

Unix Spark 1 P-<strong>STAT</strong> 2 Informix 3 P-<strong>STAT</strong><br />

The Report<br />

error on case 2<br />

PC users 1<br />

Unix users 1<br />

P-<strong>STAT</strong> users 1<br />

P-<strong>STAT</strong> usages 2<br />

__________________________________________________________________________<br />

The purpose of this possibly daunting example is <strong>to</strong> show <strong>the</strong> generality of use of IF-THEN-ELSE blocks and<br />

DO loops. When we say 'if #Puse EQ 1' within <strong>the</strong> DO loop we have:<br />

1. a simple IF<br />

2. within an IF-THEN block (it has no ELSE)<br />

3. within a DO loop<br />

4. within an IF-THEN-ELSE block<br />

5. within an IF-THEN-ELSE block.<br />

The report contains a count of the people using a PC operating system, a count of those using a Unix operating system,<br />

a count of those using P-STAT for at least one purpose, and a count of the total times that P-STAT was cited. Before any totals<br />

are accumulated, the data are checked for validity; if the answers seem inappropriate, the information in the case<br />

is not used in the report.<br />

The first step in generating <strong>the</strong> report is <strong>to</strong> set up a series of scratch variables. This is done as stand-alone <strong>PPL</strong>.<br />

GEN ##Error = 0, GEN ##Numuse = 0, GEN ##PCuse = 0,<br />

GEN ##Users = 0, GEN ##Aps = 0 $<br />

It could instead be included at <strong>the</strong> beginning of <strong>the</strong> PROCESS command:<br />

PROCESS Compute [ IF FIRST ( .FILE. ), GEN ##Error = 0,<br />

GEN ##Numuse = 0, GEN ##PCuse = 0, GEN ##Users = 0, GEN ##Aps = 0;<br />

After generating a single temporary scratch variable, <strong>the</strong> first step in <strong>the</strong> PROCESS command in Figure 5.13<br />

is <strong>to</strong> set up <strong>the</strong> major IF-THEN-ELSE block.<br />

IF OS MATCHES ' ( DOS | Windows | NT | OS/2 ) * '<br />

THEN;<br />

The MATCHES function is described in detail in <strong>the</strong> chapter “<strong>PPL</strong>: Modification of Character Variables”. Here<br />

it is used <strong>to</strong> see if <strong>the</strong> operating system is any of <strong>the</strong> common operating systems for Intel Chip machines. If <strong>the</strong> IF<br />

is true, a second IF-THEN-ELSE block is used <strong>to</strong> see if <strong>the</strong> computer chip is one of <strong>the</strong> Intel chips:<br />

__________________________________________________________________________<br />

Figure 5.13 IF-THEN-ELSE Block with Nested IF and a DO Loop<br />

GEN ##Error = 0, GEN ##Numuse = 0, GEN ##PCuse = 0,<br />

GEN ##Users = 0, GEN ##Aps = 0 $<br />

PROCESS Compute<br />

[ GEN #Puse = 0; IF OS MISSING OR Chip MISSING GOTO Err;<br />

IF OS MATCHES ' ( DOS | Windows | NT | OS/2 ) * '<br />

THEN;<br />

if Chip AMONG ( '286' '386' '486' 'Pentium' )<br />

then;<br />

INCREASE ##PCuse;<br />

else;<br />

PUT Chip > OS<br />

> .n. ;<br />

SET ##Error = 1;<br />

GO TO Err;<br />

endif;<br />

ELSE;<br />

if OS NE 'UNIX' THEN;<br />

SET ##Error = 2;<br />

GO TO Err;<br />

else;<br />

INCREASE ##Users;<br />

DO #AP #N USING Appl?;<br />

IF V(#AP) MISSING, NEXTDO;



IF Wappl?(#N) AMONG ( 'P-<strong>STAT</strong>' 'P<strong>STAT</strong>' ) THEN;<br />

INCREASE ##Aps, INCREASE #Puse;<br />

IF #Puse EQ 1, INCREASE ##Numuse;<br />

ENDIF;<br />

ENDDO;<br />

endif;<br />

ENDIF;<br />

GO TO Report;<br />

Err: PUT @SKIP2 .n. @NEXT ; GO TO Next;<br />

Report: IF LAST ( .FILE. )<br />

PUT @20 ##PCuse @next<br />

> @20 ##Users @next<br />

> @20 ##Numuse @next<br />

> @20 ##Aps;<br />

Next: ] $<br />

__________________________________________________________________________<br />

if Chip AMONG ( '286' '386' '486' 'Pentium' )<br />

<strong>the</strong>n;<br />

INCREASE ##PCuse;<br />

If the replies to the questions about the operating system and the chip agree, the scratch variable ##PCuse is increased.<br />

If the test is false, there is a possible error, which is reported by the PUT statement and followed by a branch to the statement labelled<br />

“Err”.<br />

else;<br />

PUT Chip > OS<br />

> .n. ;<br />

SET ##Error = 1;<br />

GO TO Err;<br />

endif;<br />

The endif completes <strong>the</strong> nested IF-THEN-ELSE block and also <strong>the</strong> THEN portion of <strong>the</strong> major block.<br />

If <strong>the</strong> first IF-THEN is false and we have a computer that appears <strong>to</strong> be running an operating system o<strong>the</strong>r<br />

than <strong>the</strong> standard PC operating systems we will now process <strong>the</strong> “ELSE”.<br />

ELSE;<br />

if OS NE 'UNIX' THEN;<br />

SET ##Error = 2;<br />

GO TO Err;<br />

This starts a nested IF-THEN-ELSE block <strong>to</strong> eliminate and print an error report for any respondents who, like<br />

<strong>the</strong> second case in <strong>the</strong> data in Figure 5.12, are not using <strong>the</strong> UNIX operating system. The final section of <strong>the</strong> command<br />

is used <strong>to</strong> examine <strong>the</strong> applications for all cases like <strong>the</strong> third case in Figure 5.12 who are running UNIX.<br />

INCREASE ##Users;<br />

DO #AP #N USING Appl?;<br />

IF V(#AP) MISSING, NEXTDO;<br />

IF Wappl?(#N) AMONG ( 'P-<strong>STAT</strong>' 'P<strong>STAT</strong>' ) THEN;<br />

INCREASE ##Aps, INCREASE #Puse;<br />

IF #Puse EQ 1, INCREASE ##Numuse;<br />

ENDIF;<br />

ENDDO;



The scratch variable ##Users, our counter of UNIX users, is immediately incremented. Next, a DO is used to examine<br />

the list of applications and increment the remaining counters that are needed for the report.<br />

DO #AP #N USING Appl?;<br />

Two scratch variables are created by <strong>the</strong> DO. #AP takes on <strong>the</strong> values of <strong>the</strong> positions of any variable beginning<br />

with <strong>the</strong> characters “Appl”. Thus <strong>the</strong> first time through <strong>the</strong> loop #AP = 3. The second time #AP = 5. The final<br />

time #AP = 7.<br />

IF V(#AP) MISSING, NEXTDO;<br />

If the value of the application variable is missing, the remaining steps in the DO are bypassed. If there are no more<br />

loop iterations to be done, control moves past the ENDDO statement. If the value is not missing, the next test is done<br />

to determine if P-STAT is the name given for the application:<br />

IF Wappl?(#N) AMONG ( 'P-<strong>STAT</strong>' 'P<strong>STAT</strong>' ) THEN;<br />

#N in the DO loop takes on the values 1, 2, and 3 as the loop progresses. Because the wildcard sets up a dynamic<br />

vector, Wappl?(#N) tests variable Wappl.1 when #N is 1, Wappl.2 when #N is 2, and<br />

Wappl.3 when #N is 3. The third case in the file contains:<br />

Unix Spark 1 P-<strong>STAT</strong> 2 Informix 3 P-<strong>STAT</strong><br />

The first time through the loop Wappl.1 has the value “P-STAT”; therefore, the IF is true and the rest of the four-line<br />

IF-ENDIF is executed.<br />

We increase the permanent scratch variable ##Aps, which is used for a total of all P-STAT applications, and also<br />

increase #Puse, a temporary scratch variable that is reset to 0 as each case starts. Thus when “P-STAT” is found<br />

again in the third loop, #Puse becomes 2 and we do not increase ##Numuse a second time for the same case.<br />

The work is now all done. It is only necessary <strong>to</strong> end <strong>the</strong> open IF blocks and write out <strong>the</strong> error messages and<br />

<strong>the</strong> reports:<br />

endif;<br />

ENDIF;<br />

GO TO Report;<br />

Err: PUT @SKIP2 .n. @NEXT ; GO TO Next;<br />

Report: IF LAST ( .FILE. )<br />

PUT @20 ##PCuse @next<br />

> @20 ##Users @next<br />

> @20 ##Numuse @next<br />

> @20 ##Aps;<br />

Next: ] $
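The control flow of Figure 5.13 can also be traced outside P-STAT. The following Python sketch, not PPL, applies the same nesting to the three cases of Figure 5.12. It rests on two assumptions: the MATCHES pattern is treated as a simple prefix test, and the operating system name is compared without regard to case (the data hold "Unix" while the test uses 'UNIX'). The function name `tally` and the dictionary keys are hypothetical.

```python
PC_SYSTEMS = ("DOS", "WINDOWS", "NT", "OS/2")   # assumed reading of the MATCHES pattern
PC_CHIPS = {"286", "386", "486", "Pentium"}
PSTAT_NAMES = {"P-STAT", "PSTAT"}


def tally(cases):
    """Return the report counters and the case numbers flagged as errors."""
    totals = {"pc_users": 0, "unix_users": 0, "pstat_users": 0, "pstat_usages": 0}
    errors = []
    for n, (os_name, chip, apps) in enumerate(cases, start=1):
        if os_name is None or chip is None:
            errors.append(n)
            continue
        if os_name.upper().startswith(PC_SYSTEMS):      # major IF ... THEN;
            if chip in PC_CHIPS:                        # nested if ... then;
                totals["pc_users"] += 1
            else:                                       # nested else;
                errors.append(n)
        elif os_name.upper() != "UNIX":                 # major ELSE; nested if
            errors.append(n)
        else:
            totals["unix_users"] += 1
            puse = 0                                    # reset for each case, like #Puse
            for code, program in apps:                  # DO #AP #N USING Appl?;
                if code is None:                        # NEXTDO on a missing answer
                    continue
                if program in PSTAT_NAMES:
                    totals["pstat_usages"] += 1
                    puse += 1
                    if puse == 1:                       # count each user only once
                        totals["pstat_users"] += 1
    return totals, errors


cases = [
    ("DOS", "386", [(1, "Word Perfect"), (2, "Dbase III"), (None, None)]),
    ("MVS", "386", [(2, "Excel"), (3, "P-STAT"), (1, "Kedit")]),
    ("Unix", "Spark", [(1, "P-STAT"), (2, "Informix"), (3, "P-STAT")]),
]
totals, errors = tally(cases)
```

With the Figure 5.12 data the sketch flags case 2 and yields one PC user, one Unix user, one P-STAT user and two P-STAT usages. The P-STAT entry in case 2 is not counted because the case is rejected before the applications are examined.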



SUMMARY<br />

DO LOOPS and IF-THEN-ELSE BLOCKS<br />

DO #J USING vn TO vn;<br />

specifies a scratch variable and <strong>the</strong>n, after <strong>the</strong> USING, a list of variable names or positions. The list can<br />

also use TO, .ON. and wildcards like PRE? . The remaining <strong>PPL</strong> in <strong>the</strong> loop is executed once for each<br />

variable in <strong>the</strong> list. The scratch variable is set <strong>to</strong> <strong>the</strong> LOCATION of <strong>the</strong> current variable; <strong>the</strong>refore, <strong>the</strong><br />

scratch variable is different in each iteration.<br />

Thus, #L, in <strong>the</strong> loop below, is set <strong>to</strong> <strong>the</strong> location (not <strong>the</strong> value) of Test1 in <strong>the</strong> first iteration, and <strong>to</strong> <strong>the</strong><br />

location of Test10 in <strong>the</strong> last iteration.<br />

[ DO #L USING Test1 TO Test10;<br />

SET V(#L) = SQRT ( V(#L) ); ENDDO ]<br />

[ DO #J USING v(1) Height v(5) TO v(7);<br />

INCREASE V(#J); ENDDO ]<br />

The variables in the DO list can be tested to ensure that numeric operations are not performed on character<br />

variables or vice versa. The operators CHARACTER, NUMERIC, MISSING and GOOD can be<br />

used.<br />

[ DO #QQ USING SS.Number <strong>to</strong> ZIP;<br />

IF V(#QQ) NUMERIC, NEXTDO;<br />

SET V(#QQ) = LEFT ( V(#QQ) ); ENDDO ]<br />

[ DO #I USING V(1) TO V(25) V(28);<br />

IF V(#I) CHARACTER OR V(#I) GOOD, NEXTDO;<br />

SET V(#I) = .M3.; ENDDO ]<br />

There is no limit <strong>to</strong> <strong>the</strong> number of <strong>PPL</strong> instructions that may be included in a DO loop. DO's may include<br />

o<strong>the</strong>r DO loops and IF-THEN-ELSE blocks.<br />

DO #J = nn, nn, nn;<br />

specifies a scratch variable and then, after the '=', a start expression, an end expression and an optional<br />

stepsize expression. The scratch variable takes the values of the numbers from the start value through<br />

the end value as incremented by the stepsize. If the stepsize is not supplied, 1 is assumed. The scratch<br />

variable is usable in the PPL within the loop.<br />

[ DO #Vars = 1, 3 ;<br />

SET V(#Vars) = V(#Vars) / 12 ; ENDDO ]<br />

In this example, “#Vars” is the user-supplied scratch variable. It is used in the SET instruction as the<br />

subscript of V, the vector of variables in the file. Each of the first 3 variables in the file has its value<br />

divided by 12.<br />

EXITDO<br />

causes the DO loop to be exited immediately even if all the loop instructions have not been completed.<br />

vn=variable name nn=number exp=expression



NEXTDO<br />

causes a jump to the ENDDO statement where the DO loop counter is evaluated. If there are no more<br />

iterations, the loop terminates.<br />

ENDDO<br />

defines the end of the DO loop domain. When ENDDO is processed, the current DO value is evaluated.<br />

If the loop is not complete, the counters are incremented and the commands in the DO domain are executed<br />

with the new values.<br />

GENERATE Within a DO Loop<br />

A new variable can be generated in each iteration of a DO-GENERATE loop. When GENERATE is used<br />

in a DO loop, <strong>the</strong> loop can have no o<strong>the</strong>r statements; i.e., it can have only DO, GENERATE and ENDDO.<br />

The format of a GENERATE within a DO loop is one of <strong>the</strong> following:<br />

GENERATE ? = value;<br />

GENERATE ? (mask) = value;<br />

GENERATE V(#J) (mask) = value;<br />

GENERATE V(##K) (mask) = value;<br />

If <strong>the</strong> variables are character, <strong>the</strong> :C or :C20 or such directly follows <strong>the</strong> mask or, if <strong>the</strong>re is no mask, <strong>the</strong><br />

“?”. Masks are described below. The “= value” is optional; if not supplied, <strong>the</strong> variable is set <strong>to</strong> missing.<br />

If <strong>the</strong> file currently has 20 variables, <strong>the</strong> ? causes <strong>the</strong> name of VAR21 <strong>to</strong> be created. It can <strong>the</strong>n be<br />

masked. Use of V(#J) must be followed by a mask since that name already exists.<br />

RENAME Within a DO Loop<br />

A group of variables can be renamed in a DO loop. Each iteration renames a different variable. When<br />

RENAME is used in a DO loop, the loop can have no other statements; i.e., it can have only DO,<br />

RENAME and ENDDO.<br />

[ RENAME Social.S.Num TO SS.Number ]<br />

The format for RENAME within a DO loop is <strong>the</strong> following:<br />

1. RENAME<br />

2. a V(#J) usage. This identifies <strong>the</strong> variable <strong>to</strong> be renamed. It also provides its current name <strong>to</strong> <strong>the</strong><br />

mask.<br />

3. a mask in paren<strong>the</strong>ses which contains strings in quotes <strong>to</strong> be used exactly as entered. It also contains<br />

special characters such as <strong>the</strong> “&” which are used <strong>to</strong> select or omit letters from <strong>the</strong> input<br />

label and <strong>to</strong> supply numbers using <strong>the</strong> DO loop scratch variable.<br />

4. a semicolon, ending <strong>the</strong> statement.<br />

Examples of DO-RENAME loops with masks:<br />

[ DO #J USING Q1 TO Q23;<br />

RENAME V(#J) ( 'Survey.' & );<br />

ENDDO]<br />

“Survey.” is a prefix. Variables Q1 through Q23 will be renamed by prefixing <strong>the</strong>ir names with “Survey.”<br />

The new names will be “Survey.Q1”, “Survey.Q2”, and so on. This is an example of a simple mask:<br />




[ DO #j=21,35;<br />

RENAME V(#j) (XOOXX);<br />

ENDDO ]<br />

Here, a mask of (XOOXX) is supplied. The initial X says use <strong>the</strong> first input character, <strong>the</strong> OO says omit<br />

<strong>the</strong> next two characters, and <strong>the</strong> XX says use <strong>the</strong> next two (characters 4 and 5). This mask would rename<br />

VAR31 in<strong>to</strong> V31.<br />

MASKS for RENAME and GENERATE<br />

A mask is used <strong>to</strong> create a name for a variable, ei<strong>the</strong>r by modifying <strong>the</strong> ? or V(#J) name preceding it or<br />

by creating a <strong>to</strong>tally different name. The mask activity begins with a pointer on <strong>the</strong> initial character of<br />

<strong>the</strong> input name. The pointer is moved on<strong>to</strong> <strong>the</strong> next character after each usage of X, O, c or C. Fur<strong>the</strong>r<br />

use of X-O-c-C is ignored when <strong>the</strong> pointer is beyond <strong>the</strong> final input character.<br />

1. x or X takes <strong>the</strong> current input character, if usable.<br />

2. o or O omits <strong>the</strong> current input character. NOTE: <strong>the</strong> digit '0' is also usable.<br />

3. c takes <strong>the</strong> current input character, if usable, and, if it is a letter, puts it in<strong>to</strong> lower case. C takes<br />

<strong>the</strong> current input character, if usable, and, if it is a letter, puts it in<strong>to</strong> upper case.<br />

4. & takes all remaining usable input characters that can fit, starting at <strong>the</strong> current location of <strong>the</strong><br />

pointer.<br />

5. @4 places <strong>the</strong> pointer on<strong>to</strong> <strong>the</strong> 4th input character. (@4 xxx) and (ooo xxx) are identical.<br />

6. @-5 places <strong>the</strong> pointer on <strong>the</strong> 5th character from <strong>the</strong> right hand end.<br />

A character that has been <strong>the</strong> subject of any of <strong>the</strong> X-O-c-C-& opera<strong>to</strong>rs is no longer usable by any subsequent<br />

opera<strong>to</strong>r. As a result, ( @-4 OOOO @1 & 'post' ) could be used <strong>to</strong> inactivate <strong>the</strong> rightmost 4<br />

characters, take <strong>the</strong> rest, and add 'post' <strong>to</strong> it. In o<strong>the</strong>r words, <strong>the</strong> mask has replaced an existing 4-character<br />

suffix with a new one.<br />

Blanks are ignored in masks, which can markedly improve readability. For example:<br />

(XXXXXXXXX) and<br />

(XXX XXX XXX) are identical.<br />

O<strong>the</strong>r features of <strong>the</strong> mask are:<br />

1. 'ab.cde' moves <strong>the</strong> string contents in<strong>to</strong> <strong>the</strong> new name.<br />

2. D or d inserts the V subscript. This is based on the current value of the DO scratch variable. If<br />

V(#j) is used, 17 is inserted when #j=17. If V(#j+10) is used, 27 is inserted when #j=17. DD is<br />

like D, but forces 2 characters; 07 is used instead of 7. If DDD is used, three digits are inserted<br />

in the new name, so a 7 appears in the new name as 007.<br />

3. N or n inserts <strong>the</strong> current iteration count of <strong>the</strong> DO loop. If this is <strong>the</strong> third trip through <strong>the</strong> loop,<br />

3 is inserted. NN provides 2 digits, NNN 3 digits. You do not have <strong>to</strong> use a counter scratch<br />

variable in <strong>the</strong> DO statement in order <strong>to</strong> use 'N'.<br />
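The mask rules can be tried out with a small interpreter. The following Python sketch, not PPL, follows the rules as listed above. It is an illustration under stated assumptions: the function name `apply_mask` and its parameters are hypothetical, the limit on name length ("that can fit") is ignored, and digits directly after @ are read as part of the position.

```python
def apply_mask(name, mask, subscript=0, iteration=0):
    """Build a new variable name from `name` according to `mask`."""
    out = []
    used = [False] * len(name)   # characters already consumed by X, O, c, C or &
    ptr = 0                      # 0-based pointer into the input name
    i = 0
    while i < len(mask):
        ch = mask[i]
        if ch == " ":                        # blanks are ignored
            i += 1
        elif ch == "'":                      # quoted string is copied as entered
            end = mask.index("'", i + 1)
            out.append(mask[i + 1:end])
            i = end + 1
        elif ch == "@":                      # @4 or @-5 repositions the pointer
            j = i + 2 if mask[i + 1:i + 2] == "-" else i + 1
            while j < len(mask) and mask[j].isdigit():
                j += 1
            pos = int(mask[i + 1:j])
            ptr = pos - 1 if pos > 0 else max(0, len(name) + pos)
            i = j
        elif ch == "&":                      # all remaining usable characters
            for k in range(ptr, len(name)):
                if not used[k]:
                    out.append(name[k])
                    used[k] = True
            ptr = len(name)
            i += 1
        elif ch in "xXoO0cC":                # take, omit, or take with case change
            if ptr < len(name):
                if ch in "xXcC" and not used[ptr]:
                    char = name[ptr]
                    if ch == "c":
                        char = char.lower()
                    elif ch == "C":
                        char = char.upper()
                    out.append(char)
                used[ptr] = True
                ptr += 1
            i += 1
        elif ch in "dDnN":                   # D = V subscript, N = iteration count
            j = i
            while j < len(mask) and mask[j].upper() == ch.upper():
                j += 1
            value = subscript if ch.upper() == "D" else iteration
            out.append(str(value).zfill(j - i))   # DD pads 7 to 07
            i = j
        else:
            raise ValueError(f"unsupported mask character: {ch!r}")
    return "".join(out)


prefixed = apply_mask("Q1", "'Survey.' &")
shortened = apply_mask("VAR31", "XOOXX")
resuffixed = apply_mask("Scoreprev", "@-4 OOOO @1 & 'post'")
```

Here `prefixed` is "Survey.Q1" and `shortened` is "V31", matching the two RENAME examples above. The third call uses an invented name, "Scoreprev", to show the suffix replacement: the last four characters are made unusable, the rest are taken, and 'post' is appended.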

IF-THEN-ELSE<br />

IF-THEN-ELSE blocks may include any <strong>PPL</strong> statements including o<strong>the</strong>r IF-THEN-ELSE blocks and DO<br />

LOOPS. GOTO may also be used as long as <strong>the</strong> target label is not in <strong>the</strong> middle of ano<strong>the</strong>r block.<br />




IF Age GE 14, THEN;<br />

The IF of an IF-THEN can be complex (using OR and AND) but it can only be followed by 'THEN;'.<br />

The IF-THEN statement is followed by all the PPL statements to be executed when the IF clause is true.<br />

The direction of this logic can be changed by using the M/F prefixes. M.THEN is followed by<br />

PPL statements to be executed if the IF statement result is missing. F.THEN is followed by statements to be executed only if the IF<br />

statement evaluates to false.<br />

[ IF Age GE 14, THEN;<br />

PUT 'over 14';<br />

ELSE;<br />

PUT 'failed';<br />

ENDIF; ]<br />

ELSE;<br />

is followed by all the PPL statements to be executed when the IF clause is not true. Not true includes<br />

results that are either false or missing. ELSE is optional. M.ELSE or F.ELSE can also be<br />

specified.<br />

ENDIF;<br />

is required to denote the end of the IF block.<br />
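Because an IF can yield true, false or missing, the choice of branch is a three-way decision. The following Python sketch, not PPL, models a missing result as `None`. The function names are hypothetical, and the 'age unknown' text is invented for the illustration.

```python
def age_ge_14(age):
    """Three-valued test: True, False, or None when Age is missing."""
    return None if age is None else age >= 14


def plain_else(age):
    """THEN; ... ELSE; ... ENDIF; where ELSE covers both false and missing."""
    if age_ge_14(age) is True:
        return "over 14"
    return "failed"


def split_else(age):
    """THEN; ... M.ELSE; ... F.ELSE; ... ENDIF; with separate branches."""
    result = age_ge_14(age)
    if result is True:
        return "over 14"
    if result is None:        # the M.ELSE branch: the test result is missing
        return "age unknown"
    return "failed"           # the F.ELSE branch: the test result is false
```

A case with a missing Age reaches 'failed' under the plain ELSE, but takes its own branch once M.ELSE is used.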



6 PPL: Functions and System Variables<br />

This chapter explains P-<strong>STAT</strong> functions. Functions evaluate or transform one or more arguments and yield a numeric<br />

or character value. This chapter also contains a complete list and description of <strong>the</strong> P-<strong>STAT</strong> system<br />

variables. System variables are special variables whose values are set by P-<strong>STAT</strong>, but may be accessed by <strong>the</strong><br />

user.<br />

Numeric functions and functions that transform ei<strong>the</strong>r numeric or character arguments are explained in this<br />

chapter. The final <strong>PPL</strong> chapter covers character (string) functions. The prior <strong>PPL</strong> chapters cover <strong>the</strong> basics of<br />

<strong>PPL</strong> modification — case and variable selection, changing existing variables, creating new variables, logical selection,<br />

positional notation, DO loops, <strong>the</strong> two recoding functions, NCOT and RECODE, and IF-THEN-ELSE<br />

blocks.<br />

Most data modification is done on a single case. The case is retained, deleted, or modified, depending on <strong>the</strong><br />

values of variables found in that case or on <strong>the</strong> value of some system variable such as .N. , <strong>the</strong> case number. This<br />

is within-case modification. The functions and system values described in this chapter are primarily applicable <strong>to</strong><br />

modification of a single case. The next <strong>PPL</strong> chapter covers across-case modification, that is, modification of multiple<br />

cases grouped <strong>to</strong>ge<strong>the</strong>r because of a common relationship.<br />

6.1 ONE-EXPRESSION FUNCTIONS<br />

There are four basic types of functions in <strong>the</strong> P-<strong>STAT</strong> programming language:<br />

1. functions that evaluate a single expression;<br />

2. functions that evaluate a list of expressions;<br />

3. special functions that evaluate <strong>the</strong> first expression in <strong>the</strong> argument list, using <strong>the</strong> additional arguments<br />

<strong>to</strong> define <strong>the</strong> function more precisely; and<br />

4. distribution functions that give <strong>the</strong> probability of obtaining a random deviate less than a specified<br />

value.<br />

One-expression functions evaluate a single numeric expression enclosed in paren<strong>the</strong>ses. The expression may<br />

be a variable name or position, a constant, or a complex expression. Complex expressions are nested expressions,<br />

expressions containing arithmetic opera<strong>to</strong>rs, and combinations of both of <strong>the</strong>se.<br />

The function is used in a <strong>PPL</strong> clause containing an instruction or a logical test and its consequence:<br />

SET Age = INT (Age);<br />

IF <strong>Inc</strong>ome GOOD, SET <strong>Inc</strong>ome = ROUND (<strong>Inc</strong>ome);<br />

Paren<strong>the</strong>ses enclose <strong>the</strong> expression that <strong>the</strong> function evaluates. In <strong>the</strong> first example just given, <strong>the</strong> INT (i.e., integer)<br />

function evaluates Age and yields <strong>the</strong> integer portion of Age. Age is set <strong>to</strong> this integer value. In <strong>the</strong> second<br />

example, if <strong>Inc</strong>ome is GOOD (non-missing), <strong>the</strong> ROUND function evaluates <strong>Inc</strong>ome and yields a value rounded<br />

<strong>to</strong> <strong>the</strong> nearest whole number. <strong>Inc</strong>ome is set <strong>to</strong> this rounded value.<br />

Functions may be nested within functions. For example:<br />

SET Root.<strong>Inc</strong>ome = ROUND ( SQRT ( <strong>Inc</strong>ome ));<br />

The square-root of <strong>Inc</strong>ome is rounded and s<strong>to</strong>red in variable Root.<strong>Inc</strong>ome.



The functions that evaluate a single numeric expression are:<br />

ABS ( exp ) absolute value<br />

COS ( exp ) cosine<br />

ACOS ( exp ) arc cosine<br />

EXP ( exp ) exponential (e raised <strong>to</strong> this exponent)<br />

FACTORIAL (exp) <strong>the</strong> fac<strong>to</strong>rial value of <strong>the</strong> argument<br />

FRAC ( exp ) fractional part<br />

INT ( exp ) integer part<br />

LOC ( vn ) location (of a variable)<br />

LOG ( exp ) natural logarithm (base e)<br />

LOG10 ( exp ) common logarithm (base 10)<br />

ROUND ( exp ) rounds <strong>to</strong> nearest integer<br />

CEIL (exp) smallest integer greater than or equal <strong>to</strong> <strong>the</strong> input value<br />

FLOOR (exp) largest integer that is less than or equal <strong>to</strong> <strong>the</strong> input value<br />

SIN ( exp ) sine<br />

ASIN ( exp ) arc sine<br />

SQRT ( exp ) square root<br />

TAN ( exp ) tangent<br />

ATAN ( exp ) arc tangent<br />

6.2 Rounding Functions<br />

The FRAC, INT and ROUND functions yield rounded values (of sorts). The original signs of <strong>the</strong> numbers are preserved.<br />

The ABS (absolute value) function yields <strong>the</strong> original value of a number without any sign. Examples of<br />

<strong>the</strong>se functions, using <strong>the</strong> same value as <strong>the</strong> argument for each, highlight <strong>the</strong> differences among <strong>the</strong> functions:<br />

Function Result<br />

FRAC ( -621.87 ) -0.87<br />

INT ( -621.87 ) -621<br />

ROUND ( -621.87 ) -622<br />

ABS ( -621.87 ) 621.87<br />

6.3 Floor and Ceiling<br />

FLOOR is a function that takes a numeric input and produces <strong>the</strong> largest integer that is less than or equal <strong>to</strong> <strong>the</strong><br />

input value. Thus:<br />

FLOOR(-4.1) = -5<br />

FLOOR( 2 ) = 2<br />

FLOOR( 2.9) = 2<br />

CEIL is a function that takes a numeric input and produces <strong>the</strong> smallest integer that is greater than or equal <strong>to</strong> <strong>the</strong><br />

input value. Thus:<br />

CEIL (-4.7) = -4<br />

CEIL ( 2 ) = 2<br />

CEIL ( 2.1) = 3
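The differences among INT, ROUND, FLOOR and CEIL show up only for negative non-integers. The following Python sketch, not PPL, reproduces the values in the two tables above with the standard `math` module. The helper names `frac` and `int_part` are hypothetical, and Python's `round` resolves exact .5 ties to the even integer, which may not match P-STAT's ROUND.

```python
import math


def frac(x):
    """Fractional part with the original sign kept, like FRAC."""
    return math.modf(x)[0]


def int_part(x):
    """Integer part with the original sign kept, like INT (truncation)."""
    return math.trunc(x)


x = -621.87
results = {
    "FRAC": round(frac(x), 2),   # rounded only to hide binary floating-point noise
    "INT": int_part(x),
    "ROUND": round(x),
    "ABS": abs(x),
}

# INT truncates toward zero, FLOOR always moves down, CEIL always moves up.
truncated = int_part(-4.1)    # -4
floored = math.floor(-4.1)    # -5
ceiled = math.ceil(-4.7)      # -4
```

For positive values INT and FLOOR agree, so the choice between them matters only when the data can be negative.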



6.4 Exponential and Trigonometric Functions<br />

The SQRT function yields <strong>the</strong> square root of a number. The square of a number is obtained using <strong>the</strong> numeric<br />

opera<strong>to</strong>r ** (see <strong>the</strong> first <strong>PPL</strong> chapter). The LOG and LOG10 functions yield <strong>the</strong> natural and common logarithms<br />

of values, <strong>to</strong> base e and base 10, respectively. The EXP (exponential) function raises e <strong>to</strong> <strong>the</strong> value given as its<br />

argument (“undoing” <strong>the</strong> effect of <strong>the</strong> LOG function). Similarly, raising 10 <strong>to</strong> <strong>the</strong> value produced by LOG10 “undoes”<br />

that function.<br />

Function Result<br />

LOG ( 12094.5 ) 9.40051<br />

EXP ( 9.40051 ) 12094.5<br />

LOG10 ( 12094.5 ) 4.08259<br />

10 ** 4.08259 12094.5<br />
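The round trip in this table can be verified with any tool that offers the same functions. The following short Python sketch, not PPL, uses the standard `math` module, whose `log`, `log10` and `exp` are assumed here to correspond to LOG, LOG10 and EXP.

```python
import math

x = 12094.5
natural = math.log(x)       # corresponds to LOG
common = math.log10(x)      # corresponds to LOG10

# Undo each logarithm: EXP for base e, 10 ** value for base 10.
back_from_natural = math.exp(natural)
back_from_common = 10 ** common
```

Both logarithms agree with the table to five decimal places, and both inverse operations return the starting value up to floating-point rounding.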

The SIN, COS and TAN functions yield <strong>the</strong> sine, cosine and tangent of <strong>the</strong>ir numeric argument. The ASIN,<br />

ACOS and ATAN functions yield <strong>the</strong> arc sine, arc cosine and arc tangent. Using <strong>the</strong>se functions in conjunction<br />

with <strong>the</strong> numeric opera<strong>to</strong>rs permits calculation of a variety of trigonometric expressions.<br />

6.5 The Fac<strong>to</strong>rial Function<br />

The FACTORIAL function yields <strong>the</strong> fac<strong>to</strong>rial value of <strong>the</strong> argument. This is often shown as N!. The argument<br />

should be a non-negative integer. If <strong>the</strong> argument is zero, <strong>the</strong> result is one. If <strong>the</strong> argument is an integer from 1<br />

through 169 or so, <strong>the</strong> result is <strong>the</strong> product of integers from one through that argument.<br />

Function Result<br />

FACTORIAL (0) 1<br />

FACTORIAL (5) 120<br />

FACTORIAL (169) 0.4269068E305<br />

FACTORIAL (200) Missing 1 (<strong>the</strong> result would be <strong>to</strong>o large)<br />

FACTORIAL (-12) Missing 3 (argument is negative)<br />

FACTORIAL (3.5) Missing 3 (argument not an integer)<br />
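The three outcomes in the table (a value, missing 1, missing 3) can be mimicked in a short Python sketch. The names factorial_sketch, MISSING_1 and MISSING_3 are hypothetical stand-ins, not P-STAT identifiers, and the overflow point of a Python float (just above 170!) may not coincide exactly with P-STAT's limit of "169 or so".<br />

```python
import math

MISSING_1 = "M1"  # hypothetical stand-in for missing type 1 (result too large)
MISSING_3 = "M3"  # hypothetical stand-in for missing type 3 (invalid argument)

def factorial_sketch(n):
    """Mimic the documented FACTORIAL outcomes using Python floats."""
    if n < 0 or n != int(n):
        return MISSING_3                 # negative or non-integer argument
    try:
        return float(math.factorial(int(n)))
    except OverflowError:
        return MISSING_1                 # too large for a double-precision float

for arg in (0, 5, 169, 200, -12, 3.5):
    print(arg, factorial_sketch(arg))
```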

6.6 Creating Dummy Variables with the LOC Function<br />

The LOC function yields the location of a variable. Thus, it is slightly different from the other simple functions<br />

because it is not purely a numeric function. The value it returns is numeric, but the variable given as its argument<br />

may be a character or numeric one. See the explanation for EXPAND later in this chapter for another way to generate<br />

several variables from one or more input variables.<br />

Function Result<br />

LOC ( Name ) 6 (when Name is <strong>the</strong> 6th variable)<br />

LOC ( Age ) 10 (when Age is <strong>the</strong> 10th variable)<br />

LOC is often used when <strong>the</strong> location of a variable, referenced by position, is not known:<br />

SET V ( LOC ( North.East ) + 1 ) = 100 ;<br />

The location of the variable named “North.East” plus 1 defines a value; if North.East is the fourth variable in the<br />

file, that value is 5. This value is the subscript or index of V (the vector of variables in the file) and V(5) is set to<br />

100.<br />

In Figure 6.1, four variables are created, one for each of <strong>the</strong> possible values of Region. These variables are<br />

set <strong>to</strong> 0 or 1 depending on <strong>the</strong> value of Region for that case. This is sometimes referred <strong>to</strong> as creating dummy



variables, a technique often used in setting up data for regression or analysis of variance. With only four variables,<br />

it may be easier to understand what is being done if you use:<br />

[ IF Region EQ 1, SET North.East = 1 ;<br />

IF Region EQ 2, SET North.West = 1 ;<br />

IF Region EQ 3, SET South.East = 1 ;<br />

IF Region EQ 4, SET South.West = 1 ]<br />

rather than:<br />

SET V ( LOC ( North.East ) + Region - 1 ) = 1;<br />

However, with many more variables and corresponding IF statements, <strong>the</strong> use of this calculated expression becomes<br />

more desirable.<br />

When a variable is created with GENERATE, it takes <strong>the</strong> next position at <strong>the</strong> right end of <strong>the</strong> file. Thus, it is<br />

easy <strong>to</strong> calculate <strong>the</strong> location of each variable in turn, if <strong>the</strong> location of <strong>the</strong> first new one is known. Since <strong>the</strong>re are<br />

three variables in file Regional, <strong>the</strong> variable North.East will be in position four. The LOC function returns <strong>the</strong><br />

location of a variable. Therefore LOC ( North.East ) has a value of 4:<br />

Region LOC (North.East) V (LOC (North.East) + Region-1)<br />

1 4 4 + 1 - 1 = 4<br />

2 4 4 + 2 - 1 = 5<br />

3 4 4 + 3 - 1 = 6<br />

4 4 4 + 4 - 1 = 7<br />

When Region is 4, <strong>the</strong>n <strong>the</strong> variable in position 7, South.West, is set <strong>to</strong> 1.<br />

__________________________________________________________________________<br />

Figure 6.1 Calculating Variable Positions<br />

FILE Regional:<br />

Age Sex Region<br />

52 1 1<br />

31 2 2<br />

65 1 3<br />

27 2 4<br />

LIST Regional<br />

[ GENERATE North.East = 0, GENERATE North.West = 0,<br />

GENERATE South.East = 0, GENERATE South.West = 0;<br />

SET V ( LOC ( North.East ) + Region - 1 ) = 1 ] $<br />

North North South South<br />

Age Sex Region East West East West<br />

52 1 1 1 0 0 0<br />

31 2 2 0 1 0 0<br />

65 1 3 0 0 1 0<br />

27 2 4 0 0 0 1<br />

__________________________________________________________________________



If <strong>the</strong> value of Region is outside <strong>the</strong> expected range, an error condition could occur, or <strong>the</strong> value of some o<strong>the</strong>r<br />

existing variable could be changed. The use of an AMONG test ensures that <strong>the</strong> value of Region will be used only<br />

if it is non-missing and between 1 and 4:<br />

IF Region AMONG (1 TO 4),<br />

SET V ( LOC ( North.East ) + Region - 1 ) = 1;<br />

When a calculation might produce a value o<strong>the</strong>r than an integer, <strong>the</strong> INT or ROUND function may be used:<br />

IF Region AMONG (1 TO 4),<br />

SET V ( LOC ( North.East ) + INT ( Region ) - 1 ) = 1;<br />
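The position arithmetic and the range guard can be traced in a small Python sketch. It is an emulation rather than PPL: each case is a plain list, set_region_dummy is a hypothetical helper, and the guard is assumed to behave like the AMONG (1 TO 4) test for the values shown.<br />

```python
def set_region_dummy(case, first_dummy_pos, n_regions=4):
    """Set the dummy selected by Region to 1; positions are 1-based, as in V(...)."""
    region = case[2]                       # Region is the third variable in Regional
    # Guard in the spirit of IF Region AMONG (1 TO 4): skip missing or out-of-range.
    if region is None or not (1 <= region <= n_regions):
        return case
    target = first_dummy_pos + int(region) - 1   # LOC(North.East) + INT(Region) - 1
    case[target - 1] = 1                          # Python lists are 0-based
    return case

rows = [[52, 1, 1], [31, 2, 2], [65, 1, 3], [27, 2, 4]]   # Age, Sex, Region
result = []
for age, sex, region in rows:
    case = [age, sex, region, 0, 0, 0, 0]   # four generated dummies, all 0
    result.append(set_region_dummy(case, first_dummy_pos=4))

for case in result:
    print(case)
```

With the guard in place, a case whose Region is 9 or missing is returned unchanged instead of altering some other position.<br />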

The creation of dummy variables may be simplified if <strong>the</strong> original order of <strong>the</strong> variables does not need <strong>to</strong> be<br />

preserved. KEEP rearranges <strong>the</strong> new variables after <strong>the</strong>y are created:<br />

GENERATE North.East = 0, GENERATE North.West = 0,<br />

GENERATE South.East = 0, GENERATE South.West = 0;<br />

KEEP .NEW. Age Sex Region;<br />

SET V (Region) = 1;<br />

When Region is 3, using V(Region) is equivalent to using V(3). Note the use of ".NEW.", a system variable which<br />

refers to all the variables created in the current command.<br />

6.7 Creating a Single Variable from Dummy Variables<br />

Sometimes <strong>the</strong> data are already entered as a series of variables coded 0 and 1, and you would like <strong>to</strong>, in effect,<br />

“undummy” <strong>the</strong>m; that is, you would like <strong>to</strong> create a new variable which has its value based on <strong>the</strong> location of <strong>the</strong><br />

one variable in the series which has a value of 1. Given a file containing cases such as this one:<br />

North North South South<br />

Age Sex East West East West<br />

52 1 1 0 0 0<br />

These <strong>PPL</strong> clauses create <strong>the</strong> new variable Region:<br />

GENERATE Region = .M1.;<br />

DO #J USING North.East TO South.West;<br />

IF V(#J) EQ 1, SET Region = #J + 1 - LOC(North.East);<br />

ENDDO;<br />

The DO loop scratch variable is #J. #J takes on the values 3, 4, 5 and 6 as the DO loop is processed. The

LOC of North.East always has a value of 3. If V(3) is 1 when #J = 3, <strong>the</strong> new variable Region is set <strong>to</strong> 1:<br />

3 (#J) + 1 (a constant) - 3 (location of North.East) = 1<br />

If V(4) = 1 when #J = 4, Region is set to 2 which is:<br />

4 (#J) + 1 (a constant) - 3 (location of North.East) = 2<br />

and so on.<br />

Again, <strong>the</strong> calculations of position may be simplified by using a second scratch variable in <strong>the</strong> DO loop:<br />

GENERATE Region = .M1.;<br />

DO #J #N USING North.East TO South.West;<br />

IF V(#J) EQ 1, SET Region = #N;<br />

ENDDO;<br />

#N takes on a value which corresponds <strong>to</strong> <strong>the</strong> number of times through <strong>the</strong> loop. Thus it will be a 1 when <strong>the</strong><br />

current DO is positioned at North.East and a 4 when it is positioned at South.West.
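The two-counter loop can be paraphrased in Python as a rough sketch. The function undummy is hypothetical, None stands in for missing type 1, and positions are 1-based to match V(...).<br />

```python
def undummy(case, first_pos, last_pos):
    """Return the loop count at which a dummy equals 1, or None if none does."""
    region = None                                    # GENERATE Region = .M1.
    # j plays the role of #J (position), n the role of #N (times through the loop).
    for n, j in enumerate(range(first_pos, last_pos + 1), start=1):
        if case[j - 1] == 1:                         # IF V(#J) EQ 1
            region = n                               # SET Region = #N
            # Same value as the first form: #J + 1 - LOC(North.East)
            assert region == j + 1 - first_pos
    return region

# Age, Sex, North.East, North.West, South.East, South.West
print(undummy([52, 1, 1, 0, 0, 0], first_pos=3, last_pos=6))
```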



6.8 LIST FUNCTIONS<br />

These functions evaluate a list of variables given as <strong>the</strong>ir arguments. The variables may be referenced by names<br />

or positions, or a combination of both. Ranges of variables and wildcards may be included in <strong>the</strong> list. The numeric<br />

list functions are:<br />

MAX ( vnp list ) maximum value of variables<br />

MAX.GOOD ( vnp list )<br />

MEAN ( vnp list ) mean of variables<br />

MEAN.GOOD ( vnp list )<br />

MIN ( vnp list ) minimum value of variables<br />

MIN.GOOD ( vnp list )<br />

SDEV ( vnp list ) standard deviation of variables<br />

SDEV.GOOD ( vnp list )<br />

SUM ( vnp list ) sum of variables<br />

SUM.GOOD ( vnp list )<br />

The list functions that evaluate ei<strong>the</strong>r numeric or character arguments are:<br />

COUNT.GOOD ( vnp list ) number of non-missing values<br />

FIRST.GOOD ( vnp list ) value of first non-missing var<br />

LAST.GOOD ( vnp list ) value of last non-missing var<br />

These functions can be used quite generally:<br />

GENERATE Check = 0;<br />

IF MIN ( Test1 TO Test8 ) EQ MAX ( Test1 TO Test8 ),<br />

SET Check = 1;<br />

GENERATE Average = MEAN.GOOD ( Test1 TO Test8 );<br />

6.9 Numeric List Functions<br />

The arguments for <strong>the</strong> numeric list functions are enclosed in paren<strong>the</strong>ses. Individual variable names and positions,<br />

wildcards, and ranges of variables may be specified.<br />

The numeric list functions may be suffixed with “.GOOD” <strong>to</strong> specify that <strong>the</strong>y apply only <strong>to</strong> good (non-missing)<br />

values. “.GOOD” may be abbreviated <strong>to</strong> “.G” if desired. The difference between <strong>the</strong> function MEAN and<br />

MEAN.GOOD is that MEAN gives <strong>the</strong> mean of all <strong>the</strong> variables in <strong>the</strong> list, whereas MEAN.GOOD gives <strong>the</strong><br />

mean of only <strong>the</strong> good variables in <strong>the</strong> list. If MEAN is used and any one of <strong>the</strong> variables in <strong>the</strong> list is missing,<br />

<strong>the</strong> result is missing. If MEAN.GOOD is used and any of <strong>the</strong> variables in <strong>the</strong> list is missing, <strong>the</strong> mean is computed<br />

using only whatever good values are available.<br />

A teacher computing final grades could use <strong>the</strong> function MEAN and give students who have not completed<br />

all tests a missing or incomplete grade. Given this file,<br />

FILE Students:<br />

MidTerm.1 Final.1 MidTerm.2 Final.2<br />

2 3 4 4<br />

3 - 2 1<br />

<strong>the</strong>se instructions compute both <strong>the</strong> mean of all values and <strong>the</strong> mean of non-missing values:



LIST Students [<br />

GENERATE Average.Good =<br />

MEAN.GOOD ( MidTerm.1 TO Final.2 );<br />

GENERATE Average.All =<br />

MEAN ( MidTerm.1 TO Final.2 )] $<br />

MidTerm Final MidTerm Final Average Average<br />

.1 .1 .2 .2 Good All<br />

2 3 4 4 3.25 3.25<br />

3 - 2 1 2.00 -<br />

A doc<strong>to</strong>r looking at average blood pressure readings for his or her patients might use MEAN.GOOD, which uses<br />

only <strong>the</strong> available good information. SUM.GOOD, MAX.GOOD, MIN.GOOD, and SDEV.GOOD all use only<br />

<strong>the</strong> good data and ignore <strong>the</strong> missing values:<br />

GENERATE Low.Score = MIN.GOOD ( Test10 TO Test15, V(33) ) ;<br />
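The difference between the plain and .GOOD forms can be seen in a minimal Python sketch, with None standing in for a missing value. The helpers mean_all and mean_good are hypothetical names; returning None when every value is missing is an assumption, since the text does not describe that case.<br />

```python
from statistics import mean

def mean_all(values):
    """Like MEAN: one missing value makes the whole result missing."""
    if any(v is None for v in values):
        return None
    return mean(values)

def mean_good(values):
    """Like MEAN.GOOD: average only the non-missing values."""
    good = [v for v in values if v is not None]
    return mean(good) if good else None   # all-missing case is an assumption

students = [[2, 3, 4, 4], [3, None, 2, 1]]   # the two cases in file Students
for scores in students:
    print(scores, mean_good(scores), mean_all(scores))
```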

6.10 Character and Numeric List Functions<br />

The COUNT.GOOD, FIRST.GOOD and LAST.GOOD functions detect or count non-missing data. The arguments<br />

for <strong>the</strong>se functions may be character or numeric variable name and position lists. However, numeric and<br />

character values cannot be combined in one list.<br />

COUNT.GOOD yields a numeric value:<br />

IF COUNT.GOOD ( Course.1 TO Course.8 )<br />

NOTAMONG ( 4 TO 6 ), SET Special = 1;<br />

FIRST.GOOD and LAST.GOOD yield <strong>the</strong> value of ei<strong>the</strong>r <strong>the</strong> first or last non-missing variable in <strong>the</strong> argument<br />

list. This may be ei<strong>the</strong>r a character or numeric value:<br />

GEN Last.Course:C =<br />

LAST.GOOD ( Course.1 TO Course.8 );<br />

Thus, when a variable is being generated or recoded, its data type must agree with that of <strong>the</strong> value returned by<br />

LAST.GOOD.<br />
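A short Python sketch of the three functions follows, again with None as the missing value. The function names and the course data are hypothetical illustrations, not taken from a P-STAT file.<br />

```python
def count_good(values):
    """Number of non-missing values, as COUNT.GOOD reports."""
    return sum(v is not None for v in values)

def first_good(values):
    """Value of the first non-missing entry, or None if all are missing."""
    return next((v for v in values if v is not None), None)

def last_good(values):
    """Value of the last non-missing entry, or None if all are missing."""
    return next((v for v in reversed(values) if v is not None), None)

# Hypothetical character values for Course.1 TO Course.8
courses = ["MATH", None, "BIO", "ART", None, None, None, None]
print(count_good(courses), first_good(courses), last_good(courses))
```

Because the list holds character values, the results of first_good and last_good are character values as well, which mirrors the requirement that the generated variable's type agree with the returned value.<br />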

The FIRST and LAST functions access ei<strong>the</strong>r <strong>the</strong> first or last cases in a file, or <strong>the</strong> first and last cases in subgroups.<br />

These across-case functions are explained in <strong>the</strong> next <strong>PPL</strong> chapter, and <strong>the</strong>y are also described briefly in<br />

<strong>the</strong> summary in this chapter.<br />

6.11 SPECIAL FUNCTIONS<br />

Most of the special functions require two arguments. The first is the actual argument for the function. This is followed<br />

by a second argument that provides extra information and controls how <strong>the</strong> function operates. The special<br />

functions and <strong>the</strong>ir arguments are:<br />

CHAREX ( expression, mask )<br />

COMBINATIONS ( expression, expression )<br />

DIF ( expression, constant )<br />

LAG ( expression, constant )<br />

MOD ( expression, constant )<br />

NCOT ( expression, instructions )



NUMEX ( expression, mask )<br />

PLACES ( expression, constant )<br />

RECODE ( expression, instructions)<br />

NCOT and RECODE, which are discussed in <strong>the</strong> second <strong>PPL</strong> chapter, are typical of <strong>the</strong>se functions:<br />

SET Age = RECODE ( Age, 91 TO 99 = 90 );<br />

GENERATE Coded.Age = NCOT ( Age, 20, 90/5 );<br />

The first argument may be a simple or complex expression which, when resolved, is a value. The second argument<br />

provides additional instructions for evaluating <strong>the</strong> function.<br />

6.12 The LAG and DIF Functions<br />

LAG and DIF access a variable value in a prior case. These functions are used in econometrics, as well as in o<strong>the</strong>r<br />

fields. The LAG function “lags” back a specified number of cases <strong>to</strong> obtain a value of a given variable <strong>to</strong> use in<br />

<strong>the</strong> current case. The variable name and <strong>the</strong> number of cases <strong>to</strong> lag back are necessary:<br />

GENERATE Gross.Last.Month = LAG (Gross.Profit, 1);<br />

In this example, <strong>the</strong> variable Gross.Last.Month is generated from <strong>the</strong> variable Gross.Profit one case back. Each<br />

case represents a month’s values here.<br />

__________________________________________________________________________<br />

Figure 6.2 Using LAG and DIF<br />

TITLE 'Gross Profit (in Thousands of Dollars)' $<br />

LIST Acct84 [<br />

GENERATE Gross.Last.Month = LAG (Gross.Profit, 1);<br />

GENERATE Difference.1 = DIF (Gross.Profit, 1);<br />

GENERATE Difference.2 = DIF (Gross.Profit, 2);<br />

GENERATE Two.Month.Gross = LAG (Gross.Profit, 1) + Gross.Profit;<br />

KEEP Month Gross.Profit .NEW. ], MAX.PLACES 1 $<br />

Gross Profit (in Thousands of Dollars)<br />

Gross Gross Last Difference Difference Two Month<br />

Month Profit Month .1 .2 Gross<br />

1 4.8 - - - -<br />

2 5.1 4.8 0.3 - 9.9<br />

3 4.9 5.1 -0.2 0.1 10.0<br />

4 5.7 4.9 0.8 0.6 10.6<br />

5 6.2 5.7 0.5 1.3 11.9<br />

6 5.6 6.2 -0.6 -0.1 11.8<br />

__________________________________________________________________________<br />

The LAG function’s arguments are: 1) a name of a numeric variable or an expression that provides <strong>the</strong> location<br />

of a numeric variable, and 2) a positive integer constant (not exceeding 500) that indicates <strong>the</strong> number of cases<br />

<strong>to</strong> lag back. The new variable’s values in <strong>the</strong> initial cases in <strong>the</strong> file are set <strong>to</strong> missing type one. To do a lag on a<br />

character variable use <strong>the</strong> CLAG function described in <strong>the</strong> chapter “Modification of Character Variables”.



The DIF function finds <strong>the</strong> difference between a variable’s value in <strong>the</strong> current case and that variable’s value<br />

in a prior case. The variable name (or expression) and <strong>the</strong> number of cases back <strong>to</strong> find <strong>the</strong> comparison variable<br />

value are required:<br />

GENERATE Difference.1 = DIF (Gross.Profit, 1);<br />

GENERATE Difference.2 = DIF (Gross.Profit, 2);<br />

Here, <strong>the</strong> variables Difference.1 and Difference.2 are generated and set equal <strong>to</strong> <strong>the</strong> difference in Gross.Profit this<br />

month (<strong>the</strong> current case) and last month and <strong>the</strong> month before that (one case back and two cases back).<br />

The DIF function’s arguments are: 1) a variable name or expression, and 2) a positive integer constant (not<br />

exceeding 500) that indicates <strong>the</strong> number of cases back in which <strong>to</strong> find <strong>the</strong> comparison value. The new variable’s<br />

values in <strong>the</strong> initial cases are set <strong>to</strong> missing type 1. Thus, DIF is very similar in operation <strong>to</strong> LAG. Figure 6.2<br />

illustrates <strong>the</strong> results obtained in various usages of LAG and DIF. Note that it is also easy <strong>to</strong> get sums, products,<br />

quotients, and so on, by using LAG and DIF in conjunction with o<strong>the</strong>r arithmetic operations.<br />
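The columns of Figure 6.2 can be recomputed with a minimal Python sketch. The functions lag, dif and show are hypothetical helpers, None stands in for missing type 1, and results are rounded to one place only so they can be compared with the one-decimal listing.<br />

```python
def lag(values, k):
    """Value k cases back; None (missing) for the first k cases."""
    return [values[i - k] if i >= k else None for i in range(len(values))]

def dif(values, k):
    """Current value minus the value k cases back; None for the first k cases."""
    return [values[i] - values[i - k] if i >= k else None
            for i in range(len(values))]

def show(column):
    """Round to one decimal place for comparison with the listing."""
    return [None if v is None else round(v, 1) for v in column]

gross_profit = [4.8, 5.1, 4.9, 5.7, 6.2, 5.6]   # Gross.Profit from Figure 6.2

gross_last_month = lag(gross_profit, 1)
difference_1 = show(dif(gross_profit, 1))
difference_2 = show(dif(gross_profit, 2))
two_month_gross = show([None if p is None else p + g
                        for p, g in zip(gross_last_month, gross_profit)])

print(gross_last_month, difference_1, difference_2, two_month_gross, sep="\n")
```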

LAG and DIF work with <strong>the</strong> cases <strong>the</strong>y get from any preceding <strong>PPL</strong>. This means that you will almost never<br />

use <strong>the</strong>se functions within an IF. Only <strong>the</strong> cases which have a true value on <strong>the</strong> IF in Figure 6.3 will be input <strong>to</strong><br />

<strong>the</strong> LAG/DIF function. Thus variable If.1 is only set when <strong>the</strong> IF statement is true. The first time that <strong>the</strong> IF is<br />

true <strong>the</strong>re is nothing in <strong>the</strong> lag buffer so variable If.1 is set <strong>to</strong> missing and <strong>the</strong> lag buffer is set <strong>to</strong> <strong>the</strong> current value<br />

of var2, a 3. The second time that the IF is true, variable If.1 is set to 3, the value stored in the lag buffer. The<br />

lag buffer now contains <strong>the</strong> value 5 which is used <strong>the</strong> next time <strong>the</strong>re is a true result for <strong>the</strong> IF.<br />

___________________________________________________________________________<br />

Figure 6.3 Interaction of LAG and IF<br />

File Work<br />

var1 var2<br />

1 3<br />

2 4<br />

1 5<br />

1 6<br />

2 7<br />

MODIFY work [ GEN If.1; GEN No.If;<br />

IF var1 = 1 SET If.1 = LAG ( var2, 1 );<br />

SET No.If = LAG ( var2, 1 ); ],<br />

OUT work2 $<br />

File Work2<br />

No<br />

var1 var2 If.1 If<br />

1 3 - -<br />

2 4 - 3<br />

1 5 3 4<br />

1 6 5 5<br />

2 7 - 6<br />

___________________________________________________________________________<br />
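The buffer behaviour described for Figure 6.3 can be modelled in Python as a rough sketch. LagBuffer is a hypothetical class that assumes a one-case lag works as "return what was stored, then store the current value"; it is not a description of P-STAT's internals.<br />

```python
class LagBuffer:
    """One-case lag: return the previously seen value, then store the new one."""
    def __init__(self):
        self.previous = None          # nothing in the buffer yet (missing)

    def __call__(self, value):
        out, self.previous = self.previous, value
        return out

rows = [(1, 3), (2, 4), (1, 5), (1, 6), (2, 7)]   # (var1, var2) from file Work
lag_in_if, lag_always = LagBuffer(), LagBuffer()
output = []
for var1, var2 in rows:
    # The buffer inside the IF only sees cases for which the IF is true.
    if_1 = lag_in_if(var2) if var1 == 1 else None
    # The unconditional buffer sees every case.
    no_if = lag_always(var2)
    output.append((var1, var2, if_1, no_if))

for row in output:
    print(row)
```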

6.13 Modular (Remainder) Arithmetic<br />

MOD is a function that returns <strong>the</strong> remainder after a constant has been divided in<strong>to</strong> <strong>the</strong> value of <strong>the</strong> first expression.<br />

(This is often referred <strong>to</strong> as modular arithmetic.) The first expression usually points <strong>to</strong> a numeric variable; <strong>the</strong>



second argument must be numeric. If Age is 25, <strong>the</strong>n MOD (Age, 7) is 4, <strong>the</strong> remainder after all <strong>the</strong> possible 7's<br />

are removed.<br />

The following examples illustrate <strong>the</strong> results returned by <strong>the</strong> MOD function:<br />

Function Result<br />

MOD ( .75, .5 ) .25<br />

MOD ( 1, .3 ) .1<br />

MOD ( 6, 1 ) .0<br />

MOD ( 6, 2 ) .0<br />

MOD ( 6, 4 ) 2.0<br />

MOD ( 12, 7 ) 5.0<br />

MOD ( .M., 3 ) -<br />

MOD may be used <strong>to</strong> construct patterns for retaining cases:<br />

IF MOD ( .N., 3 ) EQ 1 OR<br />

MOD ( .N., 7 ) EQ 1, RETAIN;<br />

This instruction tests <strong>the</strong> case number (.N.) and, if it is 1, 4, 7, 8, 10, 13, 15, 16, and so on, retains <strong>the</strong> case.<br />
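The retained case numbers can be confirmed with a few lines of Python, shown here as an illustrative sketch. Python's % operator and math.fmod are assumed to agree with MOD for the non-negative arguments used.<br />

```python
import math

# Case numbers (.N.) from 1 through 16 that satisfy the RETAIN condition.
retained = [n for n in range(1, 17) if n % 3 == 1 or n % 7 == 1]
print(retained)

# Two rows of the MOD table; math.fmod also accepts fractional arguments.
print(12 % 7, math.fmod(0.75, 0.5))
```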

6.14 Setting PLACES in Specific Variables<br />

The function PLACES requests a specific number of decimal places for specified numeric variables. The function<br />

sets <strong>the</strong> selected variable <strong>to</strong> <strong>the</strong> desired number of places before <strong>the</strong> file is passed <strong>to</strong> any commands, such as <strong>the</strong><br />

LIST command. Thus, <strong>the</strong> function PLACES may operate on one particular variable, and <strong>the</strong> subsequent use of<br />

LIST identifiers, such as MIN.PLACES or MAX.PLACES, may <strong>the</strong>n affect all of <strong>the</strong> variables in <strong>the</strong> listing (including<br />

<strong>the</strong> one already modified by <strong>the</strong> PLACES function). The following example, which uses <strong>the</strong> output file<br />

from <strong>the</strong> command T.TEST with <strong>the</strong> PLACES function and <strong>the</strong> LIST identifier MAX.PLACES, illustrates this.<br />

The number of decimal places of <strong>the</strong> variable named T.Prob is set <strong>to</strong> 2, and <strong>the</strong>n <strong>the</strong> file is passed <strong>to</strong> <strong>the</strong> LIST<br />

command:<br />

LIST TTests<br />

[ SET T.Prob = PLACES (T.Prob, 2) ], MAX.PLACES 3 $<br />

The identifier MAX.PLACES requests that <strong>the</strong> maximum number of decimal places for all of <strong>the</strong> variables be limited<br />

<strong>to</strong> 3. The listing produced will have three decimal places (if <strong>the</strong> data has that many places) for all of <strong>the</strong><br />

variables, except for T.Prob, which will have only two places.<br />

The PLACES function requires two expressions in paren<strong>the</strong>ses: 1) <strong>the</strong> argument that is <strong>the</strong> name of <strong>the</strong> variable<br />

whose places are <strong>to</strong> be set, and 2) an integer from 0 <strong>to</strong> 9 that specifies <strong>the</strong> number of decimal places in <strong>the</strong><br />

fractional portion of <strong>the</strong> number, counting from <strong>the</strong> decimal point.<br />

Note: <strong>the</strong> result of a PLACES function is usually less accurate than <strong>the</strong> input value, because information beyond<br />

<strong>the</strong> requested number of places has been dropped.<br />

6.15 Extracting Digits Using NUMEX<br />

Specific digits may be extracted from numeric variables <strong>to</strong> yield a new numeric value. The NUMEX function<br />

operates only on <strong>the</strong> integer portion of a numeric value; any sign and fraction portion are ignored.<br />

NUMEX requires two arguments, a numeric expression and a character string mask composed only of X's<br />

and 0's and enclosed in quotes:<br />

GEN Engine.Num = NUMEX ( Serial.Num, 'X0XXX0' );



The selection mask is made up of X and 0 (zero) characters and may be up <strong>to</strong> nine characters in length. An X<br />

retains (extracts) a digit and a 0 drops (ignores) a digit. The mask is aligned with <strong>the</strong> right-most digit of <strong>the</strong> numeric<br />

value.<br />

Lead zeros are not retained in <strong>the</strong> output numeric value. The following examples illustrate NUMEX:<br />

Function Result<br />

NUMEX ( 984601, 'XXX' ) 601<br />

NUMEX ( 80742 , 'XXXX' ) 742<br />

NUMEX ( 10065 , 'X00X' ) 5<br />
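The right-aligned mask can be emulated in a small Python sketch. The function numex below is a hypothetical re-implementation of the rules stated in this section; padding short numbers with zeros and returning 0 when no digit is kept are assumptions.<br />

```python
def numex(value, mask):
    """Keep the digits of the integer part that line up with X in the mask."""
    digits = str(abs(int(value)))            # integer portion only; sign ignored
    digits = digits.rjust(len(mask), "0")    # assumption: pad short numbers with zeros
    tail = digits[-len(mask):]               # mask aligns with the right-most digit
    kept = "".join(d for d, m in zip(tail, mask) if m == "X")
    return int(kept) if kept else 0          # int() drops any lead zeros

print(numex(984601, "XXX"), numex(80742, "XXXX"), numex(10065, "X00X"))
```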

The CHAREX function is similar <strong>to</strong> NUMEX. It extracts specific digits from a numeric value, but CHAREX<br />

yields a character representation of <strong>the</strong> digits, in which lead zeros are preserved. CHAREX is explained fur<strong>the</strong>r<br />

in <strong>the</strong> final <strong>PPL</strong> chapter.<br />

6.16 COMBINATIONS of N things, K at a time<br />

COMBINATIONS (n,k) returns <strong>the</strong> number of different ways that K things can be taken from N things; i.e., N<br />

things K at a time. For example, combinations (5,2) is 10, namely, 1:2, 1:3, 1:4, 1:5, 2:3, 2:4, 2:5, 3:4, 3:5 and 4:5.<br />

N should be an integer from 1 <strong>to</strong> 6,000. K should be an integer from 0 <strong>to</strong> 60, but not more than N. If <strong>the</strong> result<br />

would be <strong>to</strong>o large, missing 1 is returned. If an argument is invalid, missing 3 is returned.<br />

The function is defined as N! divided by the product of K! and (N-K)!. However, the actual computation is

done by a series of integer divisions, cancelling out terms, until the denominator is all ones. The result is the product<br />

of the remaining values in the numerator.<br />

Function Result<br />

COMBINATIONS( 6,0 ) 1<br />

COMBINATIONS( 6,1 ) 6<br />

COMBINATIONS( 6,3 ) 20<br />

COMBINATIONS( 6,6 ) 1<br />

COMBINATIONS( 46,6 ) 9,366,819 (<strong>the</strong> NJ lottery odds)<br />

COMBINATIONS( 6000,60) 0.4368755E145<br />

COMBINATIONS( 20,.7) Missing 3 (invalid argument)<br />
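The idea of avoiding huge factorials can be sketched in Python. This is a related multiply-then-divide technique, not necessarily the exact cancellation scheme P-STAT uses. The function combinations is hypothetical, None stands in for missing 3, and the documented limits on N and K are not enforced.<br />

```python
import math

def combinations(n, k):
    """Number of ways to take k things from n, using exact integer steps."""
    if n != int(n) or k != int(k) or n < 1 or k < 0 or k > n:
        return None                      # stand-in for missing 3 (invalid argument)
    n, k = int(n), int(k)
    k = min(k, n - k)                    # C(n, k) == C(n, n - k); fewer steps
    result = 1
    for i in range(1, k + 1):
        # After this step result == C(n - k + i, i), so the division is exact.
        result = result * (n - k + i) // i
    return result

print(combinations(5, 2), combinations(46, 6), combinations(20, 0.7))
print(combinations(6000, 60) == math.comb(6000, 60))   # cross-check with the library
```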

6.17 EXPAND ONE OR MORE VARIABLES<br />

EXPAND is a <strong>PPL</strong> statement that projects <strong>the</strong> values of one or more input variables in<strong>to</strong> a set of new variables,<br />

each associated with a specified value in the input variables. In its simplest usage, EXPAND uses one input variable<br />

<strong>to</strong> create a group of new zero/one variables. Each case begins with <strong>the</strong> new variables set <strong>to</strong> zero. Then, if <strong>the</strong><br />

value on <strong>the</strong> input variable is one of <strong>the</strong> specified values, <strong>the</strong> associated new variable is set <strong>to</strong> one.<br />

The input variable can be numeric or character. The output variables are always numeric. These new variables<br />

are sometimes called “dummy” variables. Several input variables can be expanded <strong>to</strong>ge<strong>the</strong>r. The output variables<br />

can <strong>the</strong>n be set <strong>to</strong>:<br />

1. one if ANY of <strong>the</strong> input variables has <strong>the</strong> associated value. This is <strong>the</strong> default.<br />

2. <strong>the</strong> NUMBER of input variables that have <strong>the</strong> associated value.<br />

3. a one (1) if <strong>the</strong> first input variable has <strong>the</strong> associated value, o<strong>the</strong>rwise a two (2) if <strong>the</strong> second input<br />

variable has <strong>the</strong> value, and so on. In o<strong>the</strong>r words, <strong>the</strong> RANK of <strong>the</strong> value.



Suppose variable BREAD is coded 1 through 4, with 1 meaning rye, 2 meaning wheat, 3 meaning raisin and 4<br />

meaning white. The <strong>PPL</strong> statement<br />

[ expand bread, values 1:4, gen rye wheat raisin white ]<br />

will generate four new variables named RYE, WHEAT, RAISIN and WHITE. If a given case has a 1 on BREAD,<br />

<strong>the</strong> value for RYE for that case will be set <strong>to</strong> one and <strong>the</strong> o<strong>the</strong>r three <strong>to</strong> zero. If BREAD is two, <strong>the</strong> second new<br />

variable, WHEAT, is set <strong>to</strong> one and <strong>the</strong> rest <strong>to</strong> zero, and so forth.<br />

The new variables are placed after the last current variable. In the VALUES phrase, either 1:4 or 1 TO 4 could

have been used; they mean the same thing.<br />
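The simplest form of EXPAND can be pictured with a minimal Python sketch in which a case is a dictionary. The function expand is a hypothetical emulation of the single-input, zero/one behaviour described above, not P-STAT code.<br />

```python
def expand(case, input_name, values, new_names):
    """Add one 0/1 variable per listed value of the input variable."""
    out = dict(case)                       # new variables go after the existing ones
    for value, name in zip(values, new_names):
        out[name] = 1 if case[input_name] == value else 0
    return out

# BREAD coded 1 = rye, 2 = wheat, 3 = raisin, 4 = white
case = {"bread": 2}
print(expand(case, "bread", [1, 2, 3, 4], ["rye", "wheat", "raisin", "white"]))
```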

6.18 Overall Syntax of a <strong>PPL</strong> EXPAND Statement<br />

An EXPAND statement consists of phrases, separated by commas. Some phrases are just a single word, o<strong>the</strong>rs<br />

are more extensive. Three of <strong>the</strong>se phrases are required. They start with EXPAND, VALUES and GENERATE.<br />

EXPAND is followed by <strong>the</strong> names of <strong>the</strong> variables <strong>to</strong> be expanded. There is usually just one variable, but <strong>the</strong>re<br />

can be more. If several, they must be either all numeric or all character. The EXPAND phrase comes first; the<br />

order of the rest of the phrases doesn’t matter.<br />

[ EXPAND crust, ...<br />

[ EXPAND first.<strong>to</strong>p second.<strong>to</strong>p third.<strong>to</strong>p, ...<br />

VALUES is followed by integers if the EXPAND variables are numeric, or by quoted character strings if the

input is character.<br />

6.19 Numeric Input Values<br />

If the input variables are numeric, the values to be tested should be integers from 0 to 9999. Ranges can be used;<br />

they are indicated by TO or a colon (:). If some values and/or ranges are placed within parentheses, they will all<br />

be mapped in<strong>to</strong> a single output variable.<br />

An output variable is created:<br />

1. for EACH integer outside of paren<strong>the</strong>ses, and<br />

2. for each paren<strong>the</strong>sis structure.<br />

VALUES 1 3:6 9, makes six output variables.<br />

VALUES 1 9, makes two output variables.<br />

VALUES 1 TO 9, makes nine output variables.<br />

VALUES 0:9, makes ten output variables.<br />

VALUES 1 (3 5:8) 4, makes three output variables.<br />
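The counting rule can be checked with a short Python sketch. The function count_outputs is a hypothetical helper that takes the text after the word VALUES; it assumes well-formed input with no nested parentheses.<br />

```python
import re

def count_outputs(spec):
    """Count the output variables implied by a numeric VALUES phrase."""
    # Each parenthesized group maps into a single output variable.
    groups = re.findall(r"\([^)]*\)", spec)
    rest = re.sub(r"\([^)]*\)", " ", spec)
    # Write "a TO b" and "a : b" uniformly as "a:b".
    rest = re.sub(r"(\d+)\s*(?:TO|:)\s*(\d+)", r"\1:\2", rest, flags=re.IGNORECASE)
    count = len(groups)
    for token in rest.split():
        if ":" in token:
            low, high = map(int, token.split(":"))
            count += high - low + 1        # one output per integer in the range
        else:
            count += 1                     # a single integer outside parentheses
    return count

for spec in ("1 3:6 9", "1 9", "1 TO 9", "0:9", "1 (3 5:8) 4"):
    print(spec, count_outputs(spec))
```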

6.20 Character Input Values<br />

If the input variables are character, the values to be tested should come in quotes, either 'xxx' or "xxx". The default<br />

is to ignore leading blanks, trailing blanks and case. Thus ' Ohio ' is equivalent to 'ohio'.<br />

VALUES ('nj' 'new jersey') 'ohio' 'virginia',<br />

The above example creates 3 output variables, since there is one set in parentheses, and 2 standalone values. The<br />

first output variable is set to 1 when EITHER 'nj' or 'new jersey' is found.<br />

6.21 The GENERATE or GEN phrase<br />

[ EXPAND <strong>to</strong>pping.1 <strong>to</strong>pping.2, VALUES 1:5 9, GEN <strong>to</strong>p.* ]<br />

[ EXPAND region, VALUES 1:4, GEN east west north south]<br />

GENERATE provides <strong>the</strong> names for <strong>the</strong> variables being created. This can be done in two ways, prefix or full<br />

names.



GENERATE prefix.*<br />

GENERATE name name name<br />

A prefix like crust.* can be provided. Given<br />

[EXPAND varname, VALUES 1:3, GENERATE vvv.* ]<br />

<strong>the</strong> new variables will be named vvv.1, vvv.2, and vvv.3 . Given<br />

[EXPAND varname, VALUES 7 5 2, GENERATE vvv.* ]<br />

<strong>the</strong> new variables will be named vvv.7, vvv.5, and vvv.2 .<br />

A prefix can be used in a character expand. The quoted values are used to complete the names of the new

variables. If ('nj' 'new jersey') or such is supplied, the first element is used, in this case 'nj'.<br />

Alternatively, a name can be supplied for each value. Given<br />

[EXPAND varname, VALUES 1:3, GENERATE aaa bbb ccc]<br />

the new variables will be named aaa, bbb, and ccc, with aaa representing the value 1 and so forth. Given<br />

[EXPAND varname, VALUES 7 5 2, GENERATE aaa bbb ccc]<br />

<strong>the</strong> new variables will be named aaa, bbb, and ccc, with aaa representing <strong>the</strong> value 7 (because 7 was <strong>the</strong> first value,<br />

and aaa was <strong>the</strong> first name). In a character expand, <strong>the</strong> first test value is associated with <strong>the</strong> first output name, and<br />

so on.<br />
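The two naming styles can be modeled in a few lines. This is a Python sketch of the naming rules described in this section, not P-STAT code; the function name output_names and the prefix st.* in the usage lines are hypothetical.

```python
def output_names(values, generate):
    """Return the output variable names for a list of test values.

    `values` holds one entry per output variable; a parenthesized set is a list.
    `generate` is either a prefix such as "vvv.*" or a list of full names.
    """
    if isinstance(generate, str) and generate.endswith("*"):
        prefix = generate[:-1]
        names = []
        for value in values:
            # For a parenthesized set, the first element completes the name.
            first = value[0] if isinstance(value, (list, tuple)) else value
            names.append(f"{prefix}{first}")
        return names
    if len(generate) != len(values):
        raise ValueError("one name is needed for each value")
    return list(generate)       # the first name goes with the first value, and so on

print(output_names([7, 5, 2], "vvv.*"))
print(output_names([["nj", "new jersey"], "ohio"], "st.*"))
print(output_names([7, 5, 2], ["aaa", "bbb", "ccc"]))
```

With a prefix, the names follow the order of the VALUES phrase, not numeric order, so VALUES 7 5 2 yields vvv.7 first.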

6.22 Options With Several Input Variables<br />

By default, when two input variables have the same value, the associated output variable is simply set to 1.<br />

1. ADD, causes an output variable <strong>to</strong> show <strong>the</strong> NUMBER of input variables that have that value.<br />

2. RANK, causes an output variable <strong>to</strong> show <strong>the</strong> ORDER of <strong>the</strong> input variable that is <strong>the</strong> first <strong>to</strong> have<br />

that value. By “order” we mean its position in <strong>the</strong> EXPAND phrase. Consider<br />

[EXPAND var1 var2 var3, values 1:5, gen xxx.*].<br />

If a case has a 3 on both var1 and var2, <strong>the</strong> default is <strong>to</strong> simply set xxx.3 <strong>to</strong> 1. If ADD is in use, xxx.3<br />

would be 2, <strong>the</strong> count of input variables that have that value.<br />

Suppose RANK is in use and a case has 2, 4 and 5 on input variables var1, var2 and var3, having<br />

used VALUES 1:5. That would cause xxx.2 <strong>to</strong> be set <strong>to</strong> 1, xxx.4 <strong>to</strong> 2, and xxx.5 <strong>to</strong> 3. Why is xxx.5<br />

set <strong>to</strong> 3 ? Because <strong>the</strong> initial 5 was found in <strong>the</strong> third input variable.<br />

3. NEED 2, This sets <strong>the</strong> number of non-missing input values that are needed for <strong>the</strong> output variables<br />

<strong>to</strong> be non-missing. The default is one. Thus, if all of <strong>the</strong> expand input is missing for a case, <strong>the</strong> default<br />

is for <strong>the</strong> result variables <strong>to</strong> be set <strong>to</strong> missing for that case.<br />

“NEED 0” can be used. This causes <strong>the</strong> result variables <strong>to</strong> be non-missing, no matter what <strong>the</strong> input<br />

is. “NEED 0” can be used when <strong>the</strong>re is just one input variable.<br />

Suppose “NEED 2” is used when there are 3 input variables. Any case with fewer than 2 non-missing input values will be given missing result values.<br />
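The three options can be compared side by side with a small model of one case. This is a Python sketch of the behavior described above, not P-STAT code: the function name expand and its keyword arguments are hypothetical, and None stands for a missing value.

```python
def expand(inputs, values, mode="default", need=1):
    """Model EXPAND for one case; `inputs` are that case's input values."""
    good = [v for v in inputs if v is not None]
    if len(good) < need:
        return [None] * len(values)         # too few non-missing inputs
    result = []
    for target in values:
        positions = [i + 1 for i, v in enumerate(inputs) if v == target]
        if not positions:
            result.append(0)                # value not found in any input
        elif mode == "add":
            result.append(len(positions))   # number of inputs holding the value
        elif mode == "rank":
            result.append(positions[0])     # position of the first input holding it
        else:
            result.append(1)
    return result

values = [1, 2, 3, 4, 5]
print(expand([3, 3, None], values))                 # [0, 0, 1, 0, 0]
print(expand([3, 3, None], values, mode="add"))     # [0, 0, 2, 0, 0]
print(expand([2, 4, 5], values, mode="rank"))       # [0, 1, 0, 2, 3]
print(expand([None, None, None], values, need=0))   # [0, 0, 0, 0, 0]
```

The RANK line reproduces the example in the text: xxx.2 is 1, xxx.4 is 2 and xxx.5 is 3.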

6.23 Options When <strong>the</strong> Input Variables Are Character<br />

[ EXPAND region, VALUES ‘north’ ‘south’ ‘east’ ‘west’,<br />

GEN region.*, EXACT, NO TRIM ]<br />

The default is to ignore leading blanks, trailing blanks and case. Thus ‘ Ohio ’ is equivalent to ‘ohio’.



1. EXACT, Causes the case used in the VALUES quoted constants to be matched exactly. If VALUES ‘East’ ‘South’ ‘West’ were used, a case with ‘east’ would by default match the first value. However, if EXACT is in use, only ‘East’ would match it.<br />

2. NO TRIM, Causes leading and trailing blanks used in the VALUES quoted constants to be matched exactly.<br />

The default makes the comparisons using left-justified copies of both the test values (from the VALUES phrase) and the data from the EXPAND variables in the current case. In other words, leading blanks are trimmed before comparing. If NO TRIM is used, the leading blanks are meaningful. Trailing blanks don’t matter in any event.<br />
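The comparison rules can be modeled as follows. This is a Python sketch, not P-STAT code, and the function name matches is hypothetical. It follows the explanatory paragraph above, in which trailing blanks never matter and NO TRIM affects leading blanks only; that is one reading of the text, not a confirmed description of the implementation.

```python
def matches(data, test, exact=False, no_trim=False):
    """Compare a data value with a VALUES constant under the rules above."""
    # Trailing blanks are dropped in every case (assumed from the text).
    data, test = data.rstrip(" "), test.rstrip(" ")
    if not no_trim:
        # Default: compare left-justified copies, so leading blanks are trimmed.
        data, test = data.lstrip(" "), test.lstrip(" ")
    if not exact:
        # Default: ignore case.
        data, test = data.lower(), test.lower()
    return data == test

print(matches(" Ohio ", "ohio"))                 # True
print(matches("east", "East", exact=True))       # False
print(matches(" east", "east", no_trim=True))    # False
```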

__________________________________________________________________________<br />

Figure 6.4 EXPAND Example<br />

File xxx has one variable and four cases.<br />

crust<br />

1<br />

3<br />

9<br />

--<br />

LIST xxx [ EXPAND crust, VALUES 1:5, GEN crust.* ]$<br />

produces<br />

crust  crust.1  crust.2  crust.3  crust.4  crust.5<br />
    1        1        0        0        0        0<br />
    3        0        0        1        0        0<br />
    9        0        0        0        0        0<br />
   --        -        -        -        -        -<br />

___________________________________________________________________________<br />

In Figure 6.4, note <strong>the</strong> 9 in case 3. A non-missing input value is ignored when it does not match anything in <strong>the</strong><br />

VALUES phrase. Suppose <strong>the</strong>re is only one input variable. If a case has a value of zero on that variable when<br />

VALUES 1 TO 7 was used, <strong>the</strong> new variables for that case are all zero.<br />

6.24 SYSTEM VARIABLES<br />

System variables are defined and set by P-<strong>STAT</strong> as a run is processed. Typically, <strong>the</strong> values of system variables<br />

may not be changed by users, but <strong>the</strong>y may be accessed, tested and assigned <strong>to</strong> o<strong>the</strong>r variables. The names of<br />

system variables are surrounded by decimal points. This distinguishes system variables from user variables,<br />

whose names must begin with a letter.<br />

General system variables are discussed in the following sections. Numeric constants (.e. and .PI.) are in the summary at the end of the chapter. Other system variables used in across-case modifications are described elsewhere in this manual.<br />

6.25 Referencing Good and Missing Data<br />

.G. is <strong>the</strong> system variable for good data and .M. is <strong>the</strong> system variable for missing data of any type. .M1. indicates<br />

missing type 1, .M2. indicates missing type 2, and .M3. indicates missing type 3. Combinations such as .M13. for<br />

types 1 and 3 can also be used.<br />

The system variable .G. tests for good (non-missing) data:



IF Name EQ .G., RETAIN;<br />

The system variables for missing data are used both <strong>to</strong> test for missing and <strong>to</strong> set values <strong>to</strong> one of <strong>the</strong> types of<br />

missing:<br />

IF Age EQ .M., SET Age = .M2. ;<br />

IF Age LT 3 , SET Age = .M3. ;<br />

When .M. is specified as a consequence, it is treated as if it were .M1. (missing type 1). When .M. is specified<br />

as a test, it is treated as if it were any of <strong>the</strong> three types of missing. In this example, a case is deleted if <strong>the</strong> value<br />

of variable Age is any of <strong>the</strong> three types of missing:<br />

IF Age EQ .M., DELETE ;<br />

Note that when an IF clause tests for missing or good values, it produces only a true or false result.<br />

The system variable .M. and <strong>the</strong> equal-sign opera<strong>to</strong>r can be combined in<strong>to</strong> <strong>the</strong> opera<strong>to</strong>r MISSING. Both of<br />

<strong>the</strong>se instructions produce <strong>the</strong> same results:<br />

IF Age MISSING, DELETE ;<br />

IF Age EQ .M. , DELETE ;<br />

Similarly, .G. and <strong>the</strong> equal-sign may be combined in<strong>to</strong> <strong>the</strong> opera<strong>to</strong>r GOOD. These are <strong>the</strong> same:<br />

IF Income GOOD, RETAIN;<br />

IF Income EQ .G., RETAIN;<br />

Note that <strong>the</strong> system variables .G. and .M. are values, and thus may be used with <strong>the</strong> equal-sign opera<strong>to</strong>r, but that<br />

GOOD and MISSING are opera<strong>to</strong>rs already.<br />

6.26 Selecting Variables with .NEW. and .OTHERS.<br />

.NEW. and .OTHERS. reference variables concisely in variable selection clauses. .NEW. is used after KEEP and<br />

DROP <strong>to</strong> refer <strong>to</strong> all new variables created within a command:<br />

GENERATE Medicare.Amt = .80 * Approved.Amt ;<br />

GENERATE Patient.Amt = Approved.Amt - Medicare.Amt;<br />

KEEP Patient.ID TO Approved.Amt .NEW. ;<br />

Only <strong>the</strong> specified variables and <strong>the</strong> two new variables are kept. .NEW. may be used in KEEP and DROP clauses<br />

with or without o<strong>the</strong>r variable names.<br />

.OTHERS. is used in KEEP clauses as a shortcut <strong>to</strong> rearranging variables. .OTHERS. refers <strong>to</strong> all o<strong>the</strong>r variables<br />

in <strong>the</strong> file not explicitly specified in <strong>the</strong> KEEP clause:<br />

KEEP Patient.ID Code.Num .OTHERS.<br />

Billed.Amt TO Approved.Amt ;<br />

This clause keeps all of <strong>the</strong> variables in <strong>the</strong> file, but reorders <strong>the</strong>m as specified.<br />

6.27 Referencing <strong>the</strong> Number of Variables in <strong>the</strong> File<br />

.NV. is <strong>the</strong> system variable for <strong>the</strong> number of variables in <strong>the</strong> file at a given time. This value changes as KEEP,<br />

DROP, GENERATE, SPLIT and COLLECT statements are processed. The following example illustrates ano<strong>the</strong>r<br />

solution <strong>to</strong> <strong>the</strong> problem of creating a series of dummy variables, discussed earlier in this chapter.<br />

GENERATE Number.Vars = .NV.,<br />

GENERATE North.East = 0, GENERATE North.West = 0,<br />

GENERATE South.East = 0, GENERATE South.West = 0;<br />

SET V (Number.Vars + Region) = 1;



As each case is read, a new variable Number.Vars is generated equal <strong>to</strong> <strong>the</strong> number of variables in <strong>the</strong> file. This<br />

number includes <strong>the</strong> variable being created:<br />

                    Number  North  North  South  South<br />
XA  XB  XC  Region    Vars   East   West   East   West<br />
 1   2   2       2       5      0      1      0      0<br />
 2   -   3       4       5      0      0      0      1<br />

Number.Vars is 5. Thus, V ( Number.Vars + Region ) is V(7) or North.West when variable Region is 2, and V(9)<br />

or South.West when variable Region is 4.<br />
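The position arithmetic can be traced with a small model. This is a Python sketch of the example above, not P-STAT code; the function name add_region_dummies is hypothetical, and a case is represented as an ordered list of (name, value) pairs so that V(n) can be modeled as the n-th pair, counting from 1.

```python
def add_region_dummies(case):
    """Append Number.Vars and four dummies, then set the one selected by Region."""
    row = list(case)
    # .NV. includes the variable being created, hence the + 1.
    row.append(("Number.Vars", len(row) + 1))
    number_vars = row[-1][1]
    for name in ("North.East", "North.West", "South.East", "South.West"):
        row.append((name, 0))
    region = dict(row)["Region"]
    position = number_vars + region       # V(Number.Vars + Region), 1-based
    name, _ = row[position - 1]
    row[position - 1] = (name, 1)
    return row

first_case = [("XA", 1), ("XB", 2), ("XC", 2), ("Region", 2)]
print(add_region_dummies(first_case))
```

For the first case, Number.Vars is 5 and Region is 2, so position 7 (North.West) is set to 1, matching the table.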

6.28 Referencing <strong>the</strong> Current Case Number<br />

.N., .HERE. and .USED. are system variables that refer <strong>to</strong> case numbers. .N. is equal <strong>to</strong> <strong>the</strong> current input case<br />

number (after any case selection). This value is increased every time a case is read, even though that case may be<br />

deleted and not passed <strong>to</strong> <strong>the</strong> current command. .HERE. is <strong>the</strong> number of cases that have been retained — that<br />

have actually been passed <strong>to</strong> <strong>the</strong> command up <strong>to</strong> <strong>the</strong> point when .HERE. is processed. .USED. is <strong>the</strong> number of<br />

cases that have been used after <strong>the</strong> completion of all <strong>PPL</strong>. These are cases that are passed <strong>to</strong> <strong>the</strong> current command<br />

preceding <strong>the</strong> <strong>PPL</strong>. The three values are <strong>the</strong> same when no cases have been deleted.<br />

.N. provides an easy way <strong>to</strong> delete individual cases by using <strong>the</strong>ir positions in <strong>the</strong> file:<br />

IF .N. AMONG ( 31, 100 TO 105, 399 ), DELETE;<br />

The next instruction retains <strong>the</strong> first 98 cases in <strong>the</strong> file and makes <strong>the</strong>m available <strong>to</strong> any subsequent <strong>PPL</strong> clauses<br />

and <strong>to</strong> <strong>the</strong> current command:<br />

IF .N. LT 99, RETAIN;<br />

However, <strong>the</strong> case reader continues <strong>to</strong> read through <strong>the</strong> rest of <strong>the</strong> file, testing each case against <strong>the</strong> value of .N.<br />

Thus, case selection is more economical:<br />

CASES 1 TO 98;<br />

The diagonal elements of a square matrix may be set <strong>to</strong> 1 easily with .N. and a DO loop:<br />

DO #J USING 1 .ON.;<br />

IF .N. EQ #J, SET V(#J) = 1;<br />

ENDDO;<br />

This is often useful when working with matrices.<br />

.HERE. is set <strong>to</strong> <strong>the</strong> number of cases that have been processed by <strong>the</strong> <strong>PPL</strong> clause in which .HERE. is found.<br />

If no cases have been deleted prior <strong>to</strong> that <strong>PPL</strong> clause, .HERE. is <strong>the</strong> same as .N. , <strong>the</strong> current input case number.<br />

.USED., which may be abbreviated <strong>to</strong> .U., is set after all cases are processed by all <strong>PPL</strong> clauses. It is <strong>the</strong> count of<br />

all cases not deleted by any logical tests. .USED. is <strong>the</strong> same as .N. when no cases have been deleted in any of <strong>the</strong><br />

<strong>PPL</strong> clauses.<br />

Figure 6.5 illustrates the differences between .N., .HERE. and .USED. Each case in the output file contains three new variables. Input.Case.No is the sequence number in the input file, Student.No is the sequence number of students, and Output.Case.No is the sequence number in the output file. Input.Case.No and Student.No have gaps in their number sequences. The gaps in Input.Case.No mark every deleted case (cases that were not students and students with missing tests), while the gap in Student.No marks a student with missing tests. Output.Case.No has no gaps.<br />



__________________________________________________________________________<br />

Figure 6.5 Showing <strong>the</strong> Differences Between .N., .HERE. and .USED.<br />

File ABU:<br />

         Test  Test  Test<br />
Status     .1    .2    .3<br />
student    95    99    94<br />
student    87    81    93<br />
non-mat    78    86    89<br />
student    67     -    69<br />
student    87    88    90<br />

LIST ABU<br />

[ GENERATE Input.Case.No = .N. ;<br />

GENERATE Output.Case.No;<br />

GENERATE Student.NO ]<br />

[<br />

IF Status NE 'Student', DELETE ;<br />

SET Student.No = .HERE. ;<br />

IF ANY ( Test? ) MISSING, DELETE ;<br />

SET Output.Case.No = .USED. ] $<br />

                           Input           Output<br />
         Test  Test  Test   Case  Student    Case<br />
Status     .1    .2    .3     No       No      No<br />
student    95    99    94      1        1       1<br />
student    87    81    93      2        2       2<br />
student    87    88    90      5        4       3<br />

__________________________________________________________________________<br />
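The three counters in Figure 6.5 can be traced with a short simulation. This is a Python sketch, not P-STAT code: the function name run_figure is hypothetical, None stands for a missing score, and the status comparison is assumed to ignore case.

```python
def run_figure(cases):
    """Each case is (status, [test scores]); returns one tuple per output case."""
    n = here = used = 0
    output = []
    for status, tests in cases:
        n += 1                          # .N. counts every case that is read
        if status.lower() != "student":
            continue                    # deleted by the first test
        here += 1                       # .HERE. counts cases reaching this point
        student_no = here
        if any(score is None for score in tests):
            continue                    # deleted by the second test
        used += 1                       # .USED. counts cases surviving all PPL
        output.append((n, student_no, used))
    return output

abu = [
    ("student", [95, 99, 94]),
    ("student", [87, 81, 93]),
    ("non-mat", [78, 86, 89]),
    ("student", [67, None, 69]),
    ("student", [87, 88, 90]),
]
print(run_figure(abu))   # [(1, 1, 1), (2, 2, 2), (5, 4, 3)]
```

Each tuple holds Input.Case.No, Student.No and Output.Case.No, and the result matches the three output rows of the figure.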

When files are concatenated on-<strong>the</strong>-fly and <strong>the</strong> same modifications are applied ( * ), case counting (.N.,<br />

.HERE. and .USED.) continues as if <strong>the</strong> files were a single file:<br />

MODIFY File1<br />

[ GENERATE Input.Case.No = .N.; GENERATE Output.Case.No;<br />

IF Test4 MISSING, DELETE ;<br />

SET Output.Case.No = .HERE.]<br />

+ File2 [ * ], OUT File12 $<br />

6.29 Referencing Numeric and Character Variables<br />

.NUMERIC. is <strong>the</strong> list of all numeric variables in a file. Similarly, .CHARACTER. is <strong>the</strong> list of all character variables<br />

in <strong>the</strong> file. Both of <strong>the</strong>se system variables are used in KEEP or DROP clauses.<br />

A selection of numeric variables may be appropriate for recoding and for input <strong>to</strong> some commands:



SURVEY Streams<br />

[ KEEP .NUMERIC. ;<br />

DO #J USING Stream1 TO Stream7;<br />

IF V(#J) MISSING THEN;<br />

SET V(#J) = 0;<br />

ELSE;<br />

SET V(#J) = NCOT ( V(#J), 25, 50, 75 );<br />

ENDIF;<br />

ENDDO ] ;<br />

STUBS Stream1 TO Stream7 $<br />

A selection of character variables may be useful in mapping (recoding) character values.<br />

MAP River<br />

[KEEP .CHARACTER. ], VAR Station1 TO Station7, OUT RiverMap $<br />

Ei<strong>the</strong>r system variable reorders variables:<br />

MODIFY Class88<br />

[ KEEP ID Name .NUMERIC. .OTHERS. ], OUT Class88 $<br />

6.30 Accessing <strong>the</strong> PUT Counter<br />

Each time that a case is read, .PUT. is set to 0. If the PUT or PUTL instructions are invoked (see the prior chapter), .PUT. is increased. After all the modifications for a given case are done, .PUT. can be tested to see whether the PUT logic produced printed text. In this example, several checks for mutually inconsistent data are made. If inconsistencies are found, an explanatory statement is printed:<br />

MODIFY InFile<br />

[ IF Age LT 18 AND Veteran GT 0,<br />

PUT 'Inconsistent values of Age and Veteran for '<br />

First.Name Last.Name ;<br />

IF Age LT 15 AND Married GT 0,<br />

PUT 'Inconsistent values of Age and Married for '<br />

First.Name Last.Name ;<br />

IF .PUT. GT 0, RETAIN ],<br />

OUT To.Check $<br />

The PUT counter .PUT. is increased, and those records with inconsistencies are retained for fur<strong>the</strong>r examination<br />

in a new file, To.Check.<br />
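The same per-case logic can be modeled outside P-STAT. This is a Python sketch, not P-STAT code: the function names check_case and to_check and the dictionary keys are hypothetical, and the length of the message list stands in for .PUT., which starts at 0 for every case.

```python
def check_case(case):
    """Return the messages that the PUT instructions would print for one case."""
    messages = []
    name = f"{case['first']} {case['last']}"
    if case["age"] < 18 and case["veteran"] > 0:
        messages.append(f"Inconsistent values of Age and Veteran for {name}")
    if case["age"] < 15 and case["married"] > 0:
        messages.append(f"Inconsistent values of Age and Married for {name}")
    return messages

def to_check(cases):
    """Keep only the cases for which at least one message was produced."""
    return [case for case in cases if len(check_case(case)) > 0]

people = [
    {"first": "Ann", "last": "Lee", "age": 16, "veteran": 1, "married": 0},
    {"first": "Bob", "last": "Ray", "age": 40, "veteran": 1, "married": 1},
]
print(to_check(people))
```

Only the first hypothetical record is retained, because it is the only one that triggers a message.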

6.31 File, Date, Page and Line References<br />

.FILE. is <strong>the</strong> system variable that refers <strong>to</strong> <strong>the</strong> current P-<strong>STAT</strong> system file. It can be used <strong>to</strong> pass <strong>the</strong> filename <strong>to</strong><br />

<strong>the</strong> TITLE command or as an argument in <strong>the</strong> FIRST and LAST functions. FIRST and LAST are usually used for<br />

processing groups of related cases. When FIRST and LAST are used with .FILE. as <strong>the</strong> argument, <strong>the</strong>y test for<br />

<strong>the</strong> beginning and end of <strong>the</strong> file:<br />

[ IF FIRST (.FILE.), GENERATE #Children = 0 ;<br />

IF Age LT 16, INCREASE #Children ;<br />

IF LAST (.FILE.), RETAIN ;<br />

KEEP School.District #Children ]



The statement “IF FIRST (.FILE.)” is true only when <strong>the</strong> first case of a file is processed. Similarly <strong>the</strong> statement<br />

“IF LAST (.FILE.)” is true only when <strong>the</strong> last case of a file is processed. FIRST, LAST, and .FILE. have extensive<br />

uses in across-case data modification and are discussed in detail in <strong>the</strong> chapter “<strong>PPL</strong>: Across Case Modifications”.<br />

.DATE. is <strong>the</strong> current date. Its value is set when <strong>the</strong> current command begins. .DATE. is in character form,<br />

and thus a variable generated or set <strong>to</strong> .DATE. must be of character data type:<br />

[ GENERATE Today:C = .DATE. ]<br />

The exact string produced by .DATE. depends upon <strong>the</strong> computer on which P-<strong>STAT</strong> is running. Using .NDATE.<br />

requests <strong>the</strong> numeric form of <strong>the</strong> date. Note: .NDATE. returns a 4 digit year.<br />

.PAGE. and .TIME. reference <strong>the</strong> current page number since <strong>the</strong> command began and <strong>the</strong> current time when<br />

<strong>the</strong> command began. .CPAGE. resets <strong>the</strong> page number at each command ra<strong>the</strong>r than at each run. .RPAGE. sets <strong>the</strong><br />

page number within a run. The page value is numeric. The time value is character in <strong>the</strong> form “11:34:05” (hours:<br />

minutes: seconds). .NTIME. requests <strong>the</strong> numeric values of time (without colons). These four system variables<br />

are often used in <strong>to</strong>p and bot<strong>to</strong>m titles. Exact, run and command values, as well as numeric and character values,<br />

may also be requested.<br />

Manipulating date and time values is covered in a separate chapter. It describes 40 functions and 10 commands<br />

for formatting date/time values, finding <strong>the</strong> difference between date/time values, etc.



<strong>PPL</strong><br />

SUMMARY<br />

Functions are part of <strong>the</strong> P-<strong>STAT</strong> programming language. Function arguments are enclosed in<br />

paren<strong>the</strong>ses:<br />

LIST File109<br />

[ SET Usage = LOG ( Usage ) ;<br />

SET CPU.Time = PLACES ( CPU.Time, 2 ) ] $<br />

<strong>PPL</strong> Functions: Numeric — Single Expression<br />

The following functions require a single numeric expression as an argument:<br />

ABS (exp)<br />

gives <strong>the</strong> absolute value of <strong>the</strong> expression.<br />

COS (exp)<br />

gives <strong>the</strong> cosine of <strong>the</strong> expression.<br />

ACOS (exp)<br />

gives <strong>the</strong> arc cosine of <strong>the</strong> expression.<br />

EXP (exp)<br />

raises e <strong>to</strong> <strong>the</strong> exponent which is <strong>the</strong> value of <strong>the</strong> expression.<br />

FACTORIAL (exp)<br />

The FACTORIAL function yields <strong>the</strong> fac<strong>to</strong>rial value of <strong>the</strong> argument. This is often shown as N!.<br />

FRAC (exp)<br />

gives <strong>the</strong> fractional part of <strong>the</strong> numerical expression.<br />

INT (exp)<br />

gives <strong>the</strong> integer part of <strong>the</strong> numerical expression.<br />

LOC (exp)<br />

gives <strong>the</strong> location of <strong>the</strong> variable specified in <strong>the</strong> expression. The location is <strong>the</strong> position of <strong>the</strong> variable<br />

in <strong>the</strong> file, counting from <strong>the</strong> left.<br />

LOG (exp)<br />

gives <strong>the</strong> natural log (base e) of <strong>the</strong> numerical expression.<br />

LOG10 (exp)<br />

gives <strong>the</strong> common log (base 10) of <strong>the</strong> numerical expression.<br />

vnp=var name/position nn=number vn=variable name exp=expression



ROUND (exp)<br />

rounds <strong>the</strong> numerical expression <strong>to</strong> <strong>the</strong> nearest integer.<br />

SIN (exp)<br />

gives <strong>the</strong> sine of <strong>the</strong> numerical expression.<br />

ASIN (exp)<br />

gives <strong>the</strong> arc sine of <strong>the</strong> numerical expression.<br />

SQRT (exp)<br />

gives <strong>the</strong> square root of <strong>the</strong> numerical expression.<br />

TAN (exp)<br />

gives <strong>the</strong> tangent of <strong>the</strong> numerical expression.<br />

ATAN (exp)<br />

gives <strong>the</strong> arc tangent of <strong>the</strong> numerical expression.<br />

<strong>PPL</strong> Functions: Numeric — List<br />

The following functions operate on a list of numeric variables, which may be referenced by name, position,<br />

ranges, and wildcards. Functions will return missing if any variable is missing unless “.GOOD” is<br />

<strong>the</strong> suffix. When that is <strong>the</strong> case, <strong>the</strong> result will be based on all non-missing (good) values. At least one<br />

variable name or position (vnp) is required in <strong>the</strong> list.<br />

MAX (vnp list)<br />

gives <strong>the</strong> maximum value of <strong>the</strong> variables in <strong>the</strong> list:<br />

[ GEN Larger = MAX ( Length Girth ) ]<br />

MAX.GOOD (vnp list)<br />

gives <strong>the</strong> maximum value of <strong>the</strong> non-missing variables in <strong>the</strong> list.<br />

MEAN (vnp list)<br />

gives <strong>the</strong> arithmetic mean of <strong>the</strong> variables in <strong>the</strong> list:<br />

[ GEN Mean.Weight = MEAN ( V(1) .ON. ) ]<br />

MEAN.GOOD (vnp list)<br />

gives <strong>the</strong> arithmetic mean of <strong>the</strong> non-missing variables in <strong>the</strong> list.<br />

MIN (vnp list)<br />

gives <strong>the</strong> minimum value of <strong>the</strong> variables in <strong>the</strong> list.<br />

MIN.GOOD (vnp list)<br />

gives <strong>the</strong> minimum value of <strong>the</strong> non-missing variables in <strong>the</strong> list.<br />

SDEV (vnp list)<br />

gives <strong>the</strong> standard deviation of <strong>the</strong> variables in <strong>the</strong> list.<br />


SDEV.GOOD (vnp list)<br />

gives <strong>the</strong> standard deviation of <strong>the</strong> non-missing variables in <strong>the</strong> list.<br />

SUM (vnp list)<br />

gives <strong>the</strong> sum of <strong>the</strong> variables in <strong>the</strong> list:<br />

GEN Score = SUM ( Test? ) / 3 ;<br />

SUM.GOOD (vnp list)<br />

gives <strong>the</strong> sum of <strong>the</strong> non-missing variables in <strong>the</strong> list.<br />
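The difference between a plain list function and its .GOOD form can be modeled with MEAN. This is a Python sketch, not P-STAT code: None stands for a missing value, and returning missing when every value is missing is an assumption, since the text does not cover that case.

```python
def mean(values):
    """Plain form: the result is missing if any value in the list is missing."""
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)

def mean_good(values):
    """.GOOD form: the result is based on the non-missing values only."""
    good = [v for v in values if v is not None]
    # Assumption: with no good values at all, the result is missing.
    return sum(good) / len(good) if good else None

scores = [67, None, 69]
print(mean(scores))        # None
print(mean_good(scores))   # 68.0
```

The same pattern applies to MAX, MIN, SDEV and SUM and their .GOOD counterparts.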

<strong>PPL</strong> Functions: Numeric — Special<br />

The following special functions require one expression that describes <strong>the</strong> domain of <strong>the</strong> function (usually<br />

a variable) and one or more extra arguments, depending on <strong>the</strong> particular function.<br />

COMBINATIONS (exp, exp )<br />

COMBINATIONS (n,k) returns <strong>the</strong> number of different ways that K things can be taken from N things;<br />

i.e., N things K at a time. For example, combinations(5,2) is 10, namely, 1:2, 1:3, 1:4, 1:5, 2:3, 2:4, 2:5,<br />

3:4, 3:5 and 4:5.<br />
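The count and the enumeration can be verified independently. This Python sketch uses the standard library functions math.comb (Python 3.8 or later) and itertools.combinations; it checks the arithmetic of the example and says nothing about how P-STAT computes the value.

```python
from itertools import combinations
from math import comb

# Number of ways to take 2 things from 5.
count = comb(5, 2)

# The pairs themselves, written in the same a:b form as the text.
pairs = [f"{a}:{b}" for a, b in combinations(range(1, 6), 2)]

print(count)   # 10
print(pairs)
```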

DIF (exp, nn)<br />

gives <strong>the</strong> difference between <strong>the</strong> current value of <strong>the</strong> numeric variable designated in <strong>the</strong> expression and<br />

<strong>the</strong> value of that variable nn cases back:<br />

GEN Difference.2 = DIF ( Gross.Profit, 2 ) ;<br />

The number nn must be a positive integer constant not exceeding 500.<br />

LAG (exp, nn)<br />

gives <strong>the</strong> value of <strong>the</strong> numeric variable, designated in <strong>the</strong> expression, nn cases back:<br />

GEN Gross.Last.Yr = LAG ( Gross.Profit, 1 ) ;<br />

The number nn, <strong>the</strong> number of cases <strong>to</strong> “lag” back, must be a positive integer constant not exceeding 500.<br />
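LAG and DIF can be modeled on a list of values in file order. This is a Python sketch, not P-STAT code. The function names and the sample figures are hypothetical, and treating the first nn cases as missing (None) is an assumption, since the text does not say what is returned when no earlier case exists.

```python
def lag(series, nn):
    """Value nn cases back; None where no such case exists (assumed)."""
    return [series[i - nn] if i >= nn else None for i in range(len(series))]

def dif(series, nn):
    """Current value minus the value nn cases back."""
    return [
        None if current is None or back is None else current - back
        for current, back in zip(series, lag(series, nn))
    ]

gross_profit = [100, 120, 90, 150]
print(lag(gross_profit, 1))   # [None, 100, 120, 90]
print(dif(gross_profit, 2))   # [None, None, -10, 30]
```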

MOD (exp, nn)<br />

gives <strong>the</strong> remainder after <strong>the</strong> numeric expression has been divided by <strong>the</strong> positive constant (nn):<br />

SET Time.Hours = MOD ( Ship.Time, 12 ) ;<br />

This is sometimes called modular arithmetic.<br />

NCOT (exp, n-chotomization instructions)<br />

recodes <strong>the</strong> numeric variable specified in <strong>the</strong> expression according <strong>to</strong> <strong>the</strong> instructions given in <strong>the</strong> second<br />

argument:<br />

GEN Age = NCOT ( Age, 10, 20, 30, 40 ) ;<br />

or<br />

GEN Age = NCOT ( Age, 10, 40/10 ) ;<br />

Both of the preceding instructions do an N-way dichotomization, or division, of the variable values. All values of Age less than or equal to 10 become 1, those greater than 10 but less than or equal to 20 become 2, and so on up to values of 40, which become 4. Values above 40 become 5.<br />
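The cut-point rule is a sorted search. This is a Python sketch of the explicit cut-point form only, not P-STAT code: the function name ncot mirrors the PPL function for readability, and leaving a missing value (None) missing is an assumption.

```python
from bisect import bisect_left

def ncot(value, *cuts):
    """Return 1 for values <= the first cut, 2 for values <= the second, and so on."""
    if value is None:
        return None                  # assumption: missing stays missing
    # bisect_left counts the cuts that are strictly below the value.
    return bisect_left(sorted(cuts), value) + 1

for age in (5, 10, 11, 40, 41):
    print(age, "->", ncot(age, 10, 20, 30, 40))
```

A value equal to a cut point falls in the lower group, so 10 maps to 1 and 40 maps to 4, while 41 maps to 5.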


NUMEX (exp, 'XX00')<br />

extracts specific digits from a numeric variable value and yields a numeric representation of those digits.<br />

NUMEX operates only on <strong>the</strong> integer portion of <strong>the</strong> number — any fractional portion and sign are ignored.<br />

The two required arguments are a numeric expression and a character string mask enclosed in<br />

quotes:<br />

GEN Month = NUMEX (Date, 'XX00' ) ;<br />

The selection mask is composed of X and 0 (zero) characters and may be up <strong>to</strong> nine characters in length.<br />

An X retains a digit and a 0 drops a digit. The selection mask is aligned with <strong>the</strong> right-most digit of <strong>the</strong><br />

numeric value. Lead zeros are not retained in <strong>the</strong> output number. Thus, <strong>the</strong> selection mask “XX00X”<br />

applied <strong>to</strong> “156” yields <strong>the</strong> number 6. The character function CHAREX may be used if lead zeros are<br />

needed in <strong>the</strong> result.<br />
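The mask alignment can be modeled as follows. This is a Python sketch, not P-STAT code: the function name numex mirrors the PPL function, digits to the left of the mask are assumed to be dropped, and a result with no retained digits is assumed to be 0. The value 1225 in the usage lines is a hypothetical month-and-day number.

```python
def numex(value, mask):
    """Keep the digits under each X of the mask, aligned at the right-most digit."""
    digits = str(int(abs(value)))            # integer part only; sign ignored
    # Pad on the left so that the mask and the digits line up at the right.
    digits = digits.rjust(len(mask), "0")[-len(mask):]
    kept = "".join(d for d, m in zip(digits, mask) if m.upper() == "X")
    return int(kept) if kept else 0          # int() drops any leading zeros

print(numex(156, "XX00X"))   # 6
print(numex(1225, "XX00"))   # 12
```

In the first call the two leading X positions have no digits to retain, the 1 and the 5 fall under zeros, and only the 6 survives.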

PLACES (exp, nn)<br />

sets <strong>the</strong> variable specified in <strong>the</strong> numeric expression <strong>to</strong> <strong>the</strong> number of places specified by <strong>the</strong> second argument,<br />

which must be a positive integer not greater than 9.<br />

GEN ##N = 1.2345 $<br />

PUT ##N > ( PLACES ( ##N, 1 )) > (PLACES ( ##N,3 )) $<br />

produces <strong>the</strong> following line:<br />

1.2345 1.2 1.235<br />

<strong>PPL</strong> Functions: Character and Numeric<br />

COUNT.GOOD (vnp, vnp)<br />

gives <strong>the</strong> number of non-missing values in <strong>the</strong> list of expressions. Only variable names or positions may<br />

be in <strong>the</strong> list.<br />

FIRST.GOOD (vnp, vnp)<br />

gives <strong>the</strong> value of <strong>the</strong> first non-missing variable in <strong>the</strong> list of expressions. Only variable names or positions<br />

may be in <strong>the</strong> list.<br />

GEN Date = FIRST.GOOD (Date.1 TO Date.4) ;<br />

LAST.GOOD (vnp, vnp)<br />

gives <strong>the</strong> value of <strong>the</strong> last non-missing variable in <strong>the</strong> list of expressions. Only variable names or positions<br />

may be in <strong>the</strong> list.<br />
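The three .GOOD functions can be modeled on a list of values taken in variable order. This is a Python sketch, not P-STAT code: None stands for a missing value, the sample dates are hypothetical, and returning missing when every value is missing is an assumption.

```python
def count_good(values):
    """Number of non-missing values."""
    return sum(v is not None for v in values)

def first_good(values):
    """First non-missing value, or None if all are missing (assumed)."""
    return next((v for v in values if v is not None), None)

def last_good(values):
    """Last non-missing value, or None if all are missing (assumed)."""
    return next((v for v in reversed(values) if v is not None), None)

dates = [None, 20120315, None, 20120401]   # stands in for Date.1 TO Date.4
print(count_good(dates), first_good(dates), last_good(dates))
```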

FIRST (.FILE. or vn)<br />

is evaluated as true if it is <strong>the</strong> first case in <strong>the</strong> subgroup specified in <strong>the</strong> expression, and false if it is not<br />

<strong>the</strong> first case. The required expression is a variable name (vn) or a list of up <strong>to</strong> 5 variables, or <strong>the</strong> system<br />

value .FILE. (meaning <strong>the</strong> current file):<br />

IF FIRST (Grade, Sex), INC #Counter ;<br />

Changing values of <strong>the</strong> variable or variables define different subgroups.<br />

LAST (.FILE. or vn)<br />

is evaluated as true if it is <strong>the</strong> last case in <strong>the</strong> subgroup specified in <strong>the</strong> expression, and false if it is not<br />

<strong>the</strong> last case. The required expression is a variable name (vn) or a list of up <strong>to</strong> 5 variables, or <strong>the</strong> system<br />


value .FILE. (meaning <strong>the</strong> current file). Changing values of <strong>the</strong> variable or variables define different<br />

subgroups.<br />
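FIRST and LAST can be modeled by comparing each case with its neighbors. This is a Python sketch, not P-STAT code: the function name first_last_flags and the sample data are hypothetical, and a subgroup is assumed to be a run of consecutive cases with the same values on the named variables.

```python
def first_last_flags(cases, keys):
    """Return a (FIRST, LAST) pair of booleans for every case."""
    flags = []
    for i, case in enumerate(cases):
        key = tuple(case[k] for k in keys)
        prev_key = tuple(cases[i - 1][k] for k in keys) if i > 0 else None
        next_key = tuple(cases[i + 1][k] for k in keys) if i + 1 < len(cases) else None
        # A change from the previous case starts a subgroup;
        # a change in the next case ends it.
        flags.append((key != prev_key, key != next_key))
    return flags

pupils = [
    {"Grade": 1, "Sex": "F"},
    {"Grade": 1, "Sex": "F"},
    {"Grade": 1, "Sex": "M"},
    {"Grade": 2, "Sex": "M"},
]
print(first_last_flags(pupils, ["Grade", "Sex"]))
```

With an empty key list every case has the same key, so only the first and last cases of the file are flagged, which corresponds to using .FILE. as the argument.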

RECODE (exp, recode instructions )<br />

recodes <strong>the</strong> character or numeric variable specified in <strong>the</strong> expression according <strong>to</strong> <strong>the</strong> instructions given<br />

in <strong>the</strong> second argument:<br />

SET Height =<br />

RECODE ( Height, 0 TO 65 = 1, 65.1 TO 100 = 2, G = 3) ;<br />

All values of height from 0 through 65 become 1, and all values from 65.1 through 100 become 2. Any other GOOD values become 3. (See the fourth PPL chapter for a full explanation of RECODE.)<br />

<strong>PPL</strong> System Variables<br />

System variables are variables that are defined and set by P-STAT. Their names are enclosed between decimal points to distinguish them from user-defined variables. P-STAT automatically sets the values of the system variables as a run progresses. Usually the values may not be changed by users, but they may be accessed, tested and assigned to other variables. (System variables used especially in titles are further described in the TITLES chapter.)<br />

.CHARACTER.<br />

is the list of character variables in a file. .CHARACTER. is used in KEEP and DROP selections:<br />

DROP .CHARACTER.;<br />

.DATE.<br />

is the current date. Its value is set when the current command begins, and it is in character form:<br />

GENERATE Today:C = .DATE. ;<br />

It is equivalent to .CDATE. (the command date).<br />

.e.<br />

is the system value for e, the base of natural logs. It equals 2.718281828.<br />

.FILE.<br />

is the current P-STAT system file. Its value is the name of that file. It is used as the argument for the functions FIRST and LAST, and also in titles.<br />

.G.<br />

is a good or non-missing variable value. It tests whether good data is present in an expression:<br />

IF Test.Score EQ .G., RETAIN;<br />

.HERE.<br />

is the count of the number of cases actually processed thus far by the current PPL clause.<br />

.M.<br />

is a missing or non-good variable value. It is used to test whether missing data is present in an expression. .M. refers collectively to all three types of missing; it is the opposite of .G. (above).<br />


.M1., .M2., .M3.<br />

are missing variable values of three types: MISSING1, MISSING2 and MISSING3. .M1., .M2. and .M3. are used for logical testing within an IF phrase and for recoding.<br />

.N.<br />

is the case counter. Its value is the current case number after case (row) selection.<br />

.NEW.<br />

are all variables newly generated in all PPL clauses in the current phrase. Its value is all of the names of these new variables. .NEW. is used in KEEP and DROP selections:<br />

GEN Average = MEAN.GOOD ( Value? ) ;<br />

GEN Total = SUM.GOOD ( Value? ) ;<br />

KEEP ID .NEW. .OTHERS. ;<br />

.NUMERIC.<br />

is the list of numeric variables in a file. .NUMERIC. is used in KEEP and DROP selections:<br />

KEEP .NUMERIC. ;<br />

.NV.<br />

is the current number of variables in the file.<br />

.ON.<br />

is used in case and variable selection and in DO loops to indicate from here onward through the last case or variable:<br />

DO #J USING 1 .ON. ;<br />

IF V(#J) GOOD, SET V(#J) = V(#J) / 10 ;<br />

ENDDO;<br />

.OTHERS.<br />

are all variables other than those explicitly referenced in a KEEP or DROP selection. It is used in reordering variables:<br />

KEEP SS.Number Department .OTHERS. Final.Grade;<br />

.PAGE.<br />

is the current page number since the command began. It is equivalent to .CPAGE. The related variable .RPAGE. is the current page number since the run or P-STAT session began.<br />

.PI.<br />

is the system value for pi. It equals 3.141592654.<br />

.PUT.<br />

is the PUT counter. Its value is the number of times PUT was invoked in the current case.<br />

.TIME.<br />

is the current time. Its value is set when the current command begins, and it is in character form:<br />

GENERATE Time:C = .TIME.;<br />

It is equivalent to .CTIME. (the command time).<br />


.USED.<br />

is the number of cases used after all PPL clauses are processed. The count does not include cases that are deleted because of logical tests.<br />

Other Date and Time System Variables<br />

The system variables .DATE. and .TIME. may be prefaced with N, X, R or C:<br />

.NDATE. .NTIME.<br />

.XDATE. .NXDATE. .XTIME. .NXTIME.<br />

.RDATE. .NRDATE. .RTIME. .NRTIME.<br />

.CDATE. .NCDATE. .CTIME. .NCTIME.<br />

The N specifies the numeric form of the date or time, rather than the character form. The X specifies the exact date or time when the system variable is processed. The R specifies the run date or time, that is, when the current run began. The C specifies the command date or time, that is, when the current command began. The numeric form of exact, run, and command dates or times may also be specified. The dates and times are printed as they are represented in the computer system on which P-STAT is being used.<br />

Note: The numeric forms of the date now all have the year returned as 4 digits in preparation for the year 2000.<br />



7 Random Number and Distribution Functions<br />

This chapter covers three groups of functions: random number functions, distribution functions, and functions that can be used to handle the “fuzzy equals” problem.<br />

7.1 RANDOM NUMBER FUNCTIONS<br />

The PPL functions RANNORM, RANUNI, RANBIN and RANTABLE generate random (strictly, pseudo-random) numbers from, respectively, the normal distribution, the uniform distribution, the binomial distribution and a user's tabled distribution. The random numbers may be used for many purposes, such as generating random data, selecting a random subset of cases from a file, or assigning cases to either a control or an experimental treatment. Examples illustrating these tasks follow the basic explanations.<br />

In a normal distribution, the random numbers are normal deviates (“standard scores”) that range from -6 through +6, and the probability of obtaining specific values depends on the area under the normal curve. In a uniform or rectangular distribution, the random numbers range from zero through one, and the probability of obtaining any value equals the probability of obtaining any other value. (The random numbers do not include the exact values zero and one.)<br />

In a binomial distribution, the random numbers are observations from a binomial distribution with the specified order; that is, they are integers that range from 0 to the order of the binomial distribution. The probability of a given observation depends on the likelihood of the possible observations and on the probability of a single event (a “win”), which is assumed to be .5 unless another probability is supplied. In a user's tabled distribution, the random numbers are observations (integers) that range from one to the order of the distribution specified by the user. The probability of the various observations is also specified by the user.<br />

The arguments for any of the random number functions are: 1) an initial seed control argument, 2) three optional scratch variables, and 3) any function-specific arguments. The initial argument controls how the seed values that prime the random number generator are obtained. NOTE: the arguments are initialized at the beginning of the command. Except for the PPL command, this is when the first case is processed. A BRANCH in a macro back to a location outside of the command that is generating the numbers causes the arguments to be re-initialized. Possible first argument values are:<br />

0 different seed values obtained from the current date and time are used<br />

-1 same default seed values are used every time the function is used<br />

-3 three seed values are supplied by the user as the next three arguments<br />

When 0 is specified, three seed values obtained from the current date and time are used to start the number generator. The seed values and the random numbers they generate differ each time:<br />

RANNORM ( 0 )<br />

When -1 is specified, three default seed values are used; they are the same each time one of the random number functions is used:<br />

RANUNI ( -1 )<br />

The argument -1 is used only when the same “random” values are desired. This may be the case when a specific procedure involving random numbers must be repeated exactly. When -3 is specified as the first argument, three seed values should be supplied as the next three arguments. The values should be three constants that are integers between 1 and 30,000:<br />

RANNORM ( -3, 912, 4508, 7 )<br />

Three scratch variables may be given as <strong>the</strong> next arguments for any of <strong>the</strong> random number functions:<br />

RANUNI ( 0, #S1, #S2, #S3 )<br />

When three scratch variables are supplied, the final seed values are saved as the values of the scratch variables. Thus, a subsequent run can use these values as starting seeds and continue a progression. The scratch variables should be generated prior to using them. (See the second RANTABLE example in the final paragraph of this section.)<br />

Finally, any function-specific arguments follow; only RANBIN and RANTABLE require these. Here, the “2” is the order of the binomial distribution:<br />

RANBIN ( -1, 2 )<br />

7.2 Normal and Uniform Distributions<br />

The RANNORM function may be used to generate a file of random numbers with a specific mean and standard deviation. First, a file with one case is built:<br />

MAKE Random, VAR Random.Number ;<br />

- $<br />

Then, that file is modified to produce the desired number of cases and to set the values to random numbers. The REPEAT instruction repeats the one case 100 times:<br />

MOD Random [<br />

REPEAT 100 ;<br />

SET Random.Number = ( RANNORM (0) * 2.8 ) + 24 ],<br />

OUT RandomX $<br />

The RANNORM function generates a standardized random number, a “Z-score” with mean 0 and standard deviation 1. That number is multiplied by the desired standard deviation (here 2.8) and then added to the desired mean (here 24).<br />
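The scaling step can be checked outside P-STAT. The following is a minimal Python sketch, not PPL: it assumes Python's random.gauss as a stand-in for RANNORM, and the function name scaled_normal and the seed 12345 are illustrative only.<br />

```python
import random
import statistics


def scaled_normal(rng, mean, sd):
    """Return one draw: a standard normal deviate scaled by sd and shifted by mean."""
    z = rng.gauss(0.0, 1.0)   # stand-in for RANNORM: mean 0, standard deviation 1
    return z * sd + mean      # same arithmetic as ( RANNORM (0) * 2.8 ) + 24


rng = random.Random(12345)    # fixed seed (an assumption) so the sketch is repeatable
sample = [scaled_normal(rng, 24.0, 2.8) for _ in range(10_000)]

# The sample mean and standard deviation should land close to 24 and 2.8.
print(round(statistics.fmean(sample), 2), round(statistics.stdev(sample), 2))
```

With a large sample, the printed mean and standard deviation approach the requested values, which is the property the REPEAT example relies on.<br />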

The RANUNI function is often used to select a random sample of cases from a file. This command selects a random subset of about one third of the original cases in file Subjects:<br />

MOD Subjects [<br />

GEN #Temp = RANUNI (0) ;<br />

IF #Temp LT .333334, RETAIN ], OUT Sub.3 $<br />

These instructions do sampling with replacement: they select a random sample of five cases from a file of 100 cases (the same case could be selected more than once):<br />

MOD FileA [<br />

GEN #N = MOD (.N., 100) + 1 ;<br />

IF #N EQ 2, GEN #R = ( RANUNI (0) * 100 ) + 1 ;<br />

IF #N EQ INT (#R), RETAIN ]<br />

+ FileA (*) + FileA (*) + FileA (*) + FileA (*),<br />

OUT FileB $<br />

The scratch variable #N (a pseudo case number) is generated equal to the MOD of the case number, plus one, to get numbers running from 2 to 100 followed by 1. (The actual case numbers run from 1 to 500 when the files are concatenated using the “+” operator. After the MOD function, they run from 1 to 99 followed by 0.)<br />

When the first case of the file is processed (that is, when #N = 2), a random number between zero and one is generated. It is multiplied by 100 and one is added to it. (Random numbers exactly equal to zero or one are not generated. By multiplying by 100 and adding one, the range of the random numbers shifts from 0-to-1 to 1-through-100.) If #N equals the integer value of the random scratch variable, the case is selected. The file is read four more times and the same instructions are executed each time.<br />
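The pseudo case number logic can be traced with a short simulation. This is a hedged Python sketch rather than PPL: random.random stands in for RANUNI, and the function name sample_with_replacement and its parameters are hypothetical.<br />

```python
import random


def sample_with_replacement(n_cases=100, passes=5, rng=None):
    """Mimic the #N / #R logic: one case is retained on each pass over the file."""
    rng = rng or random.Random()
    selected = []
    r = None
    for case_no in range(1, n_cases * passes + 1):    # case numbers after concatenation
        n = case_no % n_cases + 1                     # pseudo case number: 2..100, then 1
        if n == 2:                                    # first case of each pass
            r = rng.random() * n_cases + 1            # value in [1, 101)
        if n == int(r):                               # INT(#R) is an integer from 1 to 100
            selected.append((case_no - 1) % n_cases + 1)  # position in the original file
    return selected


picked = sample_with_replacement(rng=random.Random(7))
print(picked)   # five positions, one per pass; repeats are possible
```

Because #N takes each value from 1 to 100 exactly once per pass, exactly one case matches the integer part of #R on every pass, so five passes yield five cases.<br />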

7.3 Binomial and User's Tabled Distributions<br />

The RANBIN function could be used to assign cases to either a control or an experimental treatment group. This command does this:<br />

MOD Expermt5 [<br />

GEN #Bin = RANBIN ( 0, 2 ) ;<br />

IF #Bin EQ 1, SET Group = 'C', F.SET Group = 'E' ],<br />

OUT Expermt5 $<br />

#Bin is generated equal to a random observation from an order 2 binomial distribution; that is, from a binomial distribution that contains the integers 0, 1 and 2 in the proportions .25, .5 and .25. (You could think of this as the distribution obtained when tossing two coins. Zero heads are observed 25% of the time, one head 50% of the time and two heads 25% of the time, when the probability of obtaining a head in a single toss is .5.) When RANBIN returns a 1, which it does half the time, a case is assigned to the control group; when it returns a 0 or 2, it is assigned to the experimental group.<br />
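The proportions quoted for the order 2 binomial distribution follow from the binomial formula. A minimal Python sketch (not PPL; the function name binomial_pmf is illustrative) reproduces them:<br />

```python
from math import comb


def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k wins in n independent trials with win probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


proportions = [binomial_pmf(k, 2) for k in range(3)]   # outcomes 0, 1 and 2
p_control = proportions[1]                             # a 1 is returned
p_experimental = proportions[0] + proportions[2]       # a 0 or a 2 is returned
print(proportions, p_control, p_experimental)
```

The control and experimental groups each receive a case with probability one half, which is why the single value 1 can be paired against the two values 0 and 2.<br />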

The RANTABLE function is similar to RANBIN, except that the probabilities are set by the user. This command:<br />

MOD Expermt6<br />

[ SET Group = RANTABLE ( 0, 1, 2, 2 ) ], OUT Expermt6 $<br />

assigns cases to one of three groups, with the probability of assignment to group one being 1/5, group two 2/5 and group three 2/5. The arguments for RANTABLE after the initial seed control argument give the number of values in the distribution and the proportions in which they are observed. In this example, there are three function arguments (1, 2, 2), so there are three values in the distribution (1, 2 and 3). The value of a single argument divided by the sum of the arguments gives the proportion of that value in the distribution. For example, 1 / (1 + 2 + 2) is 1/5, which is the proportion of the total observations that are ones.<br />
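The conversion from table arguments to proportions is simple enough to sketch. The Python below is an illustration only, not PPL: table_proportions and draw_group are hypothetical names, and random.choices stands in for the weighted draw.<br />

```python
import random
from fractions import Fraction


def table_proportions(weights):
    """Convert RANTABLE-style weights into exact proportions."""
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


def draw_group(rng, weights):
    """Draw a group number 1..len(weights) with probability proportional to its weight."""
    groups = range(1, len(weights) + 1)
    return rng.choices(groups, weights=weights, k=1)[0]


weights = (1, 2, 2)
print(table_proportions(weights))          # 1/5, 2/5 and 2/5

rng = random.Random(2013)                  # fixed seed (an assumption) for repeatability
groups = [draw_group(rng, weights) for _ in range(10)]
print(groups)
```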

This command does the same task the prior command does, but it sets the seed values with the three constants following the -3 and saves them in the supplied scratch variables:<br />

MOD Expermt6 [<br />

GEN #A = .M., GEN #B = .M., GEN #C = .M. ;<br />

SET Group =<br />

RANTABLE ( -3, 657, 1469, 20078, #A, #B, #C, 1, 2, 2 ) ;<br />

IF LAST ( .FILE. ),<br />

PUT #A > #B > #C ],<br />

OUT Expermt6 $<br />

The initial argument of -3 for RANTABLE specifies that the initial seed values are supplied as three constants. The three scratch variables follow. The constants and scratch variables come directly after the initial seed control argument and before the function-specific arguments. Alternatively, the three scratch variables could be generated equal to the three initial seed values and those constants could be omitted from the RANTABLE arguments.<br />

7.4 DISTRIBUTION FUNCTIONS<br />

Distribution or probability functions return the area under a distribution from the lower tail of the distribution to the specified critical value. The area is the probability that a random value falls below this critical value. Subtracting this value from one yields the significance level for a one-tailed test; that is, the proportion of the distribution in the upper tail. To obtain the significance level for a two-tailed test, subtract the probability from one and multiply by two:<br />

( 1 - PROBNORM ( ABS (nn) ) ) * 2<br />

Inverse probability functions return the critical value corresponding to the probability or area under the distribution that is supplied as the function argument. The critical value is the value that must be obtained for significance at one minus the supplied probability.<br />
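The one-tailed and two-tailed arithmetic can be sketched for the normal case. This is Python, not PPL, and it assumes statistics.NormalDist from the standard library as a stand-in for PROBNORM; the function names are illustrative.<br />

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def lower_tail(z):
    """Area under the standard normal curve below z."""
    return STANDARD_NORMAL.cdf(z)


def one_tailed(z):
    """Upper-tail area beyond |z|: one minus the lower-tail probability."""
    return 1.0 - lower_tail(abs(z))


def two_tailed(z):
    """Two-tailed significance level: the one-tailed level multiplied by two."""
    return 2.0 * one_tailed(z)


print(round(lower_tail(-1.96), 6), round(one_tailed(1.96), 6), round(two_tailed(1.96), 6))
```

Taking the absolute value first means the same expression works whichever tail the observed value falls in.<br />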

7.5 Probability Distributions<br />

The probability functions may have expressions as their arguments. The expressions should reduce to one or more arguments appropriate for the function. These are the probability functions and their arguments:<br />

1. PROBBIN ( nn, n, p ) Binomial Distribution<br />

computes the probability that a variable from a binomial (Bernoulli) distribution with probability p and size or degree n is less than or equal to the first argument nn:<br />

PROBBIN ( 4, 10, .5 ) = .376953125<br />

This is the probability of getting four or fewer tails in ten tosses of a coin. (This is the same as the probability of getting six or more heads.) The probability of a single value is the difference between two successive values:<br />

PROBBIN ( 4, 10, .5 ) - PROBBIN ( 3, 10, .5 ) = .205078125<br />

This is the probability of getting exactly four tails in ten tosses of a coin. (This is the same as the probability of getting exactly six heads.)<br />

2. PROBCHI ( nn, df ) Chi-square Distribution<br />

computes <strong>the</strong> probability that a random variable from a chi-square distribution with degrees of freedom<br />

df is less than <strong>the</strong> specified argument:<br />

PROBCHI ( 31.264, 11 ) = .999<br />

Degrees of freedom must be an integer.<br />

3. PROBF ( nn, df1, df2 ) F Distribution<br />

computes the probability that a variable from an F distribution with numerator degrees of freedom df1 and denominator degrees of freedom df2 is less than the specified argument:<br />

PROBF ( 3.32, 2, 30 ) = .950170464<br />

Degrees of freedom may be a whole or fractional number.<br />

4. PROBNORM ( nn ) Normal Distribution<br />

computes <strong>the</strong> probability that a random variable from a normal distribution is less than <strong>the</strong> specified<br />

argument:<br />

PROBNORM ( -1.96 ) = .02499789530314<br />

The critical value -1.96 is significant at the .025 level for a one-tail test and at the .05 level for a two-tail or non-directional test.<br />

The argument for PROBNORM should be a deviate from a normal distribution with a mean of zero and standard deviation of one; that is, a standard score between -6 and +6.<br />

5. PROBPOIS ( nn, lambda ) Poisson Distribution<br />

computes the probability that a variable from a Poisson distribution is less than or equal to the first argument. Lambda is the mean of the distribution. The mean in this example is 1.12; it is the number of defects per length of material:<br />

PROBPOIS ( 2, 1.12 ) = .896355852<br />

PROBPOIS ( 2, 1.12 ) - PROBPOIS (1, 1.12 ) = .204642687<br />

.8964 is the probability of finding two or fewer defects in a length of material. .2046 is the probability of finding exactly 2 defects.<br />

6. PROBT ( nn, df ) t Distribution<br />

computes the probability that a random variable from a t distribution is less than the first argument nn when degrees of freedom equal the second argument df. This is the probability that a random variable is less than 2.179:<br />

PROBT ( 2.179, 12 ) = .975008377<br />

The significance level for a two-tail test is one minus the probability, multiplied by two:<br />

( 1 - PROBT ( 2.179, 12 ) ) * 2 = .049983245959<br />

A critical value of 2.179 is significant at the .025 level for a one-tail test and at the .05 level for a two-tail test (.025 in each tail) when the degrees of freedom are 12.<br />

The first argument for PROBT should be a deviate or critical value from Student's t distribution, which is centered on zero. The degrees of freedom may be a whole or fractional number.<br />
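The binomial and Poisson values quoted in the list above can be reproduced from the textbook formulas. The following Python sketch is not PPL; binomial_cdf and poisson_cdf are illustrative names for cumulative sums that play the role of PROBBIN and PROBPOIS.<br />

```python
from math import comb, exp, factorial


def binomial_cdf(k, n, p):
    """Probability of k or fewer wins in n trials with win probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def poisson_cdf(k, lam):
    """Probability of k or fewer events when the mean number of events is lam."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


four_or_fewer = binomial_cdf(4, 10, 0.5)
exactly_four = binomial_cdf(4, 10, 0.5) - binomial_cdf(3, 10, 0.5)
two_or_fewer = poisson_cdf(2, 1.12)
exactly_two = poisson_cdf(2, 1.12) - poisson_cdf(1, 1.12)
print(four_or_fewer, exactly_four, round(two_or_fewer, 6), round(exactly_two, 6))
```

Subtracting two successive cumulative values isolates the probability of a single outcome, as in the PROBBIN and PROBPOIS examples.<br />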

7.6 Inverse Probability Distributions<br />

The inverse probability functions may have expressions as their arguments. However, the expressions should reduce to one or more arguments appropriate for the function. These are the inverse probability functions and their arguments:<br />

1. INVBIN ( nn, n, p ) Inverse Binomial Distribution<br />

INVBIN.RT ( nn, n, p ) Inverse Binomial Distribution — Right Tail<br />

returns <strong>the</strong> observation from <strong>the</strong> binomial distribution with probability p and size or degree n whose<br />

area is nn:<br />

INVBIN ( .38, 10, .5 ) = 4<br />

INVBIN.RT ( .38, 10, .5 ) = 6<br />

Approximately 38% of the time, when tossing 10 coins, you will get 4 or fewer tails (equivalently, 6 or more heads). INVBIN.RT returns an observation from the right tail of the binomial distribution. INVBIN is the inverse of the PROBBIN function.<br />

2. INVCHI ( nn, df ) Inverse Chi-Square Distribution<br />

returns <strong>the</strong> critical value from <strong>the</strong> chi-square distribution with degrees of freedom df and whose area<br />

is <strong>the</strong> argument nn:<br />

INVCHI ( .999, 11 ) = 31.2641339<br />

INVCHI is <strong>the</strong> inverse of <strong>the</strong> PROBCHI function.<br />

3. INVF ( nn, df1, df2 ) Inverse F Distribution<br />

returns the critical value from the F distribution with degrees of freedom df1 and df2 whose area is the argument nn:<br />

INVF ( .95, 2, 30 ) = 3.315829544<br />

INVF is <strong>the</strong> inverse of <strong>the</strong> PROBF function.<br />

4. INVNORM ( nn ) Inverse Normal or Probit Distribution<br />

returns <strong>the</strong> deviate or critical value from <strong>the</strong> normal distribution whose area is <strong>the</strong> specified argument.<br />

PROBIT is a synonym:<br />

PROBIT ( 0.025 ) = -1.959964<br />

A critical value of -1.96 or less is required for a one-tail test with a significance level of .025, or a value of 1.96 or greater is required if a difference in the opposite direction is expected. For a two-tail or non-directional test with a significance level of .05, a critical value of -1.96 or less or 1.96 or more is required (.025 in each of the two tails).<br />

The argument for INVNORM is an area, measured from the lower tail of the normal distribution, that is the probability of obtaining a value less than the calculated deviate. It should be a number between 0 and 1. This function is the inverse of PROBNORM.<br />

5. INVPOIS ( nn, lambda ) Inverse Poisson Distribution<br />

INVPOIS.RT ( nn, lambda ) Inverse Poisson Distribution — Right Tail<br />

returns <strong>the</strong> observation from <strong>the</strong> Poisson distribution with mean lambda whose area is <strong>the</strong> argument<br />

nn:<br />

INVPOIS ( .9, 1.12 ) = 2<br />

Approximately 90% of <strong>the</strong> time, 2 or fewer defects will be found in a unit length of material with<br />

1.12 defects per unit. INVPOIS.RT returns an observation from <strong>the</strong> right tail of <strong>the</strong> Poisson distribution.<br />

INVPOIS is <strong>the</strong> inverse of <strong>the</strong> PROBPOIS function.<br />

6. INVT ( nn, df ) Inverse t Distribution<br />

returns <strong>the</strong> critical value from <strong>the</strong> t distribution with degrees of freedom df whose area is <strong>the</strong> argument<br />

nn:<br />

INVT ( .975, 12 ) = 2.178812725<br />

A critical value of 2.179 is required for a one-tail test with a significance level of .025 or a two-tail<br />

test with a significance level of .05. INVT is <strong>the</strong> inverse of <strong>the</strong> PROBT function.<br />
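The inverse relationship between a probability function and its inverse can be illustrated for the normal case. This Python sketch is not PPL; it assumes statistics.NormalDist.inv_cdf as a stand-in for INVNORM (PROBIT), and inverse_normal is an illustrative name.<br />

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()   # mean 0, standard deviation 1


def inverse_normal(area):
    """Critical value whose lower-tail area is `area` (0 < area < 1)."""
    return STANDARD_NORMAL.inv_cdf(area)


z = inverse_normal(0.025)              # lower-tail critical value
round_trip = STANDARD_NORMAL.cdf(z)    # feeding it back recovers the area
print(round(z, 6), round(round_trip, 6))
```

The round trip shows the sense in which each INV function undoes the matching PROB function for a continuous distribution.<br />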

7.7 THE FUZZY EQUALS PROBLEM<br />

The internal representation of fractional decimal numbers in a binary computer can be exact for numbers (like .5 or .75) that can be expressed as sums of reciprocals of powers of two. This is true up to a point: .5 + 1/2**53 is accurate on a Pentium chip (which uses 53 bits to represent the fractional part), but .5 + 1/2**54 and beyond would not be accurately represented.<br />

Most fractional numbers, however, cannot be represented accurately. Computation involving them is consequently approximate. It is quite possible for two different sequences of calculation that ‘should’ produce the same result to instead produce results that differ slightly, perhaps by one bit, sometimes by several.<br />

For example, consider this P-<strong>STAT</strong> statement.<br />

IF .1 + .2 EQ .3, PUT 'YES', F.PUT 'NO' $<br />

This ought to say YES, but on a Pentium PC it says NO because the two values are not quite the same: a HEX display of the result of adding .1 and .2 is one bit different from a HEX display of .3, and a one-bit difference prevents an equal result. This is not a P-STAT effect: exactly the same thing occurs in a trivial C or Fortran 95 program.<br />
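The same effect can be seen in any language that uses 64-bit IEEE 754 doubles. The Python sketch below is an illustration, not PPL; hex64 is a hypothetical helper that shows the 16 HEX characters of a double.<br />

```python
import struct


def hex64(x):
    """Big-endian hexadecimal image of a 64-bit IEEE 754 double."""
    return struct.pack(">d", x).hex().upper()


total = 0.1 + 0.2
print(total == 0.3)    # exact comparison of the two doubles
print(hex64(total))    # image of the computed sum
print(hex64(0.3))      # image of the constant .3
```

The two images differ only in the final HEX character, which is the one-bit difference that defeats an exact compare.<br />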



There may be situations when a FUZZY compare rather than an EXACT compare is appropriate. An exact compare returns equal only when the two numbers being compared are exactly the same. A fuzzy compare would accept as equal two numbers that are VERY close. The question is: how close?<br />

Logical operators like EQ and GT now have optional extensions like EQ.2 or GT.5 which cause the compare to be fuzzy. For example, using EQ.2 rather than just EQ will treat two numbers as equal if they are no more than two steps apart.<br />

We use ‘step’ to mean moving from a given 64-bit double-precision floating-point number to the next representable number. An upwards step from 0.1 is slightly more than 0.1; a downwards step is slightly less.<br />

A step can best be seen by using HEX notation. The HEX representation of the 64-bit value 0.1 is 3FB9 9999 9999 999A. Each HEX character represents 4 bits; the characters 0-9 and A-F are used to show the 16 possible forms of 4 bits. Note: the actual 64-bit internal representation of 0.1 may differ slightly on computers using differing chips and compilers.<br />

The ending ‘A’ shows that the last 4 bits of 0.1 are 1010. The value one STEP.UP from 0.1 would be one bit greater; in this case it would have the same initial 15 HEX characters, and the final HEX character would be B (1011), one bit more. A step affects the 15th or 16th significant digit on a Pentium type of chip. For example, it takes 2 steps to go from 30.11122233344411 to 30.11122233344412, which differs in the 16th significant digit.<br />

7.8 The Fuzzy Functions<br />

Four new functions have been added to manipulate such numbers.

1. HEX ( number ) produces the HEX representation of the input in a character*16 result.

2. STEP.UP ( number, n ) produces the number that is N steps up from the input value. The second argument, the number of steps, can be from zero to 9999. If omitted, it defaults to one.

3. STEP.DOWN ( number, n ) produces the number that is N steps down from the input value. The second argument, the number of steps, can be from zero to 9999. If omitted, it defaults to one.

4. STEPS ( nn1, nn2 ) produces the number of steps from the smaller of NN1 and NN2 to the larger. Missing 3 is returned if more than one million steps separate the arguments.

put ( HEX( .1 ))$ is ‘3FB999999999999A’<br />

put ( HEX( STEP.UP(.1 )))$ is ‘3FB999999999999B’<br />

put ( HEX( STEP.UP(.1, 2)))$ is ‘3FB999999999999C’<br />

put ( STEPS ( STEP.DOWN(.1), STEP.UP(.1) ))$ is 2<br />
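For readers who want to experiment with steps outside P-STAT, here is a hedged Python sketch of analogous helpers. It is not how P-STAT implements them: the names hex_repr, step_up, step_down and steps are hypothetical, IEEE 754 doubles are assumed, and NaN, infinity, the 9999-step limit and the one-million-step limit are not handled.

```python
import struct

_SIGN = 0x8000000000000000

def hex_repr(x):
    """16 HEX characters of a 64-bit double, like HEX ( number )."""
    return struct.pack(">d", x).hex().upper()

def _to_ordered(x):
    """Map a double to an integer so that adjacent doubles differ by exactly 1."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return -(bits - _SIGN) if bits & _SIGN else bits

def _from_ordered(k):
    """Inverse of _to_ordered."""
    bits = k if k >= 0 else (-k) | _SIGN
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

def step_up(x, n=1):
    """The double n steps above x, like STEP.UP ( number, n )."""
    return _from_ordered(_to_ordered(x) + n)

def step_down(x, n=1):
    """The double n steps below x, like STEP.DOWN ( number, n )."""
    return _from_ordered(_to_ordered(x) - n)

def steps(a, b):
    """Number of steps between a and b, like STEPS ( nn1, nn2 )."""
    return abs(_to_ordered(a) - _to_ordered(b))

print(hex_repr(0.1))
print(hex_repr(step_up(0.1)))
print(hex_repr(step_up(0.1, 2)))
print(steps(step_down(0.1), step_up(0.1)))
```

The integer mapping works because consecutive positive doubles have consecutive bit patterns; negative values are mirrored so that the ordering stays consistent across zero.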

7.9 Fuzzy Logical Opera<strong>to</strong>rs<br />

There are 6 logical operators: GT, GE, EQ, NE, LE and LT. GT means greater than, EQ means equals, and so forth.

There are also 6 eXact versions: XGT, XGE, XEQ, XNE, XLE and XLT. XEQ causes the compare of character values to be case-specific, whereas EQ is case-independent. For numeric compares, EQ and XEQ will by default do exact (non-fuzzy) compares. However, the EQ and GT type of operators can be directed to do fuzzy compares.

For numeric compares, the EQ operators can be made to do fuzzy compares in two ways.

1. EQ.2 or GT.5 or such can be used to cause a fuzzy compare of that many steps. The step part can be from 0 to 99, with 0 meaning no steps. EQ.2 is treated as a simple EQ when the compare involves character values.



2. FUZZ 5 $ is a new command that causes later use of the EQ type of logical operators to use that many steps. It is ignored for character compares, and does not affect the XEQ type of operators. It is also ignored for an operator like EQ.3 that already has a specific step size.

In other words, the step count of 3 in EQ.3 has precedence over any current FUZZ command setting. FUZZ 0 $ would turn the fuzz setting off.

7.10 How Fuzzy Opera<strong>to</strong>rs Work<br />

Consider<br />

IF aaa EQ.2 bbb.<br />

The above test will be true whenever AAA is either equal to BBB or within 2 steps of BBB (it does not matter which is the larger).

The following 5 lines would do exactly the same thing:

IF STEP.DOWN(aaa, 2) XEQ bbb or
   STEP.DOWN(aaa )   XEQ bbb or
   aaa               XEQ bbb or
   STEP.UP (aaa )    XEQ bbb or
   STEP.UP (aaa, 2)  XEQ bbb

Consider

IF aaa GT.5 bbb.

It is first determined if AAA and BBB are ‘equal’, which in this case means no more than 5 steps apart in either direction. Since the GT test is true only when (1) AAA is greater and (2) they are not equal, AAA must be more than 5 steps greater than BBB for a true result to occur.

In the first of these next two statements, the values being compared are not equal, so a GT result can be true. In the second, the GT.1 test has enough fuzz to cause the two values to be considered equal, so one cannot be greater.

IF STEP.UP( 999 ) GT.0 999 will be true,

IF STEP.UP( 999 ) GT.1 999 will be false.

Thus, aaa GT.5 bbb asks if AAA is more than 5 steps greater than BBB. The other operators work in a similar manner.
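The rule described here (first decide fuzzy equality, then apply the ordinary test) can be sketched in Python. This is an illustration under stated assumptions, not P-STAT's code: fuzzy_eq and fuzzy_gt are hypothetical names, IEEE 754 doubles are assumed, and math.nextafter requires Python 3.9 or later.

```python
import math
import struct

def _ordered(x):
    """Map a double to an integer so that adjacent doubles differ by exactly 1."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = 0x8000000000000000
    return -(bits - sign) if bits & sign else bits

def fuzzy_eq(a, b, n=0):
    """Analogue of a EQ.n b: true when a and b are at most n steps apart."""
    return abs(_ordered(a) - _ordered(b)) <= n

def fuzzy_gt(a, b, n=0):
    """Analogue of a GT.n b: a is greater and the two are not fuzzily equal."""
    return a > b and not fuzzy_eq(a, b, n)

one_up = math.nextafter(999.0, math.inf)   # one step above 999
print(fuzzy_gt(one_up, 999.0, 0))          # mirrors STEP.UP( 999 ) GT.0 999
print(fuzzy_gt(one_up, 999.0, 1))          # mirrors STEP.UP( 999 ) GT.1 999
print(fuzzy_eq(0.1 + 0.2, 0.3, 2))         # mirrors a two-step EQ.2 compare
```

Defining the greater-than test in terms of fuzzy equality keeps the six operators mutually consistent: for any pair of values exactly one of "less", "equal" and "greater" holds at a given step count.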

7.11 FUZZY Summary<br />

The GT, GE, EQ, NE, LE and LT logical operators have always done exact compares on numeric values; the default has not changed.

These 6 operators have been extended: EQ.3 for example will return an equal result if the two values being compared are separated by no more than 3 steps. A step is the distance from one internally representable number to the next one.

The step part (the .3 in EQ.3) can be from 0 to 99. Using 5 steps should be sufficient to cover random differences. NEAR is supported as a more readable form of EQ.5. Similarly, NOTNEAR means NE.5.

A new command, FUZZ 3 $ or such, causes subsequent use of EQ, etc. to do fuzzy compares of that many steps automatically. However, this does NOT change an explicitly supplied step like GT.0 or EQ.1.

Using FUZZ 2 $ or such might be useful when pages of PPL are involved and you want to quickly see if fuzz makes a difference.

XGT, XGE, XEQ, XNE, XLE and XLT can still be used in numeric compares. They always do an exact (non-fuzzy) compare. In other words, for numeric values XEQ and EQ.0 are the same.



SUMMARY<br />

<strong>PPL</strong> Functions: Numeric — Random Numbers<br />

The number of arguments for the random number functions depends on how they are used. There may be from one to three types of arguments: 1) a required initial seed control argument, 2) three optional scratch variables, and 3) any function-specific arguments. The initial seed control argument is one of these constants: 0, -1 or -3. When it is 0, three seed values from the current date and time are used to start the random number generator. When it is -1, three default seed values that are the same every time are used. When it is -3, three constants to be used as the initial seed values should follow.

Three scratch variables may be supplied next. When they are supplied, the final seed values are saved as the values of the scratch variables. They may be used as initial seeds at a future time to continue a progression. Any function-specific arguments come last.

RANBIN (nn, nn, nn, nn, #vn, #vn, #vn, nn, p)<br />

generates random observations from a binomial distribution with the order and probability specified as the right-most arguments. When the probability is .5, it need not be given:

[ GEN Obs = RANBIN (0, 2) ;

IF Obs EQ 1, SET Group = 1, F.SET Group = 2 ]

The GEN instruction generates observations from a binomial distribution of order 2 and probability .5; that is, with the integers 0, 1 and 2 in the proportions .25, .5 and .25. For example, this is the distribution of heads (or tails) obtained when tossing two coins. The IF statement tests the value of the random number and assigns group membership, with 50% in each group.

RANNORM (nn, nn, nn, nn, #vn, #vn, #vn)<br />

generates random numbers from the normal distribution:

GEN Random = (RANNORM (0) * 2.5) + 43.6 ;

The random numbers are standard scores that range from -6 through +6 and the probability of obtaining specific values depends on the area under the normal curve. The example above generates random numbers with a standard deviation of 2.5 and a mean of 43.6.

RANTABLE (nn, nn, nn, nn, #vn, #vn, #vn, nn, nn, nn)<br />

generates random observations from a user's tabled distribution. The values and the probabilities of each are given as the right-most arguments:

GEN Section = RANTABLE (0, 15, 5, 10, 20) ;

This instruction generates the random section numbers 1, 2, 3 and 4 because four arguments are supplied (not counting the initial seed control argument). They are generated in the following proportions: 15/50 = .3, 5/50 = .1, 10/50 = .2 and 20/50 = .4. (The arguments are summed to get the total, and the value of each argument is the proportion of the total desired for that value.)
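The proportion arithmetic can be checked with a short Python sketch. It uses Python's own random module rather than P-STAT's generator, so the individual draws will not match RANTABLE; the seed 12345 is an arbitrary assumption made only so the run is repeatable.

```python
import random

weights = [15, 5, 10, 20]                    # the four function-specific arguments
total = sum(weights)                         # the arguments are summed to get the total
proportions = [w / total for w in weights]   # share of the total for values 1 to 4
print(proportions)

rng = random.Random(12345)                   # arbitrary seed, for repeatability only
sections = rng.choices([1, 2, 3, 4], weights=weights, k=10_000)
shares = [sections.count(v) / len(sections) for v in (1, 2, 3, 4)]
print(shares)                                # should lie close to the proportions
```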

RANUNI (nn, nn, nn, nn, #vn, #vn, #vn)<br />

generates random numbers from a uniform distribution. The random numbers range from zero to one and the probability of obtaining any value equals the probability of obtaining any other value. The result can be multiplied by a constant to change the range of the generated values. A random subset of cases may be selected using RANUNI:



GEN #Random = RANUNI (-1) ;

IF #Random LE .7, RETAIN ;<br />

These instructions do the same things as the previous ones, but they also set and save the seed values:

GEN #A = .M., GEN #B = .M., GEN #C = .M. ;<br />

GEN #Random = RANUNI (-3, 257,25,8004, #A,#B,#C ) ;<br />

IF #Random LE .7, RETAIN ;<br />

IF LAST (.FILE.), PUT #A ' ' #B ' ' #C ;<br />
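The idea of fixed seeds and saved seed values can be illustrated by analogy in Python. This is only an analogy: Python's generator takes a single seed and exposes its state through getstate and setstate, whereas the P-STAT functions use three seed values; the case list and the seed 257 below are hypothetical.

```python
import random

cases = list(range(1, 101))                  # 100 hypothetical case numbers

def subset(rng):
    """Keep each case whose uniform draw is at most .7 (about 70% of cases)."""
    return [c for c in cases if rng.random() <= 0.7]

# A fixed seed gives the same subset on every run.
first = subset(random.Random(257))
again = subset(random.Random(257))
print(first == again)

# Saving the generator state lets a later run continue the same progression.
rng = random.Random(257)
subset(rng)
saved = rng.getstate()
next_draw = rng.random()

resumed = random.Random()
resumed.setstate(saved)
print(resumed.random() == next_draw)
```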

<strong>PPL</strong> Functions: Numeric — Probability<br />

The following probability functions require one or more expressions as their arguments. Each expression should reduce to the argument appropriate for the function.

PROBBIN (nn, n, p)<br />

computes the probability that a variable from a binomial (Bernoulli) distribution with probability p and size or degree n is less than or equal to the first argument nn (that is, has nn or fewer successes in n trials). The probability of a single value is the difference between two successive values:

PROBBIN ( 4, 10, .5 ) - PROBBIN ( 3, 10, .5 ) = .205078125

.205 is the probability of getting exactly four tails in ten tosses of a coin.
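The figure quoted above can be confirmed independently with a few lines of Python. The function binom_cdf is a hypothetical stand-in for the cumulative binomial probability, written directly from the textbook formula rather than from P-STAT's implementation:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X binomial with n trials and success probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Probability of exactly four successes as the difference of successive values.
exactly_four = binom_cdf(4, 10, 0.5) - binom_cdf(3, 10, 0.5)
print(exactly_four)   # 210 of the 1024 equally likely outcomes
```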

PROBCHI (nn, df)<br />

computes the probability that a random variable from a chi-square distribution with degrees of freedom df is less than the specified argument. Degrees of freedom must be an integer.

PROBF (nn, df1, df2)

computes the probability that a variable from an F distribution with numerator degrees of freedom df1 and denominator degrees of freedom df2 is less than the specified argument.

PROBNORM (nn)<br />

computes the probability that a random variable is less than the specified argument. The argument should be a deviate from a normal distribution with a mean of zero and standard deviation of one; that is, it should be a standard score between -6 and +6. This is the probability that a random variable from a normal distribution is less than 1.96:

PROBNORM ( 1.96 ) = .975002105

For the significance level of a two-tail test, multiply 1 minus the probability of the absolute value of the deviate by 2:

( 1 - PROBNORM ( ABS ( -1.96 ) ) ) * 2 = .04999579060628
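As a cross-check on these two values, the same quantities can be computed with Python's standard library. This sketch uses statistics.NormalDist (Python 3.8 or later) as an analogue of PROBNORM; the last few digits may differ slightly from P-STAT's printed results.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

lower_tail = std_normal.cdf(1.96)                 # analogue of PROBNORM ( 1.96 )
two_tail = (1 - std_normal.cdf(abs(-1.96))) * 2   # two-tail significance level

print(round(lower_tail, 6))
print(round(two_tail, 6))
```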

PROBPOIS (nn, lambda)<br />

computes the probability that a variable from a Poisson distribution is less than or equal to the specified argument. Lambda is the mean of the distribution.

PROBT (nn, df)

computes the probability that a random variable is less than the first argument when degrees of freedom equal the second argument. The first argument should be a deviate from Student's t distribution with a mean of zero and standard deviation of one. The degrees of freedom may be a whole or fractional number.



<strong>PPL</strong> Functions: Numeric — Inverse Probability<br />

The following inverse probability functions require one or more expressions as their arguments. Each expression should reduce to the argument appropriate for the function.

INVBIN (nn, n, p)

returns the observation from the binomial distribution with probability p and size or degree n whose area is nn:

INVBIN ( .38, 10, .5 ) = 4

Approximately 38% of the time, when tossing 10 coins, 4 or fewer will be tails. INVBIN.RT returns an observation from the right tail of the distribution. INVBIN is the inverse of the PROBBIN function.

INVCHI (nn, df)

returns the critical value from the chi-square distribution with degrees of freedom df whose area is nn. INVCHI is the inverse of the PROBCHI function.

INVF (nn, df1, df2)

returns the critical value from the F distribution with degrees of freedom df1 and df2 whose area is the argument nn. INVF is the inverse of the PROBF function.

INVNORM (nn)

returns the deviate or critical value from the normal distribution whose area is the specified argument. The area, measured from the lower tail of the normal distribution, is a number between 0 and 1 that is the probability of obtaining a value less than the calculated deviate. PROBIT is a synonym for INVNORM:

PROBIT ( .95 ) = 1.644853628

A critical value of approximately 1.64 is required for a significance level of 5% for a one-tail test. This function is the inverse of PROBNORM.
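The inverse relationship can be seen in a short Python sketch, again using statistics.NormalDist as an analogue rather than P-STAT itself: inv_cdf plays the role of INVNORM (or PROBIT), and applying cdf to its result recovers the original area.

```python
from statistics import NormalDist

std_normal = NormalDist()

critical = std_normal.inv_cdf(0.95)     # analogue of PROBIT ( .95 )
round_trip = std_normal.cdf(critical)   # the forward function recovers the area
print(round(critical, 4), round(round_trip, 4))
```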

INVPOIS (nn, lambda)<br />

returns the observation from the Poisson distribution with mean lambda whose area is nn. INVPOIS is the inverse of the PROBPOIS function. INVPOIS.RT returns an observation from the right tail of the distribution.

INVT (nn, df)

returns the critical value from the t distribution with degrees of freedom df whose area is nn. INVT is the inverse of the PROBT function.

PPL Functions: Fuzzy Numeric

HEX ( nn )

produces the HEX representation of the input in a character*16 result.

STEP.UP ( nn, n )

produces the number that is N steps up from the input value. The second argument, the number of steps, can be from zero to 9999. If omitted, it defaults to one.



STEP.DOWN ( nn, n )

produces the number that is N steps down from the input value. The second argument, the number of steps, can be from zero to 9999. If omitted, it defaults to one.

STEPS ( nn1, nn2 )

produces the number of steps from the smaller of NN1 and NN2 to the larger. Missing 3 is returned if more than one million steps separate the arguments.

EQ / NE / LT / LE / GT / GE

EQ.2 or GT.5 or such can be used to cause a fuzzy compare of that many steps. The step part can be from 0 to 99, with 0 meaning no steps. EQ.2 is treated as a simple EQ when the compare involves character values.

FUZZ 5 $ is a new command that causes later use of the EQ type of logical operators to use that many steps. It is ignored for character compares, and does not affect the XEQ type of operators. It is also ignored for an operator like EQ.3 that already has a specific step size.

NEAR is a synonym for EQ. NOTNEAR is a synonym for NE. NEAR and NOTNEAR can be used after the FUZZ command has set a fuzz level.


8<br />

<strong>PPL</strong>:<br />

Across-Case Modifications<br />

Changes and summary statistics on groups of related cases are produced by data modification and aggregation across cases. Related cases are groups of cases in a file that is ordered by one or more variables defining group membership. For example, cases having the same value of a key variable such as Household.Number could be grouped together in the file. They are related by their common values of Household.Number. Across-case modifications use:

• variables that exist across cases to hold accumulated values, and

• functions to identify particular cases within a group of related cases.

Scratch variables and the permanent vector permit the incrementing and saving of variables across cases. Scratch variables hold either numeric or character values. The permanent vector is referenced with a P(J) notation allowing for calculation of the index value. It is created at the beginning of a run and can be used to pass values between commands as well as between cases. The P vector holds only numeric values. Multi-dimensional user-defined arrays are easier to use when an array is intrinsically multi-dimensional and can be defined to hold either character or numeric data.

PPL, the P-STAT Programming Language, provides functions that identify or manipulate particular cases in a subgroup for modification and aggregation:

• FIRST is true when the current case is the first case in a group.

• LAST is true when the current case is the last case in a group.

• SPLIT splits a case into a number of new cases.

• COLLECT collects a number of cases into one large case.

Splitting single cases into multiple cases reorganizes data for plotting, t tests, or analysis of variance. For example, monthly water flow measurements for multiple years may each be split into 12 separate cases, and new variables showing the month and year may be created. Splitting also “undoes” collecting. Family or patient cases that are collected for modification may be split back into their original cases afterwards.

Collecting related cases permits subsequent modification of those cases: a family telephone number may be corrected for all members of a family, or a diagnosis may be added to all of a patient’s visit records. Collecting cases also permits the calculation of statistics that summarize the related cases, such as counts, means, totals and others. Total sales may be tallied for all the salesmen in each department, or mean income may be calculated for voters in each district.

Several P-STAT commands also perform data modification and aggregation across related cases. The AGGREGATE and DUPLICATES commands both produce files that contain summary records of a file or subgroups within a file. Aggregation and modification using these commands are most appropriate when a file of summary information is the desired result. However, if the goal is to join summary information back onto the cases of the original data file, COLLATE or LOOKUP must then be used to do a hierarchical join. Using PPL for across-case modification and aggregation, as the file is read by a command such as MODIFY or LIST, often saves some extra steps.



8.1 BASIC ACROSS-CASE AGGREGATION<br />

The FIRST and LAST functions are generally used with scratch variables or the permanent vector P for most basic types of aggregation. PPL instructions (such as GENERATE, SET and INCREASE), operators (such as + , * and CONTAINS), and functions (such as MEAN, SQRT and TRIM) perform the actual modifications and calculations.

8.2 Accessing FIRST and LAST Cases

The FIRST and LAST functions determine whether a case is the first or last case in a file or, for cases ordered by subgroups, whether a case is the first or last case in the subgroup. FIRST and LAST are used with the system variable .FILE. to test for the beginning and ending cases of a file. (Any case selection is done before any other PPL, including testing for FIRST and LAST cases.) Figure 8.1 illustrates the use of FIRST and LAST.

__________________________________________________________________________<br />

Figure 8.1 FIRST and LAST with Subgroups<br />

Given these six cases:

   Case   Age   Sex
     1     12     1
     2     12     2
     3     12     2
     4     13     2
     5     14     1
     6     14     1

The following statements are true for...     case numbers:

   IF FIRST ( .FILE. ), ...       1
   IF LAST  ( .FILE. ), ...       6
   IF FIRST ( Age ), ...          1, 4, 5
   IF FIRST ( Age, Sex ), ...     1, 2, 4, 5
   IF LAST  ( Age ), ...          3, 4, 6
   IF LAST  ( Age, Sex ), ...     1, 3, 4, 6

__________________________________________________________________________<br />

The statement:

IF FIRST ( .FILE. ),

is true only when the first case of the file is processed. Similarly,

IF LAST ( .FILE. ),

is true only when the last case of the file is processed.

If a file is ordered or sorted by one or more variables, the FIRST and LAST functions determine if a given case is the first or last member of the subgroup defined by those variables:

IF FIRST (Division), GEN #Counter = 0 ;



The first case of each division satisfies this test. Each time the first case in a division is processed, this IF statement is true and the scratch variable #Counter is set to zero. The LAST function is similar except that only the last case of a subgroup satisfies a LAST test.

The FIRST and LAST functions are shown accessing cases in subgroups in Figure 8.1. The statement:

IF FIRST ( Age, Sex ),

is true for cases 1, 2, 4, and 5: each time the value of Age changes or Sex within an Age group changes. (Notice that a comma separates the variables defining the subgroups.)
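The subgroup tests in Figure 8.1 can be mimicked in Python to see why each case number appears. This is a sketch of the idea, not of PPL itself: it assumes FIRST is true when the grouping values differ from those of the previous case and LAST is true when they differ from those of the next case; the names first_cases and last_cases are hypothetical.

```python
cases = [  # (case number, Age, Sex) from Figure 8.1, already ordered by Age then Sex
    (1, 12, 1), (2, 12, 2), (3, 12, 2),
    (4, 13, 2), (5, 14, 1), (6, 14, 1),
]
FIELDS = {"Age": 1, "Sex": 2}

def key(case, names):
    """Values of the grouping variables for one case."""
    return tuple(case[FIELDS[n]] for n in names)

def first_cases(names):
    """Case numbers where the grouping key differs from the previous case."""
    return [c[0] for i, c in enumerate(cases)
            if i == 0 or key(c, names) != key(cases[i - 1], names)]

def last_cases(names):
    """Case numbers where the grouping key differs from the next case."""
    return [c[0] for i, c in enumerate(cases)
            if i == len(cases) - 1 or key(c, names) != key(cases[i + 1], names)]

print(first_cases(["Age"]), first_cases(["Age", "Sex"]))
print(last_cases(["Age"]), last_cases(["Age", "Sex"]))
```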

The FIRST and LAST functions are used primarily with scratch variables and the permanent vector, both of which pass information between cases. Scratch variables contain either numeric or character values. Depending on how they are created, they may be temporary or permanent. A temporary scratch variable exists only during the current command or macro. A permanent scratch variable retains its values across commands. The permanent vector exists for the duration of a P-STAT run. The permanent or (P) vector contains only numeric values.

8.3 Scratch Variables

Scratch variables may contain numeric or character values. They are created with GENERATE. A “#” (crosshatch or pound sign) is the first character in the scratch variable name. A scratch variable that starts with a single crosshatch exists for the duration of a single command or macro. A scratch variable that is created with two crosshatches exists for the duration of the P-STAT run. A scratch variable that is to contain character information must be defined as a character variable when it is generated. Its length, if greater than 40, must be cited:

GENERATE #Name:C50 = Department.Name;<br />

GENERATE ##NAME:C50 = Department.Name;<br />

A scratch variable is not associated with a case; therefore, it has no position in the file. Scratch variables may not be used after ANY or ALL, or in list functions such as MEAN, SUM, MIN, MAX and SDEV.

Scratch variables may be used within a command to hold temporary information:

GENERATE #Temp = Rdg1 * Rdg2 + SQRT ( F.Factor ),

GENERATE Result = ROUND ( #Temp / 10 );

The scratch variable #Temp breaks up a complex calculation into simpler components, without creating a new variable in the file. This calculation could be done in one PPL clause with nested functions. However, several simple statements are more apt to be written correctly than a single complicated statement.

The major use of scratch variables is across-case modification and aggregation. The scratch variable does not automatically change when a new case is read, but only when it is explicitly changed. It is this property that makes it useful for passing information across cases in the file. Another frequent use of scratch variables is in the TITLES command.

TITLES 'Study Number #Study.Number'<br />

The following is an example which uses FIRST and LAST to count the number of cases which have the value of 'male' on variable Sex:

[ IF FIRST ( .FILE. ), GENERATE #Total.Males = 0;<br />

IF SEX EQ 'male', INCREASE #Total.Males;<br />

IF LAST ( .FILE. ), RETAIN ;<br />

KEEP .OTHERS. #Total.Males ]<br />

The scratch variable #Total.Males is generated and set equal to zero when the first case in the file is processed. A scratch variable remains zero until it is explicitly changed, typically with INCREASE or SET. The IF Sex EQ test is done for every case that is processed. When the result of the IF test is true, the value of #Total.Males is increased by 1. Each case is tested to see if it is the last case in the file. If it is not, the next case is read. The last case in the file is the only case that is retained.



The last case contains all the original variables for that case plus the new variable Total.Males. When a scratch variable is used in a KEEP instruction, a regular variable with a position in the file is created. The # or ## is removed from the variable name and the variable can be referred to in subsequent PPL as Total.Males.

__________________________________________________________________________<br />

Figure 8.2 Using Scratch Variables and FIRST<br />

File Staff:<br />

   Rank   Name       Division   Salary
     3    Sulley        12       22000
     2    De Jong       13       34300
     2    Swartz        13       27700
     5    Bryan         14       19500
     2    Fernald       12       26500
     3    Widmer        13       25000
     4    Williams      14       21300

SORT Staff, BY Division Rank, OUT StaffSor $<br />

LIST StaffSor<br />

[ IF FIRST (Division), GEN #Cum.Salary = 0;

INCREASE #Cum.Salary BY Salary;<br />

KEEP .OTHERS. #Cum.Salary ] ,<br />

CONTROL Division $<br />

                                           Cum
   Rank   Name       Division   Salary   Salary
     2    Fernald       12       26500    26500
     3    Sulley        12       22000    48500
     2    De Jong       13       34300    34300
     2    Swartz        13       27700    62000
     3    Widmer        13       25000    87000
     4    Williams      14       21300    21300
     5    Bryan         14       19500    40800

__________________________________________________________________________<br />

Figure 8.2 illustrates the use of scratch variables with FIRST to get cumulative salary totals. The file first must be sorted or ordered by the variables defining subgroup membership. It may also be sorted by other variables of interest. Fernald is the first member of Division 12 after the sort. Thus #Cum.Salary is generated equal to zero when his case is read:

[ IF FIRST (Division), GEN #Cum.Salary = 0;

#Cum.Salary is then increased by 26,500, the value of Fernald’s Salary:

INCREASE #Cum.Salary BY Salary;

The KEEP instruction creates a new variable Cum.Salary. For Fernald’s case, Cum.Salary is set to 26,500, the current value of the scratch variable #Cum.Salary:

KEEP .OTHERS. #Cum.Salary ]



Sulley, the next case after the sort, is also a member of Division 12. Therefore, #Cum.Salary is not reset to<br />

zero, but it is increased by the value of Sulley’s salary to 48,500. This value, 48,500, is then moved into the<br />

Cum.Salary variable for Sulley. De Jong is the first case in the next division, so #Cum.Salary is reset to zero, and<br />

the procedure is repeated.<br />
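
The reset-and-accumulate logic of Figure 8.2 can be mirrored outside P-STAT. The following is a minimal Python sketch, not PPL: the function name `cumulative_by_group` and the list-of-dictionaries layout are assumptions made for illustration, and the data are the sorted cases from the figure.

```python
def cumulative_by_group(cases, group_key, value_key, new_key):
    """Add a running total of value_key that restarts with each new group.

    The cases must already be sorted by group_key, as in the SORT step.
    """
    result = []
    running = 0
    for index, case in enumerate(cases):
        # Plays the role of FIRST(Division): true when the group value changes.
        is_first = index == 0 or cases[index - 1][group_key] != case[group_key]
        if is_first:
            running = 0
        running += case[value_key]                  # INCREASE #Cum.Salary BY Salary
        result.append({**case, new_key: running})   # KEEP .OTHERS. #Cum.Salary
    return result


# The sorted file StaffSor from Figure 8.2.
staff_sorted = [
    {"Rank": 2, "Name": "Fernald", "Division": 12, "Salary": 26500},
    {"Rank": 3, "Name": "Sulley", "Division": 12, "Salary": 22000},
    {"Rank": 2, "Name": "De Jong", "Division": 13, "Salary": 34300},
    {"Rank": 2, "Name": "Swartz", "Division": 13, "Salary": 27700},
    {"Rank": 3, "Name": "Widmer", "Division": 13, "Salary": 25000},
    {"Rank": 4, "Name": "Williams", "Division": 14, "Salary": 21300},
    {"Rank": 5, "Name": "Bryan", "Division": 14, "Salary": 19500},
]

cumulative = cumulative_by_group(staff_sorted, "Division", "Salary", "Cum.Salary")
for case in cumulative:
    print(case["Name"], case["Division"], case["Salary"], case["Cum.Salary"])
```

The running total lives outside the loop body, just as a scratch variable keeps its value from one case to the next.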

__________________________________________________________________________<br />

Figure 8.3 Creating a Summary Case with FIRST and LAST<br />

File Depts:<br />

Name Department Age Sex Position<br />

John Jones Hardware 33 m sales<br />

Jim Smith Hardware 42 m clerk<br />

Sara Clark Hardware 25 f service<br />

Gerry Walker Hardware 52 m manager<br />

Arlene Burns Personnel 29 f secretary<br />

George Dun Personnel 32 m clerk<br />

Jane Mason Personnel 43 f manager<br />

LIST Depts<br />

[ IF FIRST (Department), GENERATE #Employees = 0 ]<br />

[ INCREASE #Employees ]<br />

[ IF LAST (Department) RETAIN ;<br />

KEEP Department #Employees ] $<br />

Department Employees<br />

Hardware 4<br />

Personnel 3<br />

__________________________________________________________________________<br />

Figure 8.3 also illustrates aggregation using scratch variables. However, only a single summary case is retained<br />

for each department. The input file is already ordered by Department, so it need not be sorted.<br />

Each time the FIRST test is true, #Employees is initialized. It is increased as each case is processed. When the<br />

LAST test is not true, the case is deleted, and processing of the next case begins immediately. When the LAST<br />

test is true, KEEP is used to select variables for the summary case. The result is a report with one line per department<br />

containing the variables Department and Employees.<br />
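
As a rough Python analogue of Figure 8.3 (a sketch only; `count_per_group` is a hypothetical helper, not a P-STAT facility), the FIRST and LAST tests become comparisons with the neighbouring cases:

```python
def count_per_group(cases, group_key, count_key):
    """Return one summary case per group, holding the number of cases in it.

    The cases must already be ordered by group_key.
    """
    summary = []
    count = 0
    for index, case in enumerate(cases):
        is_first = index == 0 or cases[index - 1][group_key] != case[group_key]
        is_last = (index == len(cases) - 1
                   or cases[index + 1][group_key] != case[group_key])
        if is_first:
            count = 0          # GENERATE #Employees = 0
        count += 1             # INCREASE #Employees
        if is_last:            # IF LAST (Department) RETAIN
            summary.append({group_key: case[group_key], count_key: count})
    return summary


# The Department column of file Depts from Figure 8.3.
depts = [
    {"Name": "John Jones", "Department": "Hardware"},
    {"Name": "Jim Smith", "Department": "Hardware"},
    {"Name": "Sara Clark", "Department": "Hardware"},
    {"Name": "Gerry Walker", "Department": "Hardware"},
    {"Name": "Arlene Burns", "Department": "Personnel"},
    {"Name": "George Dun", "Department": "Personnel"},
    {"Name": "Jane Mason", "Department": "Personnel"},
]

report = count_per_group(depts, "Department", "Employees")
for line in report:
    print(line["Department"], line["Employees"])
```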

8.4 The Permanent Vector<br />

The permanent (P) vector holds double-precision numeric values. The length of the P vector, like that of the V<br />

vector, is the maximum number of variables possible in a P-STAT system file. When a run begins, P-STAT generates<br />

the P vector with all the values set to missing type 1. New values are placed in the P vector using SET (using<br />

GENERATE will cause an error), and they remain there until they are changed. The P vector is not re-initialized<br />

when a new P-STAT command begins.<br />

The values in the P vector are referenced by position number; for example, P(5) refers to the value in the fifth location<br />

of the vector. This “subscript” notation permits the use of a variable or an expression as the index. Thus, locations<br />

in the P vector may be referenced with a DO loop:



DO #J = 1, 8; SET P(#J) = V(#J) ; ENDDO;<br />

This will set the first eight values in the permanent vector to the first eight values in the current case. The<br />

expression that selects the P location may itself be calculated:<br />

DO #L = 1, 6;<br />

SET P(#L) = V(#L);<br />

SET P(#L+6) = SQRT( V(#L) );<br />

ENDDO;<br />

However, the result of #L+6 must be an integer between 1 and the maximum number of variables in a file.<br />

Figure 8.4 illustrates the use of the permanent vector to move information between files. The initial PROCESS<br />

command is used as a vehicle for PPL; there is no output file. The locations P(1) and P(2) are set to zero<br />

when the first case in the file is processed. Then, for each case (including the first), if Income is not missing, P(1)<br />

is increased by 1 and P(2) is increased by the value of Income. Thus, P(1) has the count of cases with good values<br />

for Income and P(2) contains the total Income for those cases. P(1) and P(2) are available to the second command, MODIFY,<br />

permitting the calculation of #Mean.Income. As each case is processed, a simple subtraction produces the<br />

difference between Income for that case and #Mean.Income.<br />

__________________________________________________________________________<br />

Figure 8.4 Moving Values Between Files with the P Vector<br />

C 'Get number of people and income totals.' $<br />

PROCESS File1<br />

[ IF FIRST (.FILE.),<br />

SET P(1) = 0, SET P(2) = 0 ]<br />

[ IF Income GOOD,<br />

INCREASE P(1),<br />

INCREASE P(2) BY Income ] $<br />

C 'Now get mean income and income differences.' $<br />

MODIFY File1<br />

[ IF FIRST (.FILE.),<br />

GENERATE #Mean.Income = P(2) / P(1) ]<br />

[ GENERATE Difference = Income - #Mean.Income ],<br />

OUT File2 $<br />

___________________________________________________________________________<br />
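
The two-pass pattern of Figure 8.4 (accumulate in one command, use the totals in the next) can be sketched in Python as follows. This is an illustrative analogue rather than PPL: the list `p` stands in for the P vector, `None` stands in for a missing value, the sample incomes are invented, and the sketch assumes that a case with missing Income gets a missing Difference.

```python
def income_differences(cases):
    """Return copies of the cases with Difference = Income - mean Income."""
    # Index 0 is unused so that p[1] and p[2] line up with P(1) and P(2).
    p = [None, 0, 0]

    # First pass (the PROCESS command): count and total the good incomes.
    for case in cases:
        if case["Income"] is not None:
            p[1] += 1
            p[2] += case["Income"]

    # Second pass (the MODIFY command): the totals are still available.
    mean_income = p[2] / p[1]
    output = []
    for case in cases:
        income = case["Income"]
        difference = None if income is None else income - mean_income
        output.append({**case, "Difference": difference})
    return output


# Hypothetical input: three cases, one with missing Income.
file1 = [{"Income": 100.0}, {"Income": None}, {"Income": 300.0}]
file2 = income_differences(file1)
for case in file2:
    print(case["Income"], case["Difference"])
```

Like the PPL in the figure, the sketch does not guard against a file with no good Income values, in which case the division would fail.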

The permanent vector is also useful for passing a large number of numeric variables across cases within a<br />

command. Given some number of tests, each with a variable name beginning with “Test”, these instructions will<br />

get class totals for all the tests:<br />

[ IF FIRST ( Class ),<br />

DO #J USING Test?; SET P(#J) = 0; ENDDO;<br />

DO #J USING Test?; INC P(#J) BY V(#J); ENDDO;<br />

IF LAST ( Class ), RETAIN;<br />

DO #J USING Test?; SET V(#J) = P(#J); ENDDO ]<br />

Given any number of tests in any locations in the file, the P values in corresponding locations are initialized when<br />

the first case of a class is processed. (The index #J uses the positions of all the Test? variables in the file as its<br />

values.) Each of the P values is increased by the associated test value.<br />

Each case is then evaluated to determine if it is the last case for a class. If the test result is false, that case is<br />

not retained, the next case is read and processing resumes with the first PPL statement. If the test result is true,



the PPL continues and the test values are set to the accumulated totals. This final case for the class is the only case<br />

that is seen by the P-STAT command.<br />

The choice of whether to use the P vector or scratch variables depends on the number of variables involved,<br />

whether any are character variables, and the desired tasks. The P vector is usually easier to use when many variables<br />

are treated the same way, as in initialization:<br />

DO #Q = 1 TO 16; SET P(#Q) = 0 ; ENDDO;<br />

Scratch variables may be more convenient when only a few variables or character data are involved:<br />

GENERATE #T1 = 0, GENERATE #NN:C = ' ' ;<br />

8.5 User-defined Arrays<br />

Like scratch variables, arrays are defined during a run and used in PPL statements. An array can have up to<br />

7 dimensions, and can be character or numeric. Array names have two characters, the second being the same as<br />

the first, like XX or cc or Zz. Case doesn’t matter. There can be up to 26 active arrays.<br />

How do arrays compare to the P vector? The P vector allows N numeric values, where N is the maximum<br />

number of variables in a file in a given version of P-STAT. This is usually 6,000. The P vector is one-dimensional;<br />

P(1) through P(N) can be used. Arrays are an improvement over the P vector in 3 ways:<br />

1. allowing character as well as numeric arrays.<br />

2. allowing dimensioning like XX(2,1,5).<br />

3. providing an array buffer (where arrays are placed) that is 3 times larger than the P vector.<br />

Arrays are defined by using the DEFINE.ARRAY command.<br />

DEFINE.ARRAY xx (10,30) TO 0 $<br />

This command defines XX as a numeric array with 2 dimensions. The first subscript will be 1 to 10, the second<br />

1 to 30. The 300 array values are initialized to zero. Initialization (the “TO 0” part) is optional; if it is not used<br />

the values are set to missing type 1.<br />

Dimensioning with zero or negative integers is allowed; for example:<br />

DEFINE.ARRAY aa (-4:4, 0:10, -100:0) $<br />

A character array is defined by adding a numeric value, which is the length of each of the character values in the<br />

array:<br />

DEFINE.ARRAY KK:12 (2, 10, 101:140) TO ' ' $<br />

This defines KK as a character array with 3 dimensions. Each value can hold 12 characters. This is shown by the<br />

KK:12. The first subscript can be 1 or 2, the second 1 to 10, and the third 101 to 140. The 800 array values are<br />

initialized to blank. The maximum size of a character value is 50,000 characters.<br />

Each character value has a status word (to indicate missing or good), followed by the characters of the value.<br />

A character:4 value uses one array buffer element (as does a numeric value). A character:12 value needs 2 array<br />

buffer elements, a character:20 value needs 3 elements, and so on. The array buffer is very large and even the<br />

Whopper II size can hold an array of 6000 C20 elements.<br />

SHOW.ARRAYS $<br />

The SHOW.ARRAYS command displays the names, size (if character), number of dimensions, and defined subscript<br />

range for each array.<br />

DROP.ARRAY aa zz pp $<br />

This command ends the definition of the indicated array or arrays and releases the array buffer space, making it<br />

available for other definitions.



DROP.P.VECTOR $<br />

This command takes the P vector space and adds it to the array buffer. This allows larger arrays, but ends<br />

any use of the P vector in the run.<br />

Suppose we used 'DEFINE.ARRAY xx(3,5)$' and set #n to 2. These 3 (unrelated) standalone PPL statements<br />

would be valid, as would the nested DO loop.<br />

SET XX (2, #n ) = 77 $<br />

PUT XX (#n, #n-1 ) $<br />

IF XX (1, 3 ) LT XX(2,4), SET XX(1,3) to .m1. $<br />

DO #j = 1,3;<br />

DO #k = 1,5;<br />

SET XX( #j, #k ) = #j * 10 + #k;<br />

ENDDO;<br />

ENDDO $<br />

__________________________________________________________________________<br />

Figure 8.5 DEFINE.ARRAY and SHOW.ARRAYS<br />

DEFINE.ARRAY xx ( 0:3, 5 ) $<br />

DEFINE.ARRAY cc:12 ( 44, 2,2 ) $<br />

SHOW.ARRAYS $<br />

---------Numeric array xx has been defined---------<br />

It has 20 values, organized into 2 dimensions.<br />

The array buffer now has 17,980 unused elements.<br />

---------------------------------------------------<br />

-------Character array cc:12 has been defined-------<br />

It has 176 values, organized into 3 dimensions.<br />

The array buffer now has 17,628 unused elements.<br />

----------------------------------------------------<br />

---------------array summary---------------<br />

There are 2 user-defined arrays:<br />

cc:12 ( 44, 2, 2 )<br />

xx ( 0:3, 5 )<br />

The array buffer contains 18,000 elements.<br />

372 are in use by existing arrays.<br />

17,628 are available for array definition.<br />

-------------------------------------------<br />

__________________________________________________________________________<br />
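
The element counts reported in Figure 8.5 can be checked with a little arithmetic. The Python sketch below is only an illustration: it assumes that one buffer element is 8 bytes and that the status word of a character value occupies 4 bytes. The guide does not state these sizes; they are chosen because they reproduce the stated costs of 1, 2 and 3 elements for character:4, character:12 and character:20 values.

```python
import math

ELEMENT_BYTES = 8  # assumption: one buffer element is one double-precision word
STATUS_BYTES = 4   # assumption: size of the status word of a character value


def elements_per_value(char_length=None):
    """Buffer elements used by one array value (numeric when char_length is None)."""
    if char_length is None:
        return 1
    return math.ceil((STATUS_BYTES + char_length) / ELEMENT_BYTES)


def value_count(dims):
    """Number of values in an array; dims is a list of (low, high) subscript ranges."""
    count = 1
    for low, high in dims:
        count *= high - low + 1
    return count


def buffer_usage(dims, char_length=None):
    """Total buffer elements used by an array."""
    return value_count(dims) * elements_per_value(char_length)


BUFFER_SIZE = 18000  # total reported in the array summary of Figure 8.5

xx_used = buffer_usage([(0, 3), (1, 5)])                           # xx ( 0:3, 5 )
cc_used = buffer_usage([(1, 44), (1, 2), (1, 2)], char_length=12)  # cc:12 ( 44, 2,2 )

print("xx uses", xx_used, "elements")
print("cc uses", cc_used, "elements")
print("available:", BUFFER_SIZE - xx_used - cc_used)
```

Under these assumptions xx takes 20 elements and cc takes 352, which matches the 372 elements in use and the 17,628 still available in the figure.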

Once an array is defined it can be used by any command in the same way that the P vector is used. Given a<br />

file containing at least the following 3 variables:<br />

Gender coded 1=male<br />

2=female<br />

Age coded 1=le 30<br />

2=31 - 40<br />

3=Over 40<br />

Income coded in dollar amounts.



Produce the following report, where Group represents one of the 6 possible gender/age groups.<br />

Group n had the highest average income of $xx,xxx.xx<br />

This type of question can be solved in a variety of ways. Because there are 6 groups and it is necessary to<br />

save both the number of cases in each group and the total income of each group across all the cases, arrays provide<br />

an easy way to handle the data collection.<br />

__________________________________________________________________________<br />

Figure 8.6 One-dimensional Arrays<br />

DEFINE.ARRAY gg (6) to 0 $<br />

DEFINE.ARRAY tt (6) to 0 $<br />

GEN ##High = 0; GEN ##Group $<br />

PROCESS px1298a [<br />

GEN #N = 0;<br />

DO #A = 1, 3;<br />

DO #G = 1, 2;<br />

INC #N;<br />

IF Age = #A and Gender = #G,<br />

INCREASE GG(#N),<br />

INCREASE TT(#N) BY Income;<br />

IF LAST ( .FILE. ) AND TT(#N) / GG(#N) GT ##High,<br />

SET ##High = TT(#N) / GG(#N),<br />

SET ##GROUP = #n;<br />

ENDDO;<br />

ENDDO;<br />

] $<br />

PUT "Group " ##Group<br />

" had the highest average income of $"<br />

@COMMAS @PLACES2 ##High $<br />

__________________________________________________________________________<br />

The first two commands in Figure 8.6 define two arrays with 6 elements in each. Array gg is used to count<br />

the cases for each group while tt is used to accumulate total income. Figure 8.7 illustrates the same solution<br />

using two-dimensional arrays. In this particular example, the use of the one-dimensional arrays is somewhat easier<br />

to follow and there is little difference in the amount of code required.<br />

It would be possible to use a single three-dimensional array (3,2,2) to hold all twelve of the values that are<br />

needed for this particular problem. Such complexity might serve as an exercise in nesting DO loops and handling<br />

scratch variables but would only complicate the solution of a fairly simple problem.<br />

Multiple dimensions are most useful when the contents of the cells are similar. For example, sales data for<br />

5 divisions over 12 months from 6 regions of the country might best be processed if stored in a single (5,12,6)<br />

array.



__________________________________________________________________________<br />

Figure 8.7 Two-dimensional Arrays<br />

DEFINE.ARRAY gg (3,2) to 0 $<br />

DEFINE.ARRAY tt (3,2) to 0 $<br />

GEN ##High=0; GEN ##Group $<br />

PROCESS Myfile [<br />

DO #A = 1, 3;<br />

DO #G = 1, 2;<br />

IF Age = #A and Gender = #G,<br />

INCREASE GG(#A,#G),<br />

INCREASE TT(#A,#G) BY Income;<br />

IF LAST (.FILE.) AND TT(#A,#G) / GG(#A,#G) GT ##HIGH,<br />

SET ##High = TT(#A,#G) / GG(#A,#G),<br />

SET ##Group = #A + (#G-1) * 3;<br />

ENDDO;<br />

ENDDO;<br />

] $<br />

PUT "Group " ##Group<br />

" had the highest average income of $"<br />

@COMMAS @PLACES2 ##High $<br />

__________________________________________________________________________<br />

8.6 Interaction of FIRST, LAST and Other PPL<br />

The FIRST and LAST functions interact with other PPL instructions in the following manner:<br />

1. Case selection, such as CASES 11 TO 30, is done first. The rest of the PPL sees only those 20 cases<br />

and has no idea that they came from a larger file.<br />

2. An internal FIRST/NOTFIRST and LAST/NOTLAST flag is set for each FIRST or LAST test used<br />

in the PPL. This is done as soon as the case passes the CASES filter. FIRST(.FILE.), for example,<br />

is true for the first case processed. That case may have been the eleventh case of the original input<br />

file.<br />

3. The PPL for the current case is done now. Because the FIRST and LAST settings for a case are determined<br />

before other PPL begins, FIRST and LAST testing cannot be done on newly generated<br />

variables. Also, recoding a FIRST or LAST variable has no effect on the FIRST and LAST settings,<br />

since those settings are done before recodes occur.<br />

4. The DELETE and RETAIN instructions should not be used until all FIRST and LAST tests are<br />

complete.<br />

To summarize, FIRST and LAST logic is based on the pre-PPL values of only those cases that remain after any<br />

case selection.<br />

The last portion of this section on basic across-case modification gives two detailed examples that use scratch<br />

variables, the P vector and the FIRST function, along with IF tests, DO loops, the PUT instruction and the PUT<br />

counter (.PUT.). The examples integrate the various PPL procedures covered thus far in handling realistic problems<br />

encountered in data modification.



8.7 Example: Checking a List of Variables<br />

In creating a new variable from a series of dummy variables, it may be sensible to check that only one of the dummy<br />

variables contains the value 1 and that the rest are zero.<br />

The instructions shown in Figure 8.8 test that only one of the dummy variables has been coded 1 and they<br />

create the new variable Region. Scratch variables are used to contain the results of the IF test. The scratch variables<br />

#Test1 and #Test2 are generated equal to 0 in all cases.<br />

#Test1 is incremented each time the DO loop test is true. #Test1 is 0 at the end of the DO loop if none of the<br />

variables contained a 1. #Test1 is greater than 1 if more than one of the variables contained a 1. #Test2 is incremented<br />

by the value of the variable in the DO loop each time the IF test is false or missing. If #Test2 is missing<br />

at the end of the loop, one of the dummy variable values is missing. If #Test2 is greater than 0 at the end of the<br />

loop, one or more of the dummy variables was some value other than 0 or 1.<br />

__________________________________________________________________________<br />

Figure 8.8 Checking Variables Using PUT and Scratch Variables<br />

MODIFY Regional<br />

[ KEEP North.East TO South.West Age Sex ;<br />

GENERATE Region = .M1. ;<br />

GEN #Test1 = 0, GEN #Test2 = 0 ]<br />

[ DO #J = 1 TO 4;<br />

IF V(#J) EQ 1, SET Region = #J,<br />

T.INCREASE #Test1, FM.INCREASE #Test2 BY V(#J) ;<br />

ENDDO;<br />

IF #Test1 EQ 0, PUT .N. ;<br />

IF #Test1 GT 1, PUT .N. ,<br />

SET Region = .M2.;<br />

IF #Test2 MISSING OR #Test2 GT 0,<br />

PUT .N. ,<br />

SET Region = .M3. ;<br />

IF .PUT. GT 0, RETAIN ],<br />

OUT Errors $<br />

__________________________________________________________________________<br />

The PUT instruction reports any error conditions. Region is left at missing type 1 when no variable is coded 1. It is set to<br />

missing type 2 when more than one variable is coded 1, and to missing type 3 when one of the remaining values is missing or non-zero. Finally,<br />

if there were errors, the case is retained and written to an error file, the output file named “Errors”. .PUT. is a<br />

system variable which is reset to 0 as each new case is read. It is incremented each time that a PUT instruction is<br />

issued. In Figure 8.8 the PUT instructions are issued only to report errors, so any case with a .PUT. value of 0 is<br />

error free.<br />
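
The checks in Figure 8.8 can be restated as a small Python function. This is a sketch under stated assumptions, not PPL: `check_dummies` is a hypothetical name, `None` stands in for a missing value, and the strings "M1", "M2" and "M3" stand in for the three missing types.

```python
def check_dummies(values):
    """Validate a list of dummy codes and derive Region from them.

    Returns (region, errors); a case is error free when errors is empty.
    """
    region = "M1"   # GENERATE Region = .M1.
    test1 = 0       # how many dummies are coded 1
    test2 = 0       # sum of the other values; None once a missing value is met

    for position, value in enumerate(values, start=1):
        if value == 1:
            region = position
            test1 += 1
        elif value is None or test2 is None:
            test2 = None        # a missing value makes the sum missing
        else:
            test2 += value

    errors = []
    if test1 == 0:
        errors.append("no variable coded 1")
    if test1 > 1:
        errors.append("more than one variable coded 1")
        region = "M2"
    if test2 is None or test2 > 0:
        errors.append("missing or non-zero value among the other variables")
        region = "M3"
    return region, errors


print(check_dummies([0, 1, 0, 0]))
print(check_dummies([1, 1, 0, 0]))
print(check_dummies([0, 1, None, 0]))
```

The length of the returned error list plays the part of the .PUT. counter: only cases with a non-empty list would be written to the error file.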

Consider:<br />

[ IF FIRST ( .FILE.) GEN #n = 0] and<br />

[ GEN #n = 0]<br />

The GENERATE by itself zeros #n whenever a new case is read. The GENERATE conditioned on FIRST (.FILE.) zeros #n only<br />

when the initial case is read.



8.8 Example: Selecting a Block of Cases<br />

Selecting a block of cases, such as all cases from the one with the value “Jones” on Last.Name up to (but not including)<br />

the case with the value “Smith”, is a bit more complicated than selecting cases by their position in the file<br />

(.N.) or by specific values of a variable. Values in the permanent vector may be used to delineate the block of<br />

cases:<br />

[ IF FIRST ( .FILE. ), SET P(1) = 0 ;<br />

IF Last.Name EQ 'Jones', SET P(1) = 1 ;<br />

IF P(1) = 0, DELETE ;<br />

IF Last.Name EQ 'Smith', QUITFILE ]<br />

When the first case in the file is processed, P(1) is set equal to 0. It is reset to 1 when the Jones case is processed.<br />

Any cases with values of 0 for P(1) are deleted from further processing. Thus, cases prior to Jones are deleted.<br />

When the Smith case is found, processing of the file stops (without using this case).<br />
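
The same flag technique reads as follows in Python. It is an analogue for illustration only: `select_block` and the sample names are hypothetical, and the boolean `in_block` plays the part of P(1).

```python
def select_block(cases, key, start_value, stop_value):
    """Keep the cases from start_value up to, but not including, stop_value."""
    selected = []
    in_block = False                    # SET P(1) = 0 on the first case
    for case in cases:
        if case[key] == start_value:
            in_block = True             # SET P(1) = 1
        if not in_block:
            continue                    # DELETE: skip to the next case
        if case[key] == stop_value:
            break                       # QUITFILE: stop without using this case
        selected.append(case)
    return selected


# Hypothetical file ordered by last name.
people = [{"Last.Name": name}
          for name in ["Adams", "Jones", "King", "Lee", "Smith", "Young"]]

block = select_block(people, "Last.Name", "Jones", "Smith")
print([case["Last.Name"] for case in block])
```

Because the stop test comes after the deletion test, a "Smith" that appears before "Jones" is simply skipped, which is also how the PPL above behaves.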

8.9 THE SPLIT FUNCTION<br />

The SPLIT function divides a case into multiple cases. When data are collected with related information in<br />

a single case, reorganization into multiple cases may be necessary for various commands. For example, a household<br />

survey may have both household information and information for several household members in the same<br />

case. A medical study may have several patient visits or lab results in the same case. This organization is often<br />

inappropriate for many statistical analyses such as TTEST or ANOVA, which require explicit grouping variables<br />

or indices.<br />

Special forms of the SPLIT function:<br />

[ SPLIT ] or [ SPLIT * ]<br />

reverse the effects of COLLECT, the function which gathers multiple cases into one case. They are described after<br />

the discussion of COLLECT and in the summary ending this chapter.<br />

8.10 Splitting a Case<br />

The simplest usage of SPLIT divides each case into a designated number of cases. This file has two cases and<br />

each case has two variables:<br />

Test1 Test2<br />

16 12<br />

17 11<br />

Suppose you wanted to split each case into 2 cases. A command such as:<br />

LIST X [ SPLIT INTO 2 ] $<br />

receives only the newly created cases. When SPLIT INTO 2 is encountered, each case in the file is converted into<br />

two cases:<br />

Test1<br />

16<br />

12<br />

17<br />

11<br />

There is an error message if the number of variables being split is not a multiple of the SPLIT argument. For example,<br />

you cannot do a simple SPLIT INTO 3 when there are 5 variables, but you can when there are 3, 6, 9, 12,<br />

etc.



Either:<br />

SPLIT INTO N or SPLIT N<br />

may be used; the word INTO is optional. N, the number of new cases, must be an integer. It can be an integer<br />

constant, a permanent scratch variable (##n), or (in a macro) a temporary scratch variable (#n). For example, when the<br />

input case has 40 variables, SPLIT INTO 4 yields ten variables in each of the four new cases. The first ten variable<br />

names are used. Case one has values 1 to 10, case two has values 11 to 20, and so on. There is an error message<br />

if the variables are split into cases such that a numeric and a character variable would be combined into a single<br />

variable (would be in the same column).<br />
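
The basic splitting rule can be sketched in a few lines of Python. This is an illustration rather than PPL: `split_case` is a hypothetical helper that works on a plain list of values and omits the check for mixing numeric and character variables in one column.

```python
def split_case(values, n):
    """Split one case (a list of values) into n new cases of equal width."""
    if len(values) % n != 0:
        raise ValueError(
            f"{len(values)} variables cannot be split evenly into {n} cases")
    width = len(values) // n
    # New case i receives values i*width .. (i+1)*width - 1 of the input case.
    return [values[i * width:(i + 1) * width] for i in range(n)]


# The two-variable file shown earlier: each case becomes two one-variable cases.
for case in ([16, 12], [17, 11]):
    print(split_case(case, 2))

# A 40-variable case split into 4 cases of ten values each.
forty = list(range(1, 41))
four_cases = split_case(forty, 4)
print(four_cases[1])
```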

The variables present in the new cases are also determined by additional options used with the SPLIT function.<br />

These options can occur in any order, as often as needed. Their order determines the order of the variables<br />

in the new cases. SPLIT itself must precede any options.<br />

8.11 CARRYing Identifying Variables<br />

CARRY is an optional instruction that specifies one or more variables to be carried in every case formed by the<br />

SPLIT. CARRY requires one or more variables as its argument:<br />

CARRY Name, or CARRY Name Age Sex,<br />

Figure 8.9 illustrates the results of a SPLIT where CARRY is used to position the variables Name, Age and<br />

Sex in each of the new cases. Only the variables not mentioned in the CARRY instruction are split. (STUB may<br />

be used with LIST to highlight the hierarchical relationship between the carried variables and the split variables.)<br />

__________________________________________________________________________<br />

Figure 8.9 Using CARRY in the SPLIT Function<br />

FILE Students:<br />

Name Age Sex Test1 Test2<br />

Smith, Jason 11 1 16 12<br />

Wilson, Ann 14 2 17 11<br />

LIST Students [ SPLIT INTO 2, CARRY Name Age Sex ] $<br />

Name Age Sex Test1<br />

Smith, Jason 11 1 16<br />

Smith, Jason 11 1 12<br />

Wilson, Ann 14 2 17<br />

Wilson, Ann 14 2 11<br />

LIST Students [ SPLIT INTO 2, CARRY Name Age Sex ],<br />

STUB Name Age Sex $<br />

Name Age Sex Test1<br />

Smith, Jason 11 1 16<br />

12<br />

Wilson, Ann 14 2 17<br />

11<br />

_________________________________________________________________________
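
The effect of CARRY in Figure 8.9 can be imitated in Python as below. The sketch is an analogue, not PPL: `split_with_carry` is a hypothetical name and each case is modelled as a dictionary whose key order is the variable order.

```python
def split_with_carry(case, n, carry):
    """Split the non-carried variables of a case into n new cases.

    The carried variables are repeated in every new case, and the split
    values take the names of the first group of split variables.
    """
    carried = {name: case[name] for name in carry}
    rest = [(name, value) for name, value in case.items() if name not in carry]
    if len(rest) % n != 0:
        raise ValueError("the split variables do not divide evenly")
    width = len(rest) // n
    names = [name for name, _ in rest[:width]]

    new_cases = []
    for i in range(n):
        chunk = rest[i * width:(i + 1) * width]
        new_case = dict(carried)
        new_case.update(zip(names, (value for _, value in chunk)))
        new_cases.append(new_case)
    return new_cases


# File Students from Figure 8.9.
students = [
    {"Name": "Smith, Jason", "Age": 11, "Sex": 1, "Test1": 16, "Test2": 12},
    {"Name": "Wilson, Ann", "Age": 14, "Sex": 2, "Test1": 17, "Test2": 11},
]

split_students = []
for student in students:
    split_students.extend(split_with_carry(student, 2, ["Name", "Age", "Sex"]))

for case in split_students:
    print(case["Name"], case["Age"], case["Sex"], case["Test1"])
```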



8.12 Selecting Variables To USE<br />

The variables to be used in the SPLIT may first be selected by using KEEP in a separate modification clause,<br />

[ KEEP Test1 Test2 ;<br />

SPLIT INTO 2 ]<br />

or they can be specified as part of the SPLIT function with the USE option:<br />

SPLIT INTO 2, USE Test1 Test2 ;<br />

Figure 8.10 shows the results of a USE selection.<br />

__________________________________________________________________________<br />

Figure 8.10 Selecting Variables for SPLIT with USE<br />

FILE Students:<br />

Name Age Sex Test1 Test2<br />

Smith, Jason 11 1 16 12<br />

Wilson, Ann 14 2 17 11<br />

LIST Students [ SPLIT INTO 2, USE Test1 Test2 ] $<br />

Test1<br />

16<br />

12<br />

17<br />

11<br />

__________________________________________________________________________<br />

USE requires ei<strong>the</strong>r one or more variable names as its argument. The number of variables must be a multiple<br />

of <strong>the</strong> SPLIT argument. If SPLIT INTO 6 is used and USE specifies 18 variables, <strong>the</strong> 18 variables are split in<strong>to</strong><br />

six output cases with three variables each. The variable names are those of <strong>the</strong> first three variables specified after<br />

USE. The USE variables can include ranges:<br />

USE Test1 TO Test9 Test99<br />

The USE option may be used solely <strong>to</strong> reorder <strong>the</strong> variables that are in <strong>the</strong> output cases:<br />

SPLIT INTO 2, USE Test2 Test1 ;<br />

When all variables are <strong>to</strong> be used, USE is not necessary — <strong>the</strong>se two instructions are equivalent:<br />

SPLIT INTO 2;<br />

SPLIT INTO 2, USE V(1) .ON. ;<br />

USE is also not necessary when o<strong>the</strong>r options, such as CARRY, are present and <strong>the</strong> number of variables not being<br />

carried is a multiple of <strong>the</strong> SPLIT argument.<br />
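The rule that the USE list must divide evenly among the new cases can be sketched outside P-STAT. The following Python model is only an illustration of the behaviour described above, not P-STAT code; the function name split_case and the dictionary representation of a case are assumptions made for the sketch.<br />

```python
def split_case(case, n_cases, use=None):
    """Model of SPLIT INTO n_cases: cut one wide case into n_cases narrow cases.

    case: dict mapping variable name -> value, in file order.
    use:  optional list of variable names (the USE list); defaults to all.
    """
    names = list(use) if use is not None else list(case)
    if len(names) % n_cases != 0:
        raise ValueError("number of variables must be a multiple of the SPLIT argument")
    width = len(names) // n_cases
    out_names = names[:width]  # output names come from the first new case
    return [
        dict(zip(out_names, (case[v] for v in names[i * width:(i + 1) * width])))
        for i in range(n_cases)
    ]

smith = {"Name": "Smith, Jason", "Age": 11, "Sex": 1, "Test1": 16, "Test2": 12}
print(split_case(smith, 2, use=["Test1", "Test2"]))
# [{'Test1': 16}, {'Test1': 12}]
```

Passing the list in the order Test2, Test1 gives the reordered result described for USE Test2 Test1.<br />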

8.13 Defining New Variables with CREATE<br />

When <strong>the</strong> cases in a file are split, <strong>the</strong> names of <strong>the</strong> variables are those of <strong>the</strong> variables present in <strong>the</strong> first new case.<br />

Test1 and Test2 may be appropriate names before <strong>the</strong> SPLIT, when <strong>the</strong> variables are in one case. However, when<br />

<strong>the</strong> case is SPLIT, a variable name such as Test.Score may be more appropriate for all <strong>the</strong> Test? variables. The<br />

CREATE option gives an output variable a new name and also specifies just which variable values are <strong>to</strong> be used<br />

for that variable. CREATE takes <strong>the</strong> place of USE.


The first argument for CREATE is <strong>the</strong> new variable name. The subsequent arguments are <strong>the</strong> existing variables<br />

whose values will be those of <strong>the</strong> new variable:<br />

CREATE Test.Score Test1 Test2<br />

The new variable created is “Test.Score”. The first new case output from SPLIT gets <strong>the</strong> value of <strong>the</strong> variable<br />

Test1, <strong>the</strong> first variable in <strong>the</strong> current input case <strong>to</strong> be used, for <strong>the</strong> new variable Test.Score. The second new case<br />

gets <strong>the</strong> value of Test2, <strong>the</strong> second variable in <strong>the</strong> current input case, for <strong>the</strong> same new variable, and so on. Figure<br />

8.11 shows <strong>the</strong> effect of CREATE. Note that <strong>the</strong> variables produced by <strong>the</strong> SPLIT are in <strong>the</strong> order in which <strong>the</strong>y<br />

are mentioned.<br />

__________________________________________________________________________<br />

Figure 8.11 Naming <strong>the</strong> New Variables with CREATE<br />

FILE Students:<br />

Name Age Sex Test1 Test2<br />

Smith, Jason 11 1 16 12<br />

Wilson, Ann 14 2 17 11<br />

LIST Students<br />

[ SPLIT 2,<br />

CREATE Test.Score Test1 Test2 ,<br />

CARRY Name ] $<br />

Test<br />

Score Name<br />

16 Smith, Jason<br />

12 Smith, Jason<br />

17 Wilson, Ann<br />

11 Wilson, Ann<br />

__________________________________________________________________________<br />

The number of existing variables listed in a CREATE, after the new variable name, must equal<br />

<strong>the</strong> number of new cases being produced. Several CREATE instructions may follow a SPLIT. However, <strong>the</strong> number<br />

of variables in each CREATE list must equal <strong>the</strong> number of new cases being produced. Figure 8.12 shows<br />

three new cases produced from each existing case. Thus, three variables are in each CREATE list. Two new variables<br />

are defined. Two new variables, each using three existing variables, equal six variables, which is <strong>the</strong> number<br />

of variables in <strong>the</strong> original case <strong>to</strong> be split.<br />

When CREATE is used, any variables not cited in <strong>the</strong> CARRY or CREATE instructions are omitted from <strong>the</strong><br />

SPLIT unless USE is also included. When USE is included without a variable list, all <strong>the</strong> remaining variables are<br />

included in <strong>the</strong> SPLIT. The variable names for <strong>the</strong>se additional variables are those of <strong>the</strong> variables in <strong>the</strong> first case<br />

of <strong>the</strong> output file.<br />
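The counting rule for CREATE lists can be checked with a small Python model. This is a sketch of the behaviour described above and not P-STAT code: split_create and its arguments are hypothetical names, and the sketch always places carried variables first, which is the order used in the Field121 example of Figure 8.12.<br />

```python
def split_create(case, n_cases, creates, carry=()):
    """Model of SPLIT n_cases with one or more CREATE lists and CARRY.

    creates: list of (new_name, [source variables]) pairs, one per CREATE.
    carry:   variables copied unchanged into every new case.
    """
    for new_name, sources in creates:
        if len(sources) != n_cases:
            raise ValueError(f"CREATE {new_name} needs exactly {n_cases} variables")
    new_cases = []
    for i in range(n_cases):
        row = {name: case[name] for name in carry}
        for new_name, sources in creates:
            row[new_name] = case[sources[i]]  # i-th source feeds the i-th new case
        new_cases.append(row)
    return new_cases

field = {"Crop": "Alfalfa", "Date": "8/24/83",
         "Y1": 181, "Y2": 179, "Y3": 182, "Y4": 195, "Y5": 192, "Y6": 198}
rows = split_create(field, 3,
                    creates=[("Plot.1", ["Y1", "Y2", "Y3"]),
                             ("Plot.2", ["Y4", "Y5", "Y6"])],
                    carry=["Crop", "Date"])
for row in rows:
    print(row)
```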

8.14 Wildcard Notation and Masks<br />

Often cases which contain <strong>the</strong> type of data that is appropriate for splitting have variable names in which part of<br />

<strong>the</strong> name is a prefix and <strong>the</strong> rest is a counter or additional text <strong>to</strong> distinguish <strong>the</strong> values. When this situation exists,<br />

<strong>the</strong> ? wildcard notation can be used.


The ? ei<strong>the</strong>r follows a prefix <strong>to</strong> indicate all variables starting with that prefix, or it precedes a suffix <strong>to</strong> indicate<br />

all variables ending with that suffix. Either of the following produces the same result:<br />

SPLIT 2, CREATE Test.Score Test1 Test2 ;<br />

SPLIT 2, CREATE Test.Score Test? ;<br />

Ano<strong>the</strong>r way <strong>to</strong> select certain variables is <strong>to</strong> use a mask after a range:<br />

USE Test1 TO Test8 (MASK 1001),<br />

is <strong>the</strong> same as saying:<br />

USE Test1 Test4 Test5 Test8,<br />

given of course that Test1 through Test8 are consecutive variables in <strong>the</strong> file.<br />
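The mask example implies that the pattern 1001 is repeated along the range until the range is exhausted. A short Python sketch of that reading follows; apply_mask is a hypothetical helper written for illustration and is not part of P-STAT.<br />

```python
from itertools import cycle

def apply_mask(names, mask):
    """Keep the names whose position lines up with a '1' in the repeating mask."""
    return [name for name, bit in zip(names, cycle(mask)) if bit == "1"]

tests = [f"Test{i}" for i in range(1, 9)]  # Test1 .. Test8, assumed consecutive
print(apply_mask(tests, "1001"))
# ['Test1', 'Test4', 'Test5', 'Test8']
```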

__________________________________________________________________________<br />

Figure 8.12 Multiple CREATE Lists<br />

File Field121:<br />

Crop Date Y1 Y2 Y3 Y4 Y5 Y6<br />

Alfalfa 8/24/83 181 179 182 195 192 198<br />

Alfalfa 8/30/82 179 177 176 192 190 199<br />

LIST Field121<br />

[ SPLIT INTO 3, CARRY Crop Date,<br />

CREATE Plot.1 Y1 TO Y3, CREATE Plot.2 Y4 TO Y6 ],<br />

STUB Crop Date $<br />

Crop Date Plot.1 Plot.2<br />

Alfalfa 8/24/83 181 195<br />

179 192<br />

182 198<br />

8/30/82 179 192<br />

177 190<br />

176 199<br />

__________________________________________________________________________<br />

8.15 INDEXing Cases<br />

The INDEX option sequences <strong>the</strong> cases created by SPLIT. Several different indices may be built at <strong>the</strong> same time.<br />

Multiple indices are most useful when SPLIT is used <strong>to</strong> reorganize data for analysis of variance.<br />

INDEX requires a name for <strong>the</strong> new variable:<br />

SPLIT 2, INDEX Treatment ;<br />

A new variable named “Treatment” is created. It has <strong>the</strong> value 1 in <strong>the</strong> first case created by SPLIT and <strong>the</strong> value<br />

2 in <strong>the</strong> second case created by SPLIT. Figure 8.13 illustrates <strong>the</strong> use of INDEX.<br />

Multiple indices may also be created:<br />

INDEX Plot 2 Subplot 3,<br />

This creates two new variables named “Plot” and “Subplot”. Plot has <strong>the</strong> values 1 and 2. Subplot has <strong>the</strong> values<br />

1, 2, and 3. The first index moves more slowly than the second index, so that Plot remains 1 as Subplot is successively<br />

1, 2, and 3. Then Plot becomes 2, and Subplot is successively 1, 2, and 3. This means that there must be<br />

six cases created by <strong>the</strong> SPLIT.<br />

When <strong>the</strong> right-most index value is omitted, <strong>the</strong> appropriate value is assumed. INDEX A 2 B is equivalent <strong>to</strong><br />

INDEX A 2 B 3 when SPLIT INTO 6 has been used, because the product of the INDEX values must equal the SPLIT<br />

argument.<br />
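The way multiple indices advance, and the way an omitted right-most value is filled in, can be modelled in a few lines of Python. This is an illustrative sketch only: index_values is a hypothetical name, and None stands for an index size left out of the INDEX list.<br />

```python
from itertools import product
from math import prod

def index_values(n_cases, sizes):
    """Model of SPLIT n_cases, INDEX name size name size ...

    sizes: dict of index name -> number of levels; the right-most size may
    be None, in which case it is inferred from n_cases.
    """
    names = list(sizes)
    levels = list(sizes.values())
    if levels[-1] is None:
        known = prod(levels[:-1])
        if n_cases % known != 0:
            raise ValueError("SPLIT argument is not divisible by the given index sizes")
        levels[-1] = n_cases // known
    if prod(levels) != n_cases:
        raise ValueError("product of INDEX values must equal the SPLIT argument")
    # product() varies the right-most index fastest, as the text describes
    return [dict(zip(names, combo))
            for combo in product(*(range(1, k + 1) for k in levels))]

for row in index_values(6, {"Plot": 2, "Subplot": None}):
    print(row)
```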

__________________________________________________________________________<br />

Figure 8.13 Producing an Index Variable<br />

FILE Students:<br />

Name Age Sex Test1 Test2<br />

Smith, Jason 11 1 16 12<br />

Wilson, Ann 14 2 17 11<br />

LIST Students<br />

( SPLIT 2, CARRY Age Sex,<br />

INDEX Seq, CREATE Test.Score Test? ) $<br />

Test<br />

Age Sex Seq Score<br />

11 1 1 16<br />

11 1 2 12<br />

14 2 1 17<br />

14 2 2 11<br />

__________________________________________________________________________<br />

8.16 Ordering Variables with STEP and CYCLE<br />

The order of <strong>the</strong> variables in a file is sometimes not <strong>the</strong> desired one. Variables may be rearranged by using <strong>the</strong><br />

<strong>PPL</strong> instruction KEEP with variable selection and possibly a MASK, or within a SPLIT, by using CREATE and<br />

USE with lists of variables. In addition, if <strong>the</strong> variables are arranged in a regular pattern, <strong>the</strong>y may be ordered<br />

using <strong>the</strong> STEP and CYCLE options, which permit more concise specification when <strong>the</strong>re are many variables.<br />

The STEP option selects every second variable when its argument is two, every third variable when its argument<br />

is three, and so on. For example, given a file with 26 variables named A <strong>to</strong> Z, this:<br />

SPLIT 13, USE ( A TO Z ) STEP 2;<br />

selects every second variable between A and Z, beginning with A.<br />

STEP moves through <strong>the</strong> list of variables selecting <strong>the</strong> first (A), advancing <strong>the</strong> step size (2), selecting <strong>the</strong> designated<br />

variable (C), and so on, until <strong>the</strong> list is exhausted. The number of variables selected by <strong>the</strong> STEP<br />

procedure must be a multiple of the number of cases produced by the SPLIT function (the SPLIT argument). In the prior example, 13<br />

variables are selected from <strong>the</strong> 26 variables in <strong>the</strong> USE list, and <strong>the</strong>se are divided in<strong>to</strong> 13 cases. There is one variable<br />

per case. The USE list should be specified “B TO Z” if every o<strong>the</strong>r variable beginning with B is <strong>to</strong> be selected.<br />

When STEP or CYCLE is used, the variable list following USE or CREATE must be enclosed in parentheses.<br />

CYCLE works in a similar manner, except that when <strong>the</strong> variable list is exhausted, CYCLE goes back <strong>to</strong> <strong>the</strong> beginning<br />

of <strong>the</strong> list and begins selecting from <strong>the</strong> unused variables. (STEP does not return <strong>to</strong> <strong>the</strong> start of <strong>the</strong> list.)


Because <strong>the</strong> initial starting place in <strong>the</strong> list changes when CYCLE is used, different variables are selected in each<br />

iteration. The number of iterations depends on <strong>the</strong> CYCLE argument and <strong>the</strong> number of variables in <strong>the</strong> USE list:<br />

SPLIT 6, USE ( V(1) TO V(12) ) CYCLE 3 ;<br />

The CYCLE instruction selects variables 1, 4, 7, 10; 2, 5, 8, 11; and 3, 6, 9, 12; in that order.<br />

The selection order is a result of <strong>the</strong> initial variable in <strong>the</strong> USE list, <strong>the</strong> number of variables in <strong>the</strong> USE list,<br />

and <strong>the</strong> CYCLE argument. Ultimately, all <strong>the</strong> variables in <strong>the</strong> USE list are selected. Thus, CYCLE differs from<br />

STEP, where only a fraction of <strong>the</strong> variables in <strong>the</strong> USE or CREATE list are selected. Note that <strong>the</strong> number of<br />

variables in <strong>the</strong> variable list (12) must be a multiple of <strong>the</strong> number of cases in<strong>to</strong> which <strong>the</strong> current case is being<br />

SPLIT (6). This is true for both <strong>the</strong> STEP and CYCLE procedures. Also, both STEP and CYCLE must follow<br />

ei<strong>the</strong>r USE or CREATE and must not have a comma preceding <strong>the</strong>m.<br />
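The two selection orders can be reproduced with list slicing. The Python functions below are a sketch of the ordering described in the text, with hypothetical names step_select and cycle_select; they model only the order in which variables are chosen, not the SPLIT itself.<br />

```python
def step_select(names, step):
    """STEP: take the first variable, then every step-th one after it."""
    return names[::step]

def cycle_select(names, step):
    """CYCLE: like STEP, but restart from each unused starting position in turn."""
    return [name for start in range(step) for name in names[start::step]]

letters = [chr(ord("A") + i) for i in range(26)]  # A .. Z
print(step_select(letters, 2))  # A, C, E, ... Y (13 variables)

v = list(range(1, 13))  # positions of V(1) .. V(12)
print(cycle_select(v, 3))
# [1, 4, 7, 10, 2, 5, 8, 11, 3, 6, 9, 12]
```

STEP returns only part of the list, while CYCLE returns every variable in a new order, which is the difference the text draws between the two options.<br />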

Figure 8.14 shows <strong>the</strong> differing results that depend on whe<strong>the</strong>r STEP or CYCLE is used. STEP moves<br />

through <strong>the</strong> entire USE list, beginning with <strong>the</strong> first variable (Q2) and selecting every o<strong>the</strong>r variable. Four variables<br />

are selected and that meets <strong>the</strong> requirement of this SPLIT that a multiple of 4 be chosen. Four variables split<br />

in<strong>to</strong> four cases yield one variable per case.<br />

__________________________________________________________________________<br />

Figure 8.14 Using STEP and CYCLE<br />

File F:<br />

Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9<br />

1 2 3 4 5 6 7 8 9<br />

11 12 13 14 15 16 17 18 19<br />

LIST F<br />

[ SPLIT 4, INDEX A 2 B, INDEX C, CARRY Q1,<br />

USE ( Q2 TO Q9 ) xxxx 2 ] $<br />

produces:<br />

A B C Q1 | xxxx = STEP: Q2 | xxxx = CYCLE: Q2 Q4<br />

1 1 1 1 | 2 | 2 4<br />

1 2 2 1 | 4 | 6 8<br />

2 1 3 1 | 6 | 3 5<br />

2 2 4 1 | 8 | 7 9<br />

1 1 1 11 | 12 | 12 14<br />

1 2 2 11 | 14 | 16 18<br />

2 1 3 11 | 16 | 13 15<br />

2 2 4 11 | 18 | 17 19<br />

__________________________________________________________________________<br />

CYCLE moves through <strong>the</strong> entire USE list, beginning with <strong>the</strong> first variable and selecting every o<strong>the</strong>r variable<br />

also. However, since eight variables are cited in <strong>the</strong> USE list, CYCLE returns <strong>to</strong> <strong>the</strong> first unused variable<br />

(Q3) in <strong>the</strong> USE list, and begins selecting again. It keeps cycling until all of <strong>the</strong> variables in <strong>the</strong> USE list are selected.<br />

Eight variables are chosen, meeting <strong>the</strong> requirement of this SPLIT that a multiple of 4 variables be chosen.<br />

These are split in<strong>to</strong> four cases, yielding two variables per case.<br />

SPLIT 1 and CYCLE can be used to rearrange the variables in a file so that the variables in the second half of<br />

the file are interleaved with the variables in the first half of the file. For example, the variable order<br />

A1 A2 A3 A4 B1 B2 B3 B4 becomes<br />

A1 B1 A2 B2 A3 B3 A4 B4 with the following PPL:<br />

LIST AB [ SPLIT 1, USE ( V(1) .ON. ) CYCLE 4 ] $<br />

8.17 How SPLIT Interacts With O<strong>the</strong>r <strong>PPL</strong><br />

There can be only one SPLIT function per command. Normal <strong>PPL</strong> can precede or follow SPLIT. Using <strong>PPL</strong> first<br />

allows selection and modification of cases in <strong>the</strong> usual manner before SPLIT is used.<br />

The interaction of <strong>PPL</strong> with SPLIT is as follows:<br />

1. The first case passes from <strong>the</strong> first <strong>PPL</strong> phrase <strong>to</strong> <strong>the</strong> next, and it is modified and retained or deleted<br />

in <strong>the</strong> usual manner. If retained, it reaches <strong>the</strong> SPLIT instruction.<br />

2. When SPLIT receives <strong>the</strong> case, <strong>the</strong> current number of variables and <strong>the</strong>ir names change as <strong>the</strong> original<br />

case is split in<strong>to</strong> a number of new cases.<br />

3. The first of <strong>the</strong>se new cases passes <strong>to</strong> subsequent <strong>PPL</strong> phrases, one after ano<strong>the</strong>r, until it is deleted<br />

or retained and received by <strong>the</strong> command in use. The second new case <strong>the</strong>n passes <strong>to</strong> <strong>the</strong> <strong>PPL</strong> phrases<br />

following <strong>the</strong> SPLIT instruction, and so on, until all of <strong>the</strong> new cases resulting from <strong>the</strong> split of <strong>the</strong><br />

first original case have passed through.<br />

4. The second original case is now processed. It passes <strong>to</strong> <strong>the</strong> <strong>PPL</strong> preceding <strong>the</strong> SPLIT, and <strong>the</strong>n it<br />

passes <strong>to</strong> <strong>the</strong> SPLIT instruction. It is split in<strong>to</strong> multiple new cases, which each pass in turn <strong>to</strong> <strong>the</strong><br />

<strong>PPL</strong> following <strong>the</strong> SPLIT. When all <strong>the</strong> new cases resulting from splitting <strong>the</strong> second case have been<br />

processed, <strong>the</strong> process begins again with <strong>the</strong> third original case.<br />
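The four steps above amount to a nested loop: each original case is handled completely, including all of its split cases, before the next original case is read. A Python generator can sketch that order. The names run_pipeline, before, split and after are hypothetical, and the sketch treats "deleted" as returning None.<br />

```python
def run_pipeline(cases, before, split, after):
    """Yield output cases in the order the four steps above describe.

    before / after: functions returning a (possibly modified) case, or None
    to delete it. split: function turning one case into a list of new cases.
    """
    for case in cases:
        case = before(case)
        if case is None:  # deleted before reaching SPLIT
            continue
        for new_case in split(case):
            new_case = after(new_case)
            if new_case is not None:
                yield new_case

students = [
    {"Name": "Smith, Jason", "Test1": 16, "Test2": 12},
    {"Name": "Wilson, Ann", "Test1": 17, "Test2": 11},
]
keep_all = lambda c: c
split2 = lambda c: [{"Name": c["Name"], "Score": c[t]} for t in ("Test1", "Test2")]
high_only = lambda c: c if c["Score"] >= 12 else None

for row in run_pipeline(students, keep_all, split2, high_only):
    print(row)
```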

__________________________________________________________________________<br />

Figure 8.15 A Simple COLLECT<br />

File MyFile:<br />

Id Age Sex<br />

1 29 M<br />

2 26 F<br />

3 42 F<br />

4 - M<br />

LIST MyFile [ COLLECT 2 ] $<br />

Age Sex Age Sex<br />

Id.1 .1 .1 Id.2 .2 .2<br />

1 29 M 2 26 F<br />

3 42 F 4 - M<br />

__________________________________________________________________________<br />

8.18 THE COLLECT FUNCTION<br />

COLLECT is used <strong>to</strong> ga<strong>the</strong>r two or more adjacent cases in<strong>to</strong> a single larger case. This larger case can be used<br />

with <strong>PPL</strong> <strong>to</strong> modify related variables or <strong>to</strong> generate across-case statistics. After <strong>the</strong> <strong>PPL</strong>, SPLIT can be used <strong>to</strong>


break <strong>the</strong> collected case back up in<strong>to</strong> its original cases with any new variables appended. Additional <strong>PPL</strong> can precede<br />

or follow <strong>the</strong> COLLECT function.<br />

COLLECT is always followed by an integer argument which indicates how many cases <strong>to</strong> collect:<br />

COLLECT 4<br />

This integer is <strong>the</strong> “COLLECT counter”. It specifies <strong>the</strong> maximum number of cases <strong>to</strong> collect. It can be an integer<br />

constant, a permanent scratch variable (#nn), or (in a macro) a temporary scratch variable (#n).<br />

Figure 8.15 illustrates a simple COLLECT. As <strong>the</strong> input file is processed, every two cases are collected in<strong>to</strong><br />

a single case. Because variable names in a P-<strong>STAT</strong> file must be unique, <strong>the</strong> variable names in <strong>the</strong> collected case<br />

have a suffix added <strong>to</strong> <strong>the</strong> original variable name. The maximum suffix value is equal <strong>to</strong> <strong>the</strong> COLLECT counter.<br />

Variables in <strong>the</strong> first case get a suffix of .1, variables in <strong>the</strong> second case get a suffix of .2, and so on. When <strong>the</strong><br />

number of cases in <strong>the</strong> file is not a multiple of <strong>the</strong> COLLECT counter, missing data are generated <strong>to</strong> fill <strong>the</strong> remaining<br />

variables in <strong>the</strong> final case.<br />
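The suffixing and padding rules can be modelled briefly in Python. This is a sketch of the described behaviour rather than P-STAT code; collect is a hypothetical function, and None stands in for missing data.<br />

```python
def collect(cases, counter, missing=None):
    """Model of COLLECT counter: merge each run of `counter` cases into one."""
    names = list(cases[0]) if cases else []
    collected = []
    for start in range(0, len(cases), counter):
        chunk = cases[start:start + counter]
        wide = {}
        for pos in range(counter):
            for name in names:
                # pad with missing data when the file runs out of cases
                wide[f"{name}.{pos + 1}"] = chunk[pos][name] if pos < len(chunk) else missing
        collected.append(wide)
    return collected

my_file = [
    {"Id": 1, "Age": 29, "Sex": "M"},
    {"Id": 2, "Age": 26, "Sex": "F"},
    {"Id": 3, "Age": 42, "Sex": "F"},
]
for row in collect(my_file, 2):
    print(row)
```

With three input cases and a counter of 2, the second collected case holds the third input case in the .1 variables and missing data in the .2 variables.<br />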

8.19 Collecting BY Groups<br />

Usually, in a COLLECT situation, the number of cases to be collected is not a constant. For example, not all households are<br />

the same size. The BY option may be used with COLLECT to specify the variable or variables indicating group<br />

membership. When BY is used, <strong>the</strong> COLLECT counter indicates a maximum number of cases <strong>to</strong> collect, ra<strong>the</strong>r<br />

than an absolute number of cases. A maximum of 999 cases may be collected at once. However, if <strong>the</strong>re are a<br />

great many variables in <strong>the</strong> file, <strong>the</strong> actual maximum will be less due <strong>to</strong> memory size limitations.<br />

Figure 8.16 illustrates using COLLECT with BY. When <strong>the</strong> value of <strong>the</strong> variable House.Id changes, <strong>the</strong> end<br />

of a group is signaled and <strong>the</strong> current COLLECT is considered complete. When COLLECT 4 is specified and a<br />

household has only three members, missing data are generated in that collected case for the variables with the<br />

suffix .4. When a household has only two members, missing data are generated for all <strong>the</strong> .3 and .4 variables. A<br />

household with more than four members causes an error because <strong>the</strong> COLLECT counter is a maximum value. Notice<br />

that <strong>the</strong> variable defining group membership, House.Id, is carried only once in each collected case.<br />
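Grouping adjacent cases by a BY variable corresponds closely to itertools.groupby in Python, which also starts a new group whenever the key value changes. The sketch below uses the Caseload data of Figure 8.16; collect_by is a hypothetical name, None stands for missing data, and raising ValueError for an oversized group is an assumption about how the error would surface.<br />

```python
from itertools import groupby

def collect_by(cases, counter, by, missing=None):
    """Model of COLLECT counter, BY by: one wide case per run of equal BY values."""
    collected = []
    for key, group in groupby(cases, key=lambda c: c[by]):  # adjacent cases only
        group = list(group)
        if len(group) > counter:
            raise ValueError(f"group {key!r} has more than {counter} cases")
        others = [name for name in group[0] if name != by]
        wide = {by: key}  # the BY variable appears only once
        for pos in range(counter):
            for name in others:
                wide[f"{name}.{pos + 1}"] = group[pos][name] if pos < len(group) else missing
        collected.append(wide)
    return collected

caseload = [
    {"House.Id": 1001, "Sex": "M", "Age": 43},
    {"House.Id": 1001, "Sex": "F", "Age": 44},
    {"House.Id": 1001, "Sex": "M", "Age": 19},
    {"House.Id": 1002, "Sex": "F", "Age": 23},
    {"House.Id": 1002, "Sex": "M", "Age": 29},
]
for row in collect_by(caseload, 4, "House.Id"):
    print(row)
```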

__________________________________________________________________________<br />

Figure 8.16 Collecting BY Group Membership<br />

File Caseload:<br />

House<br />

Id Sex Age<br />

1001 M 43<br />

1001 F 44<br />

1001 M 19<br />

1002 F 23<br />

1002 M 29<br />

LIST Caseload ( COLLECT 4, BY House.Id ) $<br />

House Sex Age Sex Age Sex Age Sex Age<br />

Id .1 .1 .2 .2 .3 .3 .4 .4<br />

1001 M 43 F 44 M 19 - -<br />

1002 F 23 M 29 - - - -<br />

__________________________________________________________________________


8.20 CARRYing Common Information<br />

There may be several variables in a group of related cases that do not define group membership, but that are usually<br />

<strong>the</strong> same for all <strong>the</strong> cases in <strong>the</strong> group. For example, in <strong>the</strong> file Caseload, each case might have Address as a<br />

variable. This would normally be collected as Address.1, Address.2, and so on. This is reasonable when Address<br />

is expected <strong>to</strong> be different for each case. However, when Address has <strong>the</strong> same value for each case in a household<br />

group, it is more reasonable <strong>to</strong> have Address as a single variable in <strong>the</strong> collected case. Using CARRY followed<br />

by one or more variables:<br />

CARRY Address<br />

causes <strong>the</strong>se variables <strong>to</strong> be placed in <strong>the</strong> collected case only once.<br />

A CARRY variable may be missing for some of <strong>the</strong> cases. However, if it is not missing, it must be <strong>the</strong> same<br />

for <strong>the</strong> entire group of collected cases unless ei<strong>the</strong>r FIRST or LAST is also used as a collect option:<br />

CARRY Address, FIRST<br />

FIRST requests that the first non-missing value of the CARRY variable be used. LAST requests that the last non-missing<br />

value be used. (Note that FIRST and LAST are not <strong>the</strong> previously described logical functions.)<br />
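The rule for resolving a CARRY variable can be written as a small decision function. The Python sketch below is an interpretation of the paragraph above, not P-STAT code: carry_value is a hypothetical name, None stands for missing data, the addresses are made up, and raising an error when values disagree is an assumption about what "must be the same" implies.<br />

```python
def carry_value(values, mode=None, missing=None):
    """Resolve one CARRY variable for a group of collected cases.

    mode: None (all non-missing values must agree), "FIRST", or "LAST".
    """
    present = [v for v in values if v is not missing]
    if not present:
        return missing
    if mode == "FIRST":
        return present[0]
    if mode == "LAST":
        return present[-1]
    if any(v != present[0] for v in present):
        raise ValueError("CARRY variable differs within the group")
    return present[0]

print(carry_value([None, "12 Elm St", "12 Elm St"]))
# 12 Elm St
print(carry_value([None, "12 Elm St", "9 Oak Ave"], mode="LAST"))
# 9 Oak Ave
```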

8.21 Ordering Cases with INDEX and SORT<br />

INDEX and SORT are COLLECT options which permit <strong>the</strong> individual cases <strong>to</strong> be placed in <strong>the</strong> collected case in<br />

a different order. INDEX may only be used when <strong>the</strong> BY option is also used.<br />

The use of INDEX is illustrated in Figure 8.17. The variable Visit indicates <strong>the</strong> order that each case should<br />

have in <strong>the</strong> collected case. The first patient has three visits in 1, 3, 2 order. When <strong>the</strong>y are collected with Visit as<br />

<strong>the</strong> INDEX variable, <strong>the</strong> second case is placed in <strong>the</strong> .3 position because <strong>the</strong> value of Visit is 3. The third case for<br />

that patient is placed in the .2 position because the value of Visit is 2.<br />

__________________________________________________________________________<br />

Figure 8.17 Collecting Cases in a Specified Order<br />

File Patients:<br />

Id Visit WBC<br />

1354 1 98<br />

1354 3 72<br />

1354 2 70<br />

4211 2 83<br />

4211 3 85<br />

LIST Patients<br />

[ COLLECT 3, BY Id, INDEX Visit ] $<br />

Visit WBC Visit WBC Visit WBC<br />

Id .1 .1 .2 .2 .3 .3<br />

1354 1 98 2 70 3 72<br />

4211 - - 2 83 3 85<br />

__________________________________________________________________________<br />

An INDEX value may not be missing, but <strong>the</strong> set of index values for a given collect need not be complete.<br />

For example, in Figure 8.17, <strong>the</strong> patient with Id 4211 has values 2 and 3 for <strong>the</strong> INDEX variable Visit but no value<br />

1. INDEX values are assumed <strong>to</strong> be both: 1) within <strong>the</strong> range of <strong>the</strong> COLLECT counter, and 2) unique integers.
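Placement by INDEX can be pictured as filling numbered slots. The following Python sketch uses the Patients data of Figure 8.17; place_by_index is a hypothetical name, an unfilled slot is represented by None, and the errors raised for out-of-range or repeated values are an assumption standing in for the error message P-STAT would issue by default.<br />

```python
def place_by_index(group, counter, index):
    """Put each case of one BY group into the slot named by its INDEX value."""
    slots = [None] * counter  # None marks a slot left missing
    for case in group:
        position = case[index]
        if not 1 <= position <= counter:
            raise ValueError(f"INDEX value {position} is out of range")
        if slots[position - 1] is not None:
            raise ValueError(f"INDEX value {position} is repeated")
        slots[position - 1] = case
    return slots

patient_1354 = [
    {"Visit": 1, "WBC": 98},
    {"Visit": 3, "WBC": 72},
    {"Visit": 2, "WBC": 70},
]
print([c["WBC"] for c in place_by_index(patient_1354, 3, "Visit")])
# [98, 70, 72]
```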


If INDEX values are out of range or repeated, one of <strong>the</strong> options WARN, IGNORE, FIRST, or LAST may<br />

be used <strong>to</strong> prevent an error message and indicate what <strong>to</strong> do. When WARN is used, a warning message is printed.<br />

When IGNORE is used, out of range or repeated INDEX values are ignored. When FIRST is used, <strong>the</strong> first of <strong>the</strong><br />

cases with the repeated index is collected. When LAST is used, the last of the cases is collected. The other<br />

cases are ignored. WARN and IGNORE may be used only with BY and INDEX.<br />

SORT is ano<strong>the</strong>r way of rearranging <strong>the</strong> cases in <strong>the</strong> collected case. SORT is followed by one or more variables<br />

giving <strong>the</strong> sort order in which <strong>the</strong> collected cases should be arranged:<br />

SORT WBC<br />

The sort direction can be controlled:<br />

SORT WBC (D)<br />

by specifying a direction. An upwards (U) or downwards (D) sort may be specified. When a direction is not specified,<br />

an upwards sort is assumed. Figure 8.18 shows <strong>the</strong> results produced by SORT.<br />

SORT variables may not be BY or CARRY variables. The use of both INDEX and SORT is redundant — indexing<br />

is done before sorting. Thus, sorting may “undo” indexing.<br />
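Sorting within a collected group behaves like an ordinary sort of that group's cases before they are given their .1, .2, .3 positions. A minimal Python sketch, with sort_group as a hypothetical name and the direction codes U and D taken from the text:<br />

```python
def sort_group(group, key, direction="U"):
    """Order the cases of one BY group by a SORT variable; U = up, D = down."""
    return sorted(group, key=lambda case: case[key], reverse=(direction == "D"))

patient_1354 = [
    {"Visit": 1, "WBC": 98},
    {"Visit": 3, "WBC": 72},
    {"Visit": 2, "WBC": 70},
]
print([(c["Visit"], c["WBC"]) for c in sort_group(patient_1354, "WBC")])
# [(2, 70), (3, 72), (1, 98)]
```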

__________________________________________________________________________<br />

Figure 8.18 Sorting <strong>the</strong> Collected Case<br />

File Patients:<br />

Id Visit WBC<br />

1354 1 98<br />

1354 3 72<br />

1354 2 70<br />

4211 2 83<br />

4211 3 85<br />

LIST Patients [ COLLECT 3, BY Id, SORT WBC ] $<br />

Visit WBC Visit WBC Visit WBC<br />

Id .1 .1 .2 .2 .3 .3<br />

1354 2 70 3 72 1 98<br />

4211 2 83 3 85 - -<br />

__________________________________________________________________________<br />

8.22 Complex Modification Using COLLECT<br />

Usually when a case is collected, <strong>the</strong>re are additional <strong>PPL</strong> instructions <strong>to</strong> calculate summary statistics or <strong>to</strong> do<br />

cross-case comparisons or aggregations. Because <strong>the</strong> collected case has variables with <strong>the</strong> same prefix followed<br />

by .1, .2, and so on, <strong>the</strong> use of wildcards and DO loops is helpful in specifying <strong>the</strong> <strong>PPL</strong> instructions.<br />

Figure 8.19 illustrates a complex modification problem — locating all <strong>the</strong> salesmen in a department who earn<br />

more than <strong>the</strong>ir manager. It illustrates COLLECT, additional <strong>PPL</strong>, and finally a SPLIT <strong>to</strong> break <strong>the</strong> collected case<br />

back up in<strong>to</strong> individual cases.<br />

In Figure 8.19, a new variable, Total.Pay, and a scratch variable, #Mgr.Total.Pay, are generated. #Mgr.Total.Pay<br />

is initialized. The COLLECT counter is set <strong>to</strong> 20, and cases are collected by Department. The five<br />

variables, Name, Position, Salary, Commission and Total.Pay, are each represented 20 times in <strong>the</strong> collected case<br />

as variables of <strong>the</strong> same names but with suffixes .1 <strong>to</strong> .20. The BY variable, Department, is present only once.


When a complete department is collected in<strong>to</strong> one case, <strong>the</strong>re are 101 variables (1, plus 5 times 20) in <strong>the</strong> collected<br />

case even though a given department may have fewer than 20 members.<br />

When <strong>the</strong> manager’s <strong>to</strong>tal pay is located, <strong>the</strong> scratch variable #Mgr.Total.Pay is set equal <strong>to</strong> it. The value of<br />

#Mgr.Total.Pay is compared with Total.Pay for each salesman.<br />

__________________________________________________________________________<br />

Figure 8.19 A Complex Modification Problem<br />

FILE Staff:<br />

Department Name Position Salary Commission<br />

Furniture Adams Manager 20540 2875.25<br />

Furniture Brown Sales 17000 7230.80<br />

Hardware Mason Sales 16000 952.65<br />

Hardware Smith Manager 20300 862.95<br />

Hardware Green Sales 17000 4495.50<br />

LIST Staff<br />

[ GENERATE Total.Pay = Salary + Commission;<br />

GENERATE #Mgr.Total.Pay = 0 ;<br />

COLLECT 20, BY Department ]<br />

[ DO #J = 1, 20;<br />

IF Position?(#J) EQ 'Manager',<br />

SET #Mgr.Total.Pay = Total.Pay?(#J);<br />

ENDDO;<br />

DO #J = 1, 20;<br />

IF Total.Pay?(#J) LE #Mgr.Total.Pay,<br />

SET Total.Pay?(#J) = 0;<br />

ENDDO ]<br />

[ SPLIT ;<br />

IF Total.Pay GT 0, RETAIN ],<br />

COMMAS, MIN.PLACES 2 $<br />

Total<br />

Department Name Position Salary Commission Pay<br />

Furniture Brown Sales 17,000 7,230.80 24,230.80<br />

Hardware Green Sales 17,000 4,495.50 21,495.50<br />

__________________________________________________________________________<br />

The first DO loop goes from 1 <strong>to</strong> 20 <strong>to</strong> match <strong>the</strong> maximum possible size of <strong>the</strong> COLLECT. Note: instead of<br />

<strong>the</strong> constant 20 we could use <strong>the</strong> system variable .COLLECTSIZE., which is <strong>the</strong> number of cases found in <strong>the</strong><br />

most recent collect. A powerful attribute of wildcard notation is illustrated:<br />

[ DO #J = 1, 20 ;<br />

IF Position?(#J) EQ 'Manager',<br />

SET #Mgr.Total.Pay = Total.Pay?(#J);<br />

ENDDO ]


One <strong>PPL</strong> instruction using wildcard notation takes <strong>the</strong> place of twenty instructions without it. Using a wildcard<br />

creates a vector of all the variables that begin with the wildcard prefix or end with the wildcard suffix. The DO loop scratch<br />

variable #J accesses specified locations in this vec<strong>to</strong>r, just as it accesses locations in <strong>the</strong> V and P vec<strong>to</strong>rs.<br />

In Figure 8.19, each of <strong>the</strong> 20 variables which begin with “Position” is tested in turn <strong>to</strong> find <strong>the</strong> one with <strong>the</strong><br />

value “Manager”. If “Manager” is found in <strong>the</strong> fourteenth Position? variable, <strong>the</strong>n #Mgr.Total.Pay is set equal <strong>to</strong><br />

<strong>the</strong> fourteenth Total.Pay? variable.<br />

The second DO loop examines <strong>the</strong> Total.Pay of each salesman. Again <strong>the</strong> wildcard notation and loop scratch<br />

variable simplify <strong>the</strong> procedure:<br />

DO #J = 1, 20 ;<br />

IF Total.Pay?(#J) LE #Mgr.Total.Pay,<br />

SET Total.Pay?(#J) = 0 ; ENDDO;<br />

Each value in <strong>the</strong> vec<strong>to</strong>r of Total.Pay variables is tested, and any value that is less than or equal <strong>to</strong> #Mgr.Total.Pay<br />

is set <strong>to</strong> zero.<br />
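For readers more used to a general-purpose language, the two loops can be modeled outside P-STAT. The Python sketch below is an illustration only and is not PPL. It assumes that a collected case can be pictured as a dictionary keyed by the suffixed names, and that the wildcard yields the matching names in case order. The helper names `wildcard` and `zero_low_earners` are hypothetical, and the manager's and third person's pay figures are invented for the example.

```python
# Illustrative model only (not P-STAT code): a collected case as a dict whose
# keys carry the .1, .2, ... suffixes that COLLECT appends.
def wildcard(case, prefix):
    """Return the names in the case that begin with prefix, in case order."""
    return [name for name in case if name.startswith(prefix)]


def zero_low_earners(case):
    positions = wildcard(case, "Position")
    pays = wildcard(case, "Total.Pay")
    mgr_total_pay = None
    # First loop: find the manager and remember that person's total pay.
    for j in range(len(positions)):
        if case[positions[j]] == "Manager":
            mgr_total_pay = case[pays[j]]
    # Second loop: zero every total that is less than or equal to the manager's.
    for j in range(len(pays)):
        value = case[pays[j]]
        if value is not None and mgr_total_pay is not None and value <= mgr_total_pay:
            case[pays[j]] = 0
    return case


# Hypothetical three-person department; only the 24,230.80 figure is from Figure 8.19.
furniture = {
    "Department": "Furniture",
    "Position.1": "Manager", "Total.Pay.1": 22000.00,
    "Position.2": "Sales", "Total.Pay.2": 24230.80,
    "Position.3": "Sales", "Total.Pay.3": 19500.00,
}
result = zero_low_earners(furniture)
```

In this model the manager's own total is also zeroed, because the test is "less than or equal", which mirrors the LE test in the PPL.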

The instruction:<br />

SPLIT ;<br />

is a special form of <strong>the</strong> SPLIT function that res<strong>to</strong>res collected cases and variable names <strong>to</strong> <strong>the</strong>ir original form. It<br />

uses <strong>the</strong> .1, .2 suffixes <strong>to</strong> ascertain <strong>the</strong> original number of cases and variable names. The order of <strong>the</strong> split cases<br />

may be somewhat different if SORT or INDEX is used in <strong>the</strong> COLLECT or if new variables are generated. The<br />

instruction:<br />

SPLIT *;<br />

produces all possible cases resulting from a COLLECT, even if no such cases existed before <strong>the</strong> COLLECT.<br />

Some cases may have all missing values of the suffixed variables when SPLIT * is used, whereas when SPLIT is<br />

used, only cases with at least one non-missing value of a suffixed variable are produced.<br />

In Figure 8.19, when <strong>the</strong> collected case is split back up, each Department splits back in<strong>to</strong> its original cases.<br />
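The difference between SPLIT and SPLIT * can be pictured with a small Python model. This is a sketch under stated assumptions, not P-STAT code: missing values are represented by `None`, the function name `split_collected` and its parameters are hypothetical, and the second employee ("White") is invented for the example.

```python
# Illustrative model only (not P-STAT code); None stands for a missing value.
def split_collected(case, carried, size, produce_all=False):
    """Rebuild up to `size` cases from one collected case.

    carried     -- names that appear once (BY and CARRY variables)
    size        -- the COLLECT count, that is, the highest suffix
    produce_all -- False models SPLIT, True models SPLIT *
    """
    stems = []
    for name in case:
        if name not in carried:
            stem = name.rsplit(".", 1)[0]  # strip the .1, .2, ... suffix
            if stem not in stems:
                stems.append(stem)
    cases = []
    for k in range(1, size + 1):
        values = {stem: case.get(f"{stem}.{k}") for stem in stems}
        if produce_all or any(v is not None for v in values.values()):
            row = {name: case[name] for name in carried}
            row.update(values)
            cases.append(row)
    return cases


collected = {
    "Department": "Hardware",
    "Name.1": "Green", "Total.Pay.1": 21495.50,
    "Name.2": "White", "Total.Pay.2": 0,
    "Name.3": None, "Total.Pay.3": None,
}
plain = split_collected(collected, ["Department"], size=3)
star = split_collected(collected, ["Department"], size=3, produce_all=True)
```

Here `plain` holds two cases, while `star` holds three, the last of which has only missing values for the suffixed variables.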

The final <strong>PPL</strong> instruction:<br />

IF Total.Pay GT 0, RETAIN ]<br />

only retains cases where Total.Pay is greater than zero.<br />

In <strong>PPL</strong> where COLLECT has been used, a DO loop with a scratch variable and <strong>the</strong> wildcard notation are very<br />

convenient for referring <strong>to</strong> <strong>the</strong> collected variables. This is <strong>the</strong> case in <strong>the</strong> prior example in Figure 8.19. It is important,<br />

however, that <strong>the</strong> variable name prefixes (<strong>the</strong> part preceding <strong>the</strong> ?) be unique. This example gives<br />

unexpected results:<br />

[ KEEP ID Policy.No Agent Amount Age Class;<br />

COLLECT 4, BY ID;<br />

DO #J = 1, 4;<br />

IF Age?(#J) LT 18, SET Class?(#J) = 0;<br />

ENDDO;<br />

After <strong>the</strong> COLLECT takes place, all <strong>the</strong> variables (except <strong>the</strong> BY variable) have names such as Policy.No.1,<br />

Policy.No.2, Policy.No.3, Policy.No.4, Agent.1, Agent.2, and so on. The notation “Class?(#J)” in <strong>the</strong> prior example<br />

refers <strong>to</strong> <strong>the</strong> first #J variables (<strong>the</strong> first four since #J takes on <strong>the</strong> values 1 TO 4) beginning with “Class”. Thus,<br />

when #J = 1, if <strong>the</strong> result of <strong>the</strong> IF test is true, <strong>the</strong> variable Class.1 is set <strong>to</strong> 0. When #J = 2, if <strong>the</strong> IF test is true,<br />

Class.2 is set <strong>to</strong> 0, and so on.<br />

Similarly, <strong>the</strong> notation “Age?(#J)” refers <strong>to</strong> <strong>the</strong> first #J variables beginning with “Age”. These are Agent.1,<br />

Agent.2, Agent.3 and Agent.4, and not <strong>the</strong> intended variables Age.1, Age.2, Age.3 and Age.4. This is because <strong>the</strong><br />

Agent variables precede <strong>the</strong> Age variables. “Age?(#J)” is not specific enough <strong>to</strong> refer <strong>to</strong> just <strong>the</strong> Age variables;



“Age.?(#J)” (with <strong>the</strong> dot) is unique. Remember, we want <strong>to</strong> wildcard against variables Age.1, Age.2, Age.3 and<br />

Age.4 .<br />
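The prefix collision can be reproduced with plain string matching. The Python sketch below is only a model, not P-STAT code. The order of the collected names is an assumption chosen to agree with the description above (all Agent variables ahead of all Age variables), and the helper name `begins_with` is hypothetical.

```python
# Illustrative model only (not P-STAT code).
stems = ["Policy.No", "Agent", "Amount", "Age", "Class"]

# Assumed name order for the example: grouped by variable, suffixes 1 to 4.
names = [f"{stem}.{k}" for stem in stems for k in range(1, 5)]


def begins_with(prefix):
    """Return every collected name that starts with the given prefix."""
    return [name for name in names if name.startswith(prefix)]


loose = begins_with("Age")   # also picks up Agent.1 ... Agent.4
exact = begins_with("Age.")  # the dot rules out the Agent variables
```

With the looser prefix, the first four matches are the Agent variables; with the dot included, only Age.1 through Age.4 match.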

__________________________________________________________________________<br />

Figure 8.20 A Second Complex Problem<br />

File Patients:<br />

Last First<br />

ID Name Name Date Diagnosis Description Charges<br />

12425 Adams John 831105 - Room Fee 35.95<br />

12425 Adams John 831104 - Lab Tests 182.45<br />

12425 Adams John 831106 Ulcer Diagnosis -<br />

15743 Blair Sally 831221 - Blood Tests 36.00<br />

15743 Blair Sally 831222 Kidney Stones Diagnosis -<br />

15743 Blair Sally 831221 - Room Fee 35.00<br />

15743 Blair Sally 831222 - Surgery 745.25<br />

15743 Blair Sally 831222 - Room Fee 35.00<br />

15743 Blair Sally 831223 - Room Fee 35.00<br />

15743 Blair Sally 831223 - Blood Tests 45.00<br />

12269 Knox Tom 840304 - Lab Tests 69.50<br />

12269 Knox Tom 840304 - Room Fee 35.00<br />

12269 Knox Tom 840305 - Cat Scan 545.00<br />

12269 Knox Tom 840306 - Room Fee 35.00<br />

12269 Knox Tom 840306 Brain Tumor Diagnosis -<br />

12269 Knox Tom 840305 - Room Fee 35.00<br />

LIST Patients<br />

[ COLLECT 10, BY ID,<br />

CARRY Last.Name First.Name, SORT Date ;<br />

GENERATE Diagnosis:C32 = FIRST.GOOD (Diagnosis?) ;<br />

GENERATE Total.Charges = SUM.GOOD (Charges?) ;<br />

GENERATE Admit.Date = Date.1 ;<br />

GENERATE Discharge.Date = LAST.GOOD (Date?) ;<br />

KEEP Last.Name First.Name .NEW. ] $<br />

Resulting Listing:<br />

Last First Total Admit Discharge<br />

Name Name Diagnosis Charges Date Date<br />

Adams John Ulcer 218.40 110483 110683<br />

Blair Sally Kidney Stones 931.25 122183 122383<br />

Knox Tom Brain Tumor 719.50 30484 30684<br />

__________________________________________________________________________



Another example using COLLECT is illustrated in Figure 8.20. Whereas the problem in Figure 8.16 could<br />

be solved in other, perhaps simpler, ways, the report produced in Figure 8.20 would be extremely difficult to do<br />

without a function such as COLLECT, and it would require several steps using the SORT and COLLATE commands.<br />

With COLLECT, often only a single command is needed to produce a complex report.<br />

Figure 8.21 shows <strong>the</strong> variables and values for a single patient immediately after <strong>the</strong> COLLECT step:<br />

COLLECT 10, BY ID, CARRY Last.Name First.Name, SORT Date ;<br />

Each input case has seven variables. A collected case has 43 variables: the BY variable (ID) and the 2 CARRY<br />

variables, plus 10 times the 4 remaining variables. Because the COLLECT is done using the SORT option, a patient’s cases are<br />

rearranged by Date so that <strong>the</strong> case with earliest date will be in <strong>the</strong> .1 position. The case with <strong>the</strong> next earliest date<br />

will be in <strong>the</strong> .2 position, and so on. The case with <strong>the</strong> last date can be located by looking for <strong>the</strong> last non-missing<br />

value for a Date? variable.<br />
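The variable count and the effect of SORT can be checked with a short Python model of this COLLECT step. It is a sketch only, not P-STAT code. It assumes missing values can be represented by `None`, that the group has no more cases than the COLLECT count, and that carried values are simply taken from the first sorted case; the function name `collect` is hypothetical.

```python
# Illustrative model only (not P-STAT code); None stands for a missing value.
def collect(cases, size, by, carry, sort):
    """Fold the cases of one BY group into a single long case."""
    ordered = sorted(cases, key=lambda c: c[sort])
    first = ordered[0]
    once = by + carry
    long_case = {name: first[name] for name in once}
    rest = [name for name in first if name not in once]
    for k in range(1, size + 1):
        # Positions beyond the last real case are filled with missing values.
        source = ordered[k - 1] if k <= len(ordered) else {}
        for name in rest:
            long_case[f"{name}.{k}"] = source.get(name)
    return long_case


adams = [
    {"ID": 12425, "Last.Name": "Adams", "First.Name": "John", "Date": 831105,
     "Diagnosis": None, "Description": "Room Fee", "Charges": 35.95},
    {"ID": 12425, "Last.Name": "Adams", "First.Name": "John", "Date": 831104,
     "Diagnosis": None, "Description": "Lab Tests", "Charges": 182.45},
    {"ID": 12425, "Last.Name": "Adams", "First.Name": "John", "Date": 831106,
     "Diagnosis": "Ulcer", "Description": "Diagnosis", "Charges": None},
]
collected = collect(adams, 10, ["ID"], ["Last.Name", "First.Name"], "Date")
```

The resulting dictionary has 3 + 10 × 4 = 43 entries, with the earliest date in the .1 position.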

__________________________________________________________________________<br />

Figure 8.21 Before and After COLLECT<br />

John Adams' Records Before and After COLLECT:<br />

BEFORE: There are 7 variables in each case.<br />

Last First<br />

ID Name Name Date Diagnosis Description Charges<br />

12425 Adams John 831105 - Room Fee 35.95<br />

12425 Adams John 831104 - Lab Tests 182.45<br />

12425 Adams John 831106 Ulcer Diagnosis -<br />

AFTER: There are 43 variables in <strong>the</strong> collected case.<br />

Id Last.Name First.Name<br />

12425 Adams John<br />

Date.1 Diagnosis.1 Description.1 Charges.1<br />

831104 - Lab Tests 182.45<br />

Date.2 Diagnosis.2 Description.2 Charges.2<br />

831105 - Room Fee 35.95<br />

Date.3 Diagnosis.3 Description.3 Charges.3<br />

831106 Ulcer Diagnosis -<br />

Date.4 Diagnosis.4 Description.4 Charges.4<br />

- - - -<br />

- - - -<br />

- - - -<br />

Date.10 Diagnosis.10 Description.10 Charges.10<br />

__________________________________________________________________________



Because a suffix is appended on<strong>to</strong> each of <strong>the</strong> collected variables, Diagnosis is now Diagnosis.1 <strong>to</strong> Diagnosis.10.<br />

Therefore, this instruction (in Figure 8.20):<br />

GENERATE Diagnosis:C32 = FIRST.GOOD (Diagnosis?) ;<br />

does not cause a variable name conflict. The new variable Diagnosis is given <strong>the</strong> value of <strong>the</strong> first non-missing<br />

value of any variable beginning with “Diagnosis”. For John Adams, Diagnosis.1 and Diagnosis.2 are missing, but<br />

Diagnosis.3 is not missing. Its value, “Ulcer”, is used as <strong>the</strong> value of <strong>the</strong> newly generated variable Diagnosis.<br />

Creation of <strong>the</strong> o<strong>the</strong>r three new variables is similar. Total.Charges is <strong>the</strong> sum of all <strong>the</strong> non-missing values of any<br />

variable which begins with <strong>the</strong> prefix “Charges”:<br />

GENERATE Total.Charges = SUM.GOOD (Charges?) ;<br />

Because of <strong>the</strong> sort order, Admit.Date is Date.1 :<br />

GENERATE Admit.Date = Date.1 ;<br />

Even though it is not known how many cases were collected, it is easy <strong>to</strong> locate Discharge.Date with <strong>the</strong><br />

LAST.GOOD function:<br />

GENERATE Discharge.Date = LAST.GOOD (Date?) ;<br />

The last good (non-missing) value of any variable which begins with “Date” becomes <strong>the</strong> Discharge.Date. For<br />

John Adams, this is 831106, <strong>the</strong> value of Date.3.<br />
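The behavior described for FIRST.GOOD, SUM.GOOD and LAST.GOOD can be modeled in a few lines of Python. This is an illustrative sketch, not P-STAT code: missing values are represented by `None`, the Python function names are hypothetical stand-ins, and the result for a list with no good values is an assumption of the model rather than documented behavior.

```python
# Illustrative model only (not P-STAT code); None stands for a missing value.
def first_good(values):
    """Return the first non-missing value, or None if there is none."""
    for v in values:
        if v is not None:
            return v
    return None


def last_good(values):
    """Return the last non-missing value, or None if there is none."""
    return first_good(reversed(values))


def sum_good(values):
    """Add up the non-missing values."""
    return sum(v for v in values if v is not None)


# John Adams' collected values, positions .1 through .10.
diagnosis = [None, None, "Ulcer"] + [None] * 7
charges = [182.45, 35.95, None] + [None] * 7
dates = [831104, 831105, 831106] + [None] * 7

summary = {
    "Diagnosis": first_good(diagnosis),
    "Total.Charges": sum_good(charges),
    "Admit.Date": dates[0],
    "Discharge.Date": last_good(dates),
}
```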

The final <strong>PPL</strong> in Figure 8.20 is a KEEP <strong>to</strong> select <strong>the</strong> variables Last.Name, First.Name and all <strong>the</strong> new (.NEW.)<br />

variables generated by <strong>the</strong> <strong>PPL</strong>.<br />

8.22 COLLECT System Variables<br />

COLLECT sets 5 system variables.<br />

1. .COLLECTSIZE. The number of cases in <strong>the</strong> most recent collect<br />

2. .COLLECTMIN. The size of <strong>the</strong> smallest collected group so far<br />

3. .COLLECTMAX. The size of <strong>the</strong> largest collected group so far<br />

4. .COLLECTIONS. The total number of collects that occurred<br />

5. .COLLECTSUM. The number of cases that have been collected.<br />

For example, if a file is collected by household number:<br />

1. .COLLECTSIZE. The size of <strong>the</strong> household most recently collected (<strong>the</strong> current household)<br />

2. .COLLECTMIN. The smallest household<br />

3. .COLLECTMAX. The largest household<br />

4. .COLLECTIONS. The <strong>to</strong>tal number of households<br />

5. .COLLECTSUM. The number of people in all collected households<br />

These variables are reset as each new COLLECT occurs. Thus, if a file has 312 households, <strong>the</strong> 5 variables are<br />

reset 312 times. As a result, .COLLECTSIZE., for example, can be used in the PPL following a collect:<br />

LIST House<br />

( COLLECT 10, BY household )<br />

( DO #j = 1, .COLLECTSIZE. ) etc....<br />

The final settings remain until some later COLLECT begins reading cases anew.
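How the five values evolve as groups are collected can be traced with a small Python model. It is a sketch under assumptions, not P-STAT code: the dictionary keys simply echo the system variable names without their dots, the function name `collect_counters` is hypothetical, and the household sizes are invented.

```python
# Illustrative model only (not P-STAT code).
def collect_counters(group_sizes):
    """Yield a snapshot of the five counters after each group is collected."""
    state = {
        "COLLECTSIZE": 0,
        "COLLECTMIN": None,
        "COLLECTMAX": None,
        "COLLECTIONS": 0,
        "COLLECTSUM": 0,
    }
    for size in group_sizes:
        state["COLLECTSIZE"] = size
        if state["COLLECTMIN"] is None or size < state["COLLECTMIN"]:
            state["COLLECTMIN"] = size
        if state["COLLECTMAX"] is None or size > state["COLLECTMAX"]:
            state["COLLECTMAX"] = size
        state["COLLECTIONS"] += 1
        state["COLLECTSUM"] += size
        yield dict(state)


# Three hypothetical households of 3, 1 and 4 people.
history = list(collect_counters([3, 1, 4]))
```

After the third household, the model reports a current size of 4, a minimum of 1, a maximum of 4, 3 collects and 8 people in total.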



SUMMARY<br />

PPL<br />

Across-case modification and aggregation are facilitated by:<br />

• Scratch Variables,<br />

• <strong>the</strong> Permanent Vec<strong>to</strong>r,<br />

• user-defined multi-dimensional arrays, and<br />

• <strong>the</strong> programming language functions FIRST, LAST, SPLIT and COLLECT.<br />

Scratch Variables have no position in a file. They are created using GENERATE followed by a name<br />

starting with one or two pound signs (#). The values of scratch variables created with one pound sign<br />

remain only for <strong>the</strong> duration of a command or macro. The values of scratch variables created with two<br />

pound signs remain for <strong>the</strong> duration of <strong>the</strong> run. They are explicitly changed with SET.<br />

The Permanent Vec<strong>to</strong>r is similar in behavior <strong>to</strong> scratch variables except that it has <strong>the</strong> name P assigned<br />

<strong>to</strong> it, and individual positions in P are located by subscript. The subscripts can be calculated. Permanent<br />

variables are set with SET. The P vec<strong>to</strong>r may only contain numeric values but <strong>the</strong>se values may be passed<br />

not only across cases of a file, but between commands.<br />

The wildcard character ? may be used <strong>to</strong> reference <strong>the</strong> suffixed variables created by COLLECT (as well<br />

as any o<strong>the</strong>r variables with a common prefix or suffix):<br />

[ COLLECT 20, BY Department;<br />

DO #J = 1, 20 ;<br />

IF Position?(#J) EQ 'Manager',<br />

SET #Mgr.Total.Pay = Total.Pay?(#J) ;<br />

ENDDO ]<br />

“Position?(#J)” refers in turn <strong>to</strong> <strong>the</strong> first #J variables (20 in <strong>the</strong> above example) that begin with “Position”.<br />

ARRAY Commands<br />

An array can have up <strong>to</strong> 7 dimensions, and can be character or numeric. Array names have two characters,<br />

<strong>the</strong> second being <strong>the</strong> same as <strong>the</strong> first, like XX or cc or Zz. Case doesn’t matter. There can be up <strong>to</strong><br />

26 active arrays.<br />

DEFINE.ARRAY cc ( n,n,...)<br />

defines <strong>the</strong> array and (optionally) initializes it. Character arrays are declared by adding <strong>the</strong> desired character<br />

length immediately following <strong>the</strong> array name and a colon, i.e. AA:20 .<br />

DEFINE.ARRAY AA (5,8) TO 0 $<br />

DEFINE.ARRAY CC:20 (13,3) to ' ' $<br />

SHOW.ARRAYS<br />

reports on <strong>the</strong> status of all <strong>the</strong> defined arrays<br />

DROP.ARRAY aa zz<br />

requests that <strong>the</strong> listed arrays be dropped so that <strong>the</strong> space can be reused.<br />

nn=number list=variable list vn=variable name



DROP.P.VECTOR<br />

releases <strong>the</strong> space normally used for <strong>the</strong> P vec<strong>to</strong>r and makes it available for use in arrays.<br />

<strong>PPL</strong> Functions: Across-Case<br />

FIRST (vn or .FILE.)<br />

is evaluated as true if this is <strong>the</strong> first case in <strong>the</strong> subgroup specified in <strong>the</strong> expression, and false if it is not<br />

<strong>the</strong> first case. The required expression is a variable name (vn) or a list of variables, or <strong>the</strong> system value<br />

.FILE. (meaning <strong>the</strong> entire file):<br />

IF FIRST (District, Department), SET P(1) = 0;<br />

Changing values of <strong>the</strong> variable or variables in an ordered file define different subgroups.<br />

LAST (vn or .FILE.)<br />

is evaluated as true if this is <strong>the</strong> last case in <strong>the</strong> subgroup specified in <strong>the</strong> expression, and false if it is not<br />

<strong>the</strong> last case. The required expression is a variable name (vn) or a list of variables, or <strong>the</strong> system value<br />

.FILE. (meaning <strong>the</strong> entire file):<br />

IF LAST (.FILE.), RETAIN;<br />

Changing values of <strong>the</strong> variable or variables define different subgroups.<br />
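The subgroup logic behind FIRST and LAST can be modeled in Python as a comparison of each case with its neighbors in an ordered file. This is an illustrative sketch, not P-STAT code; the function name `first_last_flags` is hypothetical, and passing an empty key list is this model's way of standing in for .FILE.

```python
# Illustrative model only (not P-STAT code).
def first_last_flags(rows, keys):
    """Return (is_first, is_last) for every row of an ordered file.

    keys -- the grouping variables; an empty list treats the whole
            file as one group, standing in for .FILE.
    """
    flags = []
    for i, row in enumerate(rows):
        current = tuple(row[k] for k in keys)
        previous = tuple(rows[i - 1][k] for k in keys) if i > 0 else None
        following = tuple(rows[i + 1][k] for k in keys) if i + 1 < len(rows) else None
        flags.append((current != previous, current != following))
    return flags


rows = [
    {"District": 1, "Department": "A"},
    {"District": 1, "Department": "A"},
    {"District": 1, "Department": "B"},
    {"District": 2, "Department": "B"},
]
by_group = first_last_flags(rows, ["District", "Department"])
whole_file = first_last_flags(rows, [])
```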

COLLECT nn<br />

specifies <strong>the</strong> number of adjacent cases <strong>to</strong> collect in<strong>to</strong> one case. Additional <strong>PPL</strong> may precede or follow<br />

<strong>the</strong> COLLECT function. <strong>PPL</strong> which follows COLLECT operates on <strong>the</strong> new longer case. A common<br />

usage is <strong>to</strong> COLLECT cases, do modifications, and <strong>the</strong>n SPLIT <strong>the</strong> long case back in<strong>to</strong> <strong>the</strong> original number<br />

of cases. Using (SPLIT) or (SPLIT *) “undoes” COLLECT. A number of additional options may be<br />

used. They must follow <strong>the</strong> COLLECT:<br />

LIST Patients [ COLLECT 4, BY Id, INDEX Visit ] $<br />

The options in <strong>the</strong> following list specify <strong>the</strong> cases <strong>to</strong> be collected and <strong>the</strong> variables in <strong>the</strong> new case.<br />

1. BY vn or list<br />

specifies one or more character and/or numeric variables that identify <strong>the</strong> cases that belong<br />

<strong>to</strong> a subgroup. The input file should be grouped or sorted by <strong>the</strong>se variables.<br />

Those cases with <strong>the</strong> same values of <strong>the</strong> BY variables, that is, members of <strong>the</strong> same<br />

subgroup, are collected in<strong>to</strong> one case. Values of missing1, missing2 and missing3 also<br />

define membership in different subgroups.<br />

When BY is used, <strong>the</strong> number of cases <strong>to</strong> be collected must still be specified. That<br />

number defines <strong>the</strong> maximum number of members of a subgroup. BY variables appear<br />

(are carried) only once in <strong>the</strong> new case.<br />

2. CARRY vn or list<br />

implies that <strong>the</strong> values of <strong>the</strong> specified variables are <strong>the</strong> same for all members of a subgroup,<br />

and that those variables should appear only once in <strong>the</strong> new case produced by<br />

COLLECT. If a value is missing, <strong>the</strong> first non-missing value for a CARRY variable is<br />

used. If <strong>the</strong> values differ, an error occurs unless FIRST and LAST are used.<br />

3. FIRST<br />

specifies that <strong>the</strong> first case be selected for collection if values of <strong>the</strong> INDEX variable<br />

are repeated or if values of <strong>the</strong> CARRY variable differ.<br />




4. IGNORE<br />

specifies that any case with a value of <strong>the</strong> index variable that is repeated or out of range<br />

should be ignored. IGNORE can only be used with BY and INDEX.<br />

5. INDEX vn<br />

specifies a numeric variable whose values determine <strong>the</strong> order that <strong>the</strong> cases in a subgroup<br />

take in <strong>the</strong> collected case. INDEX values may not be missing or exceed <strong>the</strong><br />

COLLECT counter (<strong>the</strong> number of cases <strong>to</strong> be collected), without <strong>the</strong> use of IGNORE,<br />

WARN, FIRST, or LAST as well. INDEX may not be used without BY.<br />

6. LAST<br />

specifies that <strong>the</strong> last case be selected for collection if values of <strong>the</strong> INDEX variable<br />

are repeated or if values of <strong>the</strong> CARRY variable differ.<br />

7. SORT vn or list<br />

requests that <strong>the</strong> collected cases be sorted by <strong>the</strong> specified variables before being<br />

placed in <strong>the</strong> new, long case. SORT variables may not be BY or CARRY variables.<br />

The use of both INDEX and SORT is redundant, and since sorting is done after indexing,<br />

SORT “undoes” INDEX.<br />

An upward sort (U) or a downward sort (D) may be specified:<br />

[ COLLECT 5, BY Household,<br />

CARRY Last.Name, SORT Age (D) ]<br />

An upward sort is assumed when sort order is not explicitly specified.<br />

8. WARN<br />

requests that a warning be printed if a case has a value on the index variable that is repeated<br />

or out of range. WARN may not be used without BY and INDEX.<br />

COLLECT System Variables<br />

1. .COLLECTSIZE. The number of cases in <strong>the</strong> most recent collect<br />

2. .COLLECTMIN. The size of <strong>the</strong> smallest collected group so far<br />

3. .COLLECTMAX. The size of <strong>the</strong> largest collected group so far<br />

4. .COLLECTIONS. The total number of collects that occurred<br />

5. .COLLECTSUM. The number of cases that have been collected.<br />

SPLIT INTO nn<br />

requests that each current case be split in<strong>to</strong> <strong>the</strong> specified number of cases. (The word INTO is optional.)<br />

Additional <strong>PPL</strong> may precede or follow <strong>the</strong> SPLIT function, but <strong>the</strong>re may be only one SPLIT per command.<br />

A number of options may be used. They must follow <strong>the</strong> SPLIT:<br />

LIST Filename<br />

[ SPLIT INTO 2, CARRY Name, INDEX Term 2,<br />

CREATE Grade (Grade.2 TO Grade.4) STEP 2 ] $<br />

The following options control <strong>the</strong> order in which <strong>the</strong> variables in SPLIT cases are placed in<strong>to</strong> <strong>the</strong> output<br />

cases and <strong>the</strong> naming of <strong>the</strong> variables which are created:<br />

1. CARRY vn or list<br />

specifies one or more variables whose values are <strong>to</strong> be carried in every case created by<br />

SPLIT.<br />

2. CYCLE nn<br />

specifies <strong>the</strong> size of steps <strong>to</strong> be taken in selecting variables <strong>to</strong> be used in <strong>the</strong> SPLIT.<br />

CYCLE follows a USE or CREATE variable list without a comma preceding it. The<br />

first variable is used, the variable “nn” away from the first is used next, and so on. Multiple<br />

passes or cycles are made through the variable list until all of the variables in the<br />

list are used.<br />

3. CREATE new.vn vn or new.vn list<br />

provides a new variable name and gives <strong>the</strong> current variables whose values are <strong>to</strong> be<br />

used in <strong>the</strong> split cases. They will be <strong>the</strong> values of <strong>the</strong> new variable. The number of<br />

variables <strong>to</strong> be used must be <strong>the</strong> same as “nn” (<strong>the</strong> number of cases in<strong>to</strong> which <strong>the</strong> current<br />

case is <strong>to</strong> be SPLIT).<br />

4. INDEX new.vn nn<br />

specifies that a new variable be present in each case created by SPLIT. That variable<br />

is an index with values going from 1 <strong>to</strong> “nn”. Multiple indices may be created, but <strong>the</strong><br />

product of <strong>the</strong> index values (<strong>the</strong> “nn’s”) must be equal <strong>to</strong> <strong>the</strong> number of cases created<br />

by SPLIT.<br />

5. STEP nn<br />

specifies <strong>the</strong> size of steps <strong>to</strong> be taken in selecting variables <strong>to</strong> be used in <strong>the</strong> SPLIT.<br />

STEP follows a USE or CREATE list without a comma preceding it. The first variable<br />

is used, <strong>the</strong> variable “nn” away from <strong>the</strong> first is used next, and so on. Only one pass<br />

through <strong>the</strong> variable list is made.<br />

6. USE vn or list<br />

specifies <strong>the</strong> variables <strong>to</strong> be used in <strong>the</strong> split case. They must be a multiple of “nn” (<strong>the</strong><br />

number of cases in<strong>to</strong> which <strong>the</strong> current case is <strong>to</strong> be SPLIT).<br />
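The selection order implied by STEP and CYCLE can be sketched in Python. This is one reading of the descriptions above, not P-STAT code: in particular, the assumption that each later CYCLE pass starts one variable further along is an interpretation, and the function names `step_order` and `cycle_order` are hypothetical.

```python
# Illustrative model only (not P-STAT code).
def step_order(variables, nn):
    """One pass: take the first variable, then every nn-th one after it."""
    return variables[::nn]


def cycle_order(variables, nn):
    """Repeated passes (assumed to start one variable later each time)
    until every variable in the list has been used."""
    order = []
    for start in range(nn):
        order.extend(variables[start::nn])
    return order


# Mirrors CREATE Grade (Grade.2 TO Grade.4) STEP 2 from the example above.
stepped = step_order(["Grade.2", "Grade.3", "Grade.4"], 2)

# Hypothetical variable list for the CYCLE case.
cycled = cycle_order(["A.1", "B.1", "A.2", "B.2"], 2)
```

Under this reading, STEP 2 selects two variables from the three-variable list, which matches the SPLIT INTO 2 in the example.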

SPLIT and SPLIT *<br />

These are special versions of SPLIT that “uncollect” a case created by COLLECT:<br />

[ SPLIT ] or [ SPLIT * ]<br />

SPLIT produces only those cases that have at least one non-missing value of a suffixed variable, whereas<br />

SPLIT * produces all possible cases from a COLLECT, even if no such cases existed before the COLLECT.<br />

For example, if COLLECT 10 has been used, SPLIT * results in ten cases. SPLIT, on the other<br />

hand, produces 10 cases only if <strong>the</strong>re are some non-missing values of <strong>the</strong> suffixed variables (Test.10,<br />

Age.10 and so on).<br />



9<br />

<strong>PPL</strong>:<br />

Modification of Character Variables<br />

Character variables may be modified in many of <strong>the</strong> same ways that numeric variables are modified. However,<br />

since character and numeric variables have different properties, <strong>the</strong>re are several opera<strong>to</strong>rs and a number of functions<br />

that are specific <strong>to</strong> character variables.<br />

This chapter briefly discusses basic character procedures — <strong>the</strong> recoding of existing character variables, <strong>the</strong><br />

generation of new character variables and <strong>the</strong> logical testing of character values. The major portion of <strong>the</strong> chapter<br />

deals with special character opera<strong>to</strong>rs and functions that:<br />

• Test character variables;<br />

• Trim and pad character strings;<br />

• Left and right justify or center strings;<br />

• Extract substrings and access words within character strings;<br />

• Change character strings in<strong>to</strong> numeric values and vice-versa;<br />

• Concatenate character strings.<br />

9.1 BASIC CHARACTER PROCEDURES<br />

Data may be entered in a P-<strong>STAT</strong> system file as ei<strong>the</strong>r character strings — mixtures of letters, digits and o<strong>the</strong>r<br />

characters — or as numbers. Generally, it is clear which way a variable’s value should be entered. A person’s<br />

name is entered as a character variable, whereas his or her age is entered as a numeric variable. One can find a<br />

substring of Name and <strong>the</strong> mean Age, but <strong>the</strong> substring of Age and <strong>the</strong> mean Name do not make sense.<br />

Often <strong>the</strong>re is not a <strong>to</strong>tally clear-cut line between character and numeric data. There are situations in which<br />

a variable is coded with a character string when it really has some numeric attributes. The variable Sex, for example,<br />

may be coded with <strong>the</strong> numbers 1 and 2, or with <strong>the</strong> character strings “M” and “F”, or “Male” and<br />

“Female”. If such a variable is <strong>to</strong> be used in a listing, <strong>the</strong> character representation is preferred. If <strong>the</strong> variable is<br />

<strong>to</strong> be given <strong>to</strong> a correlation program, <strong>the</strong> numeric representation is necessary. In <strong>the</strong>se situations, functions are<br />

used <strong>to</strong> convert character representations in<strong>to</strong> numeric values, and numeric values in<strong>to</strong> character strings.<br />

P-<strong>STAT</strong> distinguishes between character and numeric values by using single or double quotes <strong>to</strong> enclose character<br />

strings. Numeric values are not enclosed in quotes. Thus, ’Sam Davis’ and ’924’ are character strings,<br />

whereas 924 is a numeric value.<br />

9.2 Generating New Character Variables<br />

Character variables are generated much as numeric ones are. However, when a character variable is created, it is<br />

necessary <strong>to</strong> specify that it is a character variable and, if it is o<strong>the</strong>r than 40 characters long, <strong>to</strong> specify its size. This<br />

instruction:<br />

[ GENERATE New.Name:C32 = Name ]<br />

creates a new variable named “New.Name”, that is 32 characters long (defined size 32) and equal <strong>to</strong> <strong>the</strong> value of<br />

<strong>the</strong> existing character variable named “Name”. If <strong>the</strong> variable Name has a length less than 32, <strong>the</strong> new variable<br />

New.Name will be padded with blanks on <strong>the</strong> right end until it is 32 characters long. If Name is longer than 32,<br />

characters will be truncated from <strong>the</strong> right end until only 32 characters remain.
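The padding and truncation rule can be expressed as a one-line Python function. This is an illustrative model, not P-STAT code; the function name `fit` is hypothetical.

```python
# Illustrative model only (not P-STAT code).
def fit(value, size):
    """Truncate on the right, then pad with blanks on the right, to `size`."""
    return value[:size].ljust(size)


short = fit("Sam Davis", 32)   # padded with trailing blanks
long_ = fit("x" * 40, 32)      # truncated from the right end
```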



Character variables may be generated equal <strong>to</strong> a specific value:<br />

[ GENERATE City:C = 'Houston' ]<br />

The character variable named City, of size 40, is created and set equal to the string or value “Houston”, followed<br />

by 33 blanks. The size of City is 40, since a size was not specified. Character variables may be generated with<br />

default names:<br />

[ DO #J = 1, 6; GEN ?:C = CHARACTER ( V(#J) ); ENDDO ]<br />

A character variable may be up <strong>to</strong> 50,000 characters long.<br />

9.3 Modifying Existing Character Variables<br />

Character variables are modified in much <strong>the</strong> same manner as numeric variables. The modification may be <strong>the</strong><br />

result of some logical test or may be an instruction by itself. A variable set equal <strong>to</strong> a character string must be a<br />

character variable. This instruction sets <strong>the</strong> variable State <strong>to</strong> <strong>the</strong> value “Iowa”:<br />

[ SET State = 'Iowa' ]<br />

This instruction sets <strong>the</strong> variable State <strong>to</strong> <strong>the</strong> value of <strong>the</strong> variable State.Name:<br />

[ SET State = State.Name ]<br />

Modification may occur as <strong>the</strong> result of a logical test:<br />

[ IF .N. EQ 10, SET Name = 'John Jones' ]<br />

The system variable .N., <strong>the</strong> case number, is tested as each case is processed. On <strong>the</strong> tenth case, <strong>the</strong> variable Name<br />

is set <strong>to</strong> <strong>the</strong> character string “John Jones”.<br />

9.4 Logical Selection of Character Variables<br />

The two logical opera<strong>to</strong>rs that operate in an identical fashion for both numeric and character data are equal (EQ)<br />

and not equal (NE). The o<strong>the</strong>r logical opera<strong>to</strong>rs work in a somewhat different manner. A character string being<br />

tested must be enclosed in quotes:<br />

[ IF Name EQ 'Jones', DELETE ]<br />

The concepts of less than and greater than are different. In P-<strong>STAT</strong>, <strong>the</strong>se opera<strong>to</strong>rs are honored for character<br />

data as a function of <strong>the</strong> sort sequence of each computing environment. The results may be different on different<br />

machines if <strong>the</strong> sort sequence of <strong>the</strong> characters is different.<br />

On computers using <strong>the</strong> ASCII character set (such as PC and SUN), <strong>the</strong> low <strong>to</strong> high order is numbers, uppercase<br />

letters, and lowercase letters. Most of <strong>the</strong> special characters, in particular, blank, are “lower” than ei<strong>the</strong>r<br />

letters or numbers.<br />

P-<strong>STAT</strong>, in its character comparisons, treats uppercase and lowercase letters as identical characters, that is,<br />

“A” = “a”, unless exact comparisons are specified. Such case-respecting comparisons are specified by prefacing<br />

logical opera<strong>to</strong>rs with “X” for eXact character comparisons:<br />

[ IF Grade XEQ 'f', SET Grade = 'I' ]<br />

The logical opera<strong>to</strong>rs that may be prefaced with “X” when <strong>the</strong>y are applied <strong>to</strong> character variables and values are:<br />

EQ, NE, LT, LE, GT, GE, AMONG and NOTAMONG.<br />
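
The comparison rules above can be modeled outside P-STAT. The following Python sketch (not PPL) mimics EQ, XEQ and the ASCII ordering; the helper names eq, xeq, lt and xlt are invented for this illustration, and folding to uppercase before an inexact comparison is an assumption, since the text says only that case is ignored.<br />

```python
# Illustrative model only; eq/xeq/lt/xlt are hypothetical helper names.

def eq(a: str, b: str) -> bool:
    """Case-insensitive equality, in the spirit of EQ on character data."""
    return a.upper() == b.upper()


def xeq(a: str, b: str) -> bool:
    """Exact equality, in the spirit of XEQ: case must match as well."""
    return a == b


def lt(a: str, b: str) -> bool:
    """Case-insensitive 'less than' (assumes case is folded before comparing)."""
    return a.upper() < b.upper()


def xlt(a: str, b: str) -> bool:
    """Exact 'less than' using the raw ASCII collating sequence."""
    return a < b


print(eq("F", "f"), xeq("F", "f"))                   # True False
print(xlt(" ", "0"), xlt("9", "A"), xlt("Z", "a"))   # True True True
```

In ASCII the blank sorts below the digits, the digits below the uppercase letters, and the uppercase letters below the lowercase letters, which is why all three exact comparisons on the last line are true.<br />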

Logical opera<strong>to</strong>rs that have a list as <strong>the</strong>ir argument may be used with character strings in <strong>the</strong> same way that<br />

<strong>the</strong>y are used with numeric values:<br />

[ IF Name AMONG ( 'Jones' 'Smith' 'Wills' ), RETAIN ]


<strong>PPL</strong>: Modification of Character Variables 9.3<br />

Where <strong>the</strong> operation of <strong>the</strong> function depends on <strong>the</strong> concept of less than or greater than, <strong>the</strong> result will depend on<br />

<strong>the</strong> sort sequence on <strong>the</strong> individual computing environment. This example will continue all <strong>the</strong> cases that fall in<br />

<strong>the</strong> sort sequence between AAAA and AZZZ, that is, all <strong>the</strong> A’s:<br />

IF Name AMONG ( 'AAAA' TO 'AZZZ' ), RETAIN;<br />

The character opera<strong>to</strong>rs that parallel numeric opera<strong>to</strong>rs and that may be used in logical selection are:<br />

EQ NE LT LE GT GE<br />

XEQ XNE XLT XLE XGT XGE<br />

AMONG NOTAMONG GOOD MISSING<br />

XAMONG XNOTAMONG<br />

These operators are discussed in the first PPL chapter. XEQ, the most useful of the exact character operators,<br />

is further explained later in this chapter in the section on character operators.<br />

In addition <strong>to</strong> <strong>the</strong>se opera<strong>to</strong>rs, <strong>the</strong> opera<strong>to</strong>rs CONTAINS and MATCHES, which are specifically for character<br />

data, may be used in logical selection. They test if a character value contains a specific character string and if a<br />

character value matches a character string/wildcard combination. CONTAINS and MATCHES are explained in<br />

<strong>the</strong> section on character opera<strong>to</strong>rs.<br />

9.5 Locating Non-Missing Character Data<br />

The functions COUNT.GOOD, FIRST.GOOD and LAST.GOOD count or locate non-missing data values. These<br />

functions are used with character data <strong>the</strong> same way that <strong>the</strong>y are used with numeric data. The arguments for <strong>the</strong>se<br />

functions are lists of variable names or positions.<br />

The COUNT.GOOD function yields <strong>the</strong> number of non-missing (“good”) values of <strong>the</strong> variables specified in<br />

<strong>the</strong> list:<br />

[ GENERATE Count = COUNT.GOOD ( Midterm Final ) ]<br />

The variable Count is generated and set equal <strong>to</strong> <strong>the</strong> number of non-missing test scores of <strong>the</strong> variables Midterm<br />

and Final. Count is a numeric variable, even though Midterm and Final may be character variables. For each case<br />

in this example, <strong>the</strong> maximum possible value of Count is 2.<br />

The FIRST.GOOD and LAST.GOOD functions yield <strong>the</strong> first or last non-missing value of <strong>the</strong> variables specified<br />

in <strong>the</strong> list:<br />

[ GENERATE Name:C =<br />

FIRST.GOOD ( Last.Name First.Name Middle.Name ) ]<br />

The character variable Name is generated and set equal <strong>to</strong> <strong>the</strong> first non-missing value of <strong>the</strong> character variables<br />

Last.Name, First.Name and Middle.Name. The variables referenced in <strong>the</strong> function list can be referenced by name<br />

or position. The word TO, meaning all <strong>the</strong> variables from <strong>the</strong> first mentioned variable through <strong>the</strong> last mentioned<br />

variable, or <strong>the</strong> system variable .ON., meaning all <strong>the</strong> variables from <strong>the</strong> one mentioned through <strong>the</strong> last variable<br />

in <strong>the</strong> file, may be used in <strong>the</strong> function list. Ei<strong>the</strong>r of <strong>the</strong>se instructions:<br />

[ SET Last.Guess = LAST.GOOD ( V(1) TO V(9) ) ]<br />

[ SET Last.Guess = LAST.GOOD ( V(1) .ON. ) ]<br />

sets <strong>the</strong> variable Last.Guess <strong>to</strong> <strong>the</strong> value of <strong>the</strong> last non-missing variable in <strong>the</strong> list.<br />
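
The behavior of the three functions can be sketched in Python (not PPL). The names count_good, first_good and last_good are hypothetical stand-ins, and using None to represent a missing value is an assumption made only for this illustration.<br />

```python
# Illustrative model only; None stands in for a P-STAT missing value.
MISSING = None


def count_good(values):
    """Number of non-missing entries in the list of values for one case."""
    return sum(1 for v in values if v is not MISSING)


def first_good(values):
    """First non-missing entry, or MISSING if every entry is missing."""
    for v in values:
        if v is not MISSING:
            return v
    return MISSING


def last_good(values):
    """Last non-missing entry, or MISSING if every entry is missing."""
    return first_good(list(reversed(values)))


case = [MISSING, "Ann", MISSING, "Lee", MISSING]
print(count_good(case), first_good(case), last_good(case))  # 2 Ann Lee
```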

9.6 CHARACTER OPERATORS<br />

Character opera<strong>to</strong>rs and character functions modify character expressions, variables and strings. Opera<strong>to</strong>rs generally<br />

have two operands, one before and one after <strong>the</strong> opera<strong>to</strong>r. Functions have single or multiple arguments that<br />

follow <strong>the</strong> function and are contained in paren<strong>the</strong>ses.



9.7 The CONTAINS and XCONTAINS Opera<strong>to</strong>rs<br />

The opera<strong>to</strong>r CONTAINS tests if a character string is contained within <strong>the</strong> value of a character variable:<br />

[ IF Address CONTAINS 'NJ', RETAIN ]<br />

If <strong>the</strong> string “NJ” is contained anywhere within <strong>the</strong> variable Address, <strong>the</strong> case is continued.<br />

CONTAINS tests for <strong>the</strong> presence or absence of a string; <strong>the</strong> location of <strong>the</strong> string may be anywhere within<br />

<strong>the</strong> specified variable. To test for <strong>the</strong> absence of a string, preface <strong>the</strong> consequence with “F.” <strong>to</strong> indicate that it is<br />

done only when <strong>the</strong> IF test is false — that is, only when <strong>the</strong> string is not contained in <strong>the</strong> variable:<br />

[ IF Address CONTAINS 'NJ', F.RETAIN ]<br />

Alternatively, DELETE could be used instead of F.RETAIN in this situation.<br />

The XCONTAINS opera<strong>to</strong>r specifies case-respecting tests. The argument string, exactly as specified, must<br />

be contained within <strong>the</strong> value of <strong>the</strong> character variable:<br />

[ IF Comment XCONTAINS 'STAT', SET Code = 1 ]<br />

CONTAINS and XCONTAINS are useful in locating cases with certain value strings when you do not know <strong>the</strong><br />

complete string or when <strong>the</strong> remainder of <strong>the</strong> string differs from case-<strong>to</strong>-case.<br />
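
As a minimal Python model (not PPL) of the two tests, with contains and xcontains as hypothetical helper names and uppercase folding assumed for the inexact form:<br />

```python
# Illustrative model only; contains/xcontains are hypothetical helper names.

def contains(value: str, target: str) -> bool:
    """True if target occurs anywhere in value, ignoring case."""
    return target.upper() in value.upper()


def xcontains(value: str, target: str) -> bool:
    """True only if target occurs in value exactly as written."""
    return target in value


address = "12 Main St, Trenton nj"
print(contains(address, "NJ"), xcontains(address, "NJ"))  # True False
```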

9.8 The Concatenate Opera<strong>to</strong>r<br />

Character strings can be joined using <strong>the</strong> concatenate opera<strong>to</strong>r //. This opera<strong>to</strong>r abuts <strong>the</strong> value of one variable <strong>to</strong><br />

that of ano<strong>the</strong>r:<br />

[ GENERATE Name:C32 = First.Name // Last.Name ]<br />

If First.Name is 16 characters and Last.Name is 16 characters — for example:<br />

First Name        Last Name<br />

Abe               Adams<br />

Millicent         Murphy<br />

Sharon Elizabeth  Johnson-Mayfield<br />

<strong>the</strong> variable Name, created by <strong>the</strong> concatenation of <strong>the</strong> two strings, produces <strong>the</strong> following results:<br />

Name<br />

Abe             Adams<br />

Millicent       Murphy<br />

Sharon ElizabethJohnson-Mayfield<br />

The concatenate opera<strong>to</strong>r joins <strong>the</strong> input strings in <strong>the</strong>ir entirety. The shorter first names may incorporate<br />

more than <strong>the</strong> desired number of blanks, and <strong>the</strong> longer names may have no intervening blanks. A blank could be<br />

included in <strong>the</strong> concatenation:<br />

[ GEN Name:C32 = First.Name // ' ' // Last.Name ]<br />

This instruction joins <strong>to</strong>ge<strong>the</strong>r <strong>the</strong> three strings, First.Name, “ ” (a blank), and Last.Name. There will be at least<br />

one blank between <strong>the</strong> first and last names. The following results would be obtained:<br />

Name<br />

Abe              Adams<br />

Millicent        Murphy<br />

Sharon Elizabeth Johnson-Mayfiel



Note the truncation that resulted because the variable Name has a defined size of 32: the final letter on the right<br />

is missing. (The trim concatenate operator, discussed next, is more appropriate for an operation of this type.)<br />

Any number of strings may be concatenated. If the total length of the concatenated strings exceeds that of the<br />

target variable, the output is truncated on the right. The items that may be joined include character variables,<br />

literals in quotes and character expressions. Character variables are items such as Name and Telephone. Literals<br />

are character strings such as “ ” (a blank), “Susan” and “945-5600”. Character expressions are <strong>the</strong> results of functions<br />

such as LAST.GOOD ( V(1) .ON. ). For example, this instruction:<br />

[ GENERATE Telephone:C11 =<br />

'1' // Area.Code // FIRST.GOOD ( Phone, Alt.Phone ) ]<br />

illustrates a literal concatenated with a variable concatenated with an expression.<br />

A character expression on <strong>the</strong> right of <strong>the</strong> equal sign, however simple or complex, produces a result whose<br />

width can range from 0 (a null string) <strong>to</strong> 50,000 characters. Only when <strong>the</strong> result is moved across <strong>the</strong> equal sign<br />

in<strong>to</strong> <strong>the</strong> target variable does a Procrustean stretching (with blanks) or truncation take place.<br />
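
The padding and truncation described here can be imitated with fixed-width strings. This Python sketch (not PPL) uses a hypothetical helper, store, to stand for moving a result across the equal sign into a variable of a given defined size; it reproduces the name example from this section under that assumption.<br />

```python
# Illustrative model only; store() is a hypothetical helper name.

def store(value: str, width: int) -> str:
    """Pad with blanks on the right, or truncate on the right, to width."""
    return value.ljust(width)[:width]


# First.Name and Last.Name modeled as 16-character fields.
first = store("Sharon Elizabeth", 16)
last = store("Johnson-Mayfield", 16)

plain = store(first + last, 32)          # like First.Name // Last.Name
spaced = store(first + " " + last, 32)   # like First.Name // ' ' // Last.Name

print(repr(plain))   # 'Sharon ElizabethJohnson-Mayfield'
print(repr(spaced))  # 'Sharon Elizabeth Johnson-Mayfiel'
```

The intermediate result of the second concatenation is 33 characters long; the final letter is lost only when the result is stored in the 32-character target.<br />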

9.9 The Trim Concatenate Opera<strong>to</strong>r<br />

The trim concatenate opera<strong>to</strong>r /// joins strings by trimming out all leading and trailing blanks in each of <strong>the</strong> strings<br />

and inserting a single blank between <strong>the</strong> strings. The concatenation of first and last names, illustrated previously<br />

using <strong>the</strong> regular concatenate opera<strong>to</strong>r, produces different results when <strong>the</strong> trim concatenate opera<strong>to</strong>r is used. This<br />

instruction:<br />

[ GENERATE Name:C32 = First.Name /// Last.Name ]<br />

yields this result:<br />

Name<br />

Abe Adams<br />

Millicent Murphy<br />

Sharon Elizabeth Johnson-Mayfiel<br />

The leading (<strong>the</strong>re were none) and trailing blanks of each name have been trimmed out, and one blank has<br />

been inserted between the names. The variable Name could be defined as :C with no specified length, which provides<br />

a default size of 40. In other respects, the /// operator works just like the // operator.<br />
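
A Python sketch (not PPL) of the trimming behavior follows. The helper names store and trim_concat are invented for this illustration, and the treatment of an operand that is entirely blank is not covered by the text, so it is not modeled here.<br />

```python
# Illustrative model only; store() and trim_concat() are hypothetical names.

def store(value: str, width: int) -> str:
    """Pad with blanks or truncate on the right to the defined size."""
    return value.ljust(width)[:width]


def trim_concat(*parts: str) -> str:
    """Trim blanks from both ends of each part, then join with one blank."""
    return " ".join(p.strip(" ") for p in parts)


short = store(trim_concat(store("Abe", 16), store("Adams", 16)), 32)
long_ = store(
    trim_concat(store("Sharon Elizabeth", 16), store("Johnson-Mayfield", 16)), 32
)

print(repr(short.rstrip()))  # 'Abe Adams'
print(repr(long_))           # 'Sharon Elizabeth Johnson-Mayfiel'
```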

9.10 Exactly Equal Opera<strong>to</strong>r<br />

The XEQ opera<strong>to</strong>r tests whe<strong>the</strong>r two character strings are exactly equal. The strings must be identical in case, as<br />

well as in specific characters. Normally, comparisons in P-<strong>STAT</strong> are case-independent — “BILL” equals “Bill”<br />

or “biLl”. This is useful in most situations:<br />

[ IF Last EQ 'Smi<strong>the</strong>y' AND First EQ 'Bill',<br />

SET Dependents = 1 ]<br />

Occasionally, however, a comparison that respects case is required. The XEQ opera<strong>to</strong>r is used in those situations.<br />

It is functionally similar <strong>to</strong> IVAL, described in a subsequent section.<br />

Figure 9.1 illustrates using <strong>the</strong> XEQ opera<strong>to</strong>r. Character strings, containing information about logon and logoff<br />

activity on a mainframe computer, existed in a P-<strong>STAT</strong> file. Separate counts of logons and logoffs were<br />

desired. However, <strong>the</strong> logon and logoff instructions were typically abbreviated, and <strong>the</strong> abbreviations were differentiated<br />

only by case. The XEQ opera<strong>to</strong>r specifies a test of exact equality — one that respects <strong>the</strong> case of <strong>the</strong><br />

character string. The opera<strong>to</strong>rs XNE, XLT, XLE, XGT and XGE function similarly.



__________________________________________________________________________<br />

Figure 9.1 The XEQ Opera<strong>to</strong>r for Tests that Respect Case<br />

FILE Filelog:<br />

Text<br />

L Fred Smith<br />

L Will Roys<br />

l<br />

disc<br />

L William<br />

l<br />

L Penelope Rt<br />

PROCESS Filelog<br />

[ IF FIRST (.FILE.), GEN #LogOn = 0, GEN #LogOff = 0 ;<br />

IF TOKEN (Text) XEQ 'L', INC #LogOn ;<br />

IF TOKEN (Text) XEQ 'l', INC #LogOff;<br />

IF LAST (.FILE.),<br />

PUT #LogOn ,><br />

#LogOff > ] $<br />

There were 4 logons and 2 logoffs.<br />

__________________________________________________________________________<br />
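
The counting logic of Figure 9.1 can be restated as a short Python sketch (not PPL). The function names are hypothetical, and the data are the seven Text values shown in the figure; the exact comparison on the first token is what separates "L" from "l".<br />

```python
# Illustrative model only; function names are hypothetical.

def first_token(text: str) -> str:
    """First blank-delimited word of text, or '' if text is all blanks."""
    parts = text.split()
    return parts[0] if parts else ""


def count_log_events(lines):
    """Count lines whose first token is exactly 'L' (logon) or 'l' (logoff)."""
    logons = logoffs = 0
    for text in lines:
        tok = first_token(text)
        if tok == "L":        # exact, case-respecting test
            logons += 1
        elif tok == "l":
            logoffs += 1
    return logons, logoffs


filelog = ["L Fred Smith", "L Will Roys", "l", "disc",
           "L William", "l", "L Penelope Rt"]
print(count_log_events(filelog))  # (4, 2)
```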

9.11 CHARACTER FUNCTIONS<br />

There are a number of functions that are used only with character values. These functions, in alphabetical order,<br />

and <strong>the</strong> tasks <strong>the</strong>y perform, are:<br />

1. BLANK Blank out specified characters within a string.<br />

2. XBLANK Like BLANK, but case respecting.<br />

3. CAPS Capitalize <strong>the</strong> first character of each <strong>to</strong>ken.<br />

4. CENTER Center a character string.<br />

5. CHANGE Correct a substring within a string.<br />

6. CLAG Performs a lag on a character argument.<br />

7. XCHANGE Like CHANGE, but case respecting.<br />

8. CHAREX Create a character value from digits in a number.<br />

9. CHARACTER Convert a number <strong>to</strong> a character string.<br />

10. COMPRESS Squeeze out specified characters.<br />

11. CVAL Give character equivalent of specified number.<br />

12. IVAL Give number equivalent of specified character.<br />

13. LEFT Left justify a character string.<br />

14. LENGTH Locate <strong>the</strong> last non-blank character in a string.<br />

15. LOWER Convert characters <strong>to</strong> lowercase equivalents.<br />

16. NUMBER Convert a character string <strong>to</strong> a number.<br />

17. PAD Pad a character string with specified characters.<br />

18. POSITION Give <strong>the</strong> position of one string within ano<strong>the</strong>r.



19. XPOSITION Like POSITION, but case respecting.<br />

20. RIGHT Right justify a character string.<br />

21. SIZE Determine <strong>the</strong> defined size of a character variable.<br />

22. SUBSTRING Extract substrings from character strings.<br />

23. TOKEN Access “words” within character strings.<br />

24. TRIM Trim specified characters from strings.<br />

25. UPPER Convert characters <strong>to</strong> uppercase equivalents.<br />

26. VARNAME Convert a variable name <strong>to</strong> a character value.<br />

27. VERIFY Test for unexpected characters in a string.<br />

The function name is followed by paren<strong>the</strong>ses containing one or more expressions or constants. All expressions<br />

may be complex and can consist of variable names, literals and o<strong>the</strong>r functions. Expressions may be nested<br />

within o<strong>the</strong>r expressions. The mode and number of <strong>the</strong> arguments permitted depend on <strong>the</strong> individual function.<br />

Any character expression on <strong>the</strong> right of <strong>the</strong> equal sign, no matter how simple or complex, produces a result<br />

whose width can range from 0 (a null string) <strong>to</strong> 50,000 characters. Only when <strong>the</strong> result is moved across <strong>the</strong> equal<br />

sign in<strong>to</strong> <strong>the</strong> target variable does a padding (with blanks) or truncation take place, if necessary.<br />

9.12 Centering and Justifying Strings<br />

The functions CENTER, LEFT and RIGHT affect <strong>the</strong> position of a character string within its defined field. The<br />

CENTER function centers <strong>the</strong> string. The LEFT and RIGHT functions, respectively, left and right justify <strong>the</strong><br />

strings within <strong>the</strong>ir fields:<br />

LEFT ( ' ABC' ) = 'ABC '<br />

CENTER ( 'XYZ ' ) = ' XYZ '<br />

RIGHT ( 'SPQR ' ) = ' SPQR'<br />
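
A Python sketch (not PPL) of the three functions, assuming that the field width is simply the length of the input string. The names left, right and center are stand-ins, and how P-STAT splits an odd number of surplus blanks when centering is not stated in the text, so only an even split is shown.<br />

```python
# Illustrative model only; the field width is taken to be len(value).

def left(value: str) -> str:
    """Move the non-blank text to the left edge of its field."""
    return value.strip(" ").ljust(len(value))


def right(value: str) -> str:
    """Move the non-blank text to the right edge of its field."""
    return value.strip(" ").rjust(len(value))


def center(value: str) -> str:
    """Center the non-blank text within its field."""
    return value.strip(" ").center(len(value))


print(repr(left("   ABC")))      # 'ABC   '
print(repr(center("XYZ    ")))   # '  XYZ  '
print(repr(right("SPQR   ")))    # '   SPQR'
```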

9.13 Changing <strong>the</strong> Case of Strings<br />

UPPER ( 'abc' ) = ABC<br />

LOWER ( 'ABC' ) = abc<br />

CAPS ( 'ann smith' ) = Ann Smith<br />

UPPER, LOWER and CAPS are <strong>the</strong> three functions which change <strong>the</strong> case of a value:<br />

SET Name = UPPER ( Name )<br />

LOWER converts a value <strong>to</strong> all lowercase characters. It is possible <strong>to</strong> nest character functions. This permits conversion<br />

of all of a name <strong>to</strong> lowercase except <strong>the</strong> first letter:<br />

SET Name = SUBSTRING ( Name, 1, 1 ) //<br />

LOWER ( SUBSTRING ( Name, 2 ) )<br />

(The SUBSTRING function is discussed subsequently.)<br />

CAPS capitalizes <strong>the</strong> initial letters of words in character variables:<br />

[ GEN Name:C = CAPS ( 'JOHN paul JoNeS' ) ]<br />

More exactly, CAPS puts all initial letters in upper case and all o<strong>the</strong>r letters in lower case. A blank is <strong>the</strong> assumed<br />

delimiter between <strong>to</strong>kens (words). The output appears like this:<br />

John Paul Jones<br />

Optionally, a second argument giving a replacement delimiter for <strong>the</strong> blank or an additional delimiter may be specified.<br />

This instruction:<br />

[ SET V(1) = CAPS ( 'abc,def,ghi', ',' ) ]<br />

produces:



Abc,Def,Ghi<br />

The comma is specified as <strong>the</strong> <strong>to</strong>ken delimiter. It is enclosed in single or double quotes.<br />
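
The CAPS rule (initial letter of each token in uppercase, every other letter in lowercase) can be sketched in Python (not PPL). The function name caps mirrors the PPL name for readability only. Whether the blank remains a delimiter when a second argument is given is not settled by the text, so this sketch takes the full set of delimiter characters as its second argument.<br />

```python
# Illustrative model only; delimiters is the complete set of token separators.

def caps(value: str, delimiters: str = " ") -> str:
    """Uppercase the first letter of each token and lowercase the rest."""
    out = []
    start_of_token = True
    for ch in value:
        if ch in delimiters:
            out.append(ch)           # delimiters pass through unchanged
            start_of_token = True
        elif start_of_token:
            out.append(ch.upper())
            start_of_token = False
        else:
            out.append(ch.lower())
    return "".join(out)


print(caps("JOHN paul JoNeS"))      # John Paul Jones
print(caps("abc,def,ghi", ","))     # Abc,Def,Ghi
```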

9.14 Length and Size of Strings<br />

LENGTH ( '  abc   ' ) = 5<br />

SIZE ( '  abc   ' ) = 8<br />

The functions LENGTH and SIZE yield information about <strong>the</strong> actual length and <strong>the</strong> defined size of character values.<br />

LENGTH gives <strong>the</strong> location of <strong>the</strong> right-most non-blank character:<br />

[ GENERATE Count = LENGTH ( Name ) ]<br />

The SIZE function yields a numeric value giving <strong>the</strong> defined size of a character value:<br />

[ GEN Width = SIZE ( Name ) ]<br />

The variable Width is generated and set equal <strong>to</strong> <strong>the</strong> longest possible length of <strong>the</strong> variable Name. This length is<br />

<strong>the</strong> defined size of Name or <strong>the</strong> size resulting after various character function procedures or operations.<br />
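
The distinction can be shown with a Python sketch (not PPL) in which a string with explicit blanks stands for the stored field. The names length and size are stand-ins for the PPL functions.<br />

```python
# Illustrative model only; the string itself represents the stored field.

def length(value: str) -> int:
    """Position of the right-most non-blank character (0 if all blank)."""
    return len(value.rstrip(" "))


def size(value: str) -> int:
    """Defined size of the field, counting leading and trailing blanks."""
    return len(value)


field = "  abc   "   # 2 leading blanks, 'abc', 3 trailing blanks
print(length(field), size(field))  # 5 8
```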

9.15 Locating Strings Within Variables<br />

POSITION ( 'ABC', 'B' ) = 2<br />

POSITION ( 'ABC', 'X' ) = 0<br />

XPOSITION ( 'ABab', 'ab' ) = 3<br />

VERIFY ( 'ABCDE', 'AEIOU' ) = 2<br />

The POSITION, XPOSITION and VERIFY functions yield a numeric value which is <strong>the</strong> location of a string within<br />

a character value. The simpler usage of POSITION has an expression and a character string as arguments:<br />

[ GEN Blank.Location = POSITION ( Name, ' ' ) ]<br />

The numeric variable Blank.Location is generated and set equal <strong>to</strong> <strong>the</strong> location of <strong>the</strong> first occurrence of a blank<br />

in <strong>the</strong> variable Name. The second argument may be a character variable whose value is <strong>the</strong> string <strong>to</strong> be located:<br />

[ GEN Locale = POSITION ( Address, Zip ) ]<br />

The variable Locale is <strong>the</strong> location of <strong>the</strong> value of Zip (<strong>the</strong> zip code string) within <strong>the</strong> variable Address. If <strong>the</strong><br />

string is not located, <strong>the</strong> result is zero. Values match regardless of whe<strong>the</strong>r <strong>the</strong>y are uppercase or lowercase.<br />

The more complex usage of POSITION permits searches for multiple strings. The left-most position of any<br />

successfully located string is given as <strong>the</strong> function result. The arguments for POSITION are <strong>the</strong> expression and<br />

<strong>the</strong> character strings <strong>to</strong> be located:<br />

[ GEN XX = POSITION ( Name, 'Jr.', 'Sr.', 'Esq.' ) ]<br />

The variable XX is <strong>the</strong> location of <strong>the</strong> left-most occurrence of any of <strong>the</strong> specified strings.<br />

An optional argument for length may be provided. It should be right-most in <strong>the</strong> argument list:<br />

[ SET Extra = POSITION ( Phone, ' ()-/.', 1 ) ]<br />

The contents of <strong>the</strong> character strings, whose positions are being sought, are divided in<strong>to</strong> strings of <strong>the</strong> specified<br />

length. Thus, portions of strings are treated as separate arguments. In <strong>the</strong> preceding example, <strong>the</strong> variable Extra<br />

is set equal <strong>to</strong> <strong>the</strong> position of <strong>the</strong> first occurrence of any of <strong>the</strong> characters in <strong>the</strong> search string. The “1” specifies<br />

that each single character in <strong>the</strong> search string is itself a search string. The length argument must be an integer<br />

between 1 and 50,000. The number of characters in <strong>the</strong> search string must be evenly divisible by <strong>the</strong> length.<br />

XPOSITION is just like POSITION, except that <strong>the</strong> case (upper, lower or mixed) of <strong>the</strong> character string whose<br />

position is being sought is respected:<br />

[ GEN Fatal = XPOSITION ( Symp<strong>to</strong>m, 'D' ) ]



The variable Fatal is generated and set equal to the position of upper-case “D”; lower-case “d” is ignored.<br />

XPOSITION may be used with the same types of arguments as POSITION.<br />
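
The search rules for POSITION and XPOSITION can be modeled in one Python function (not PPL). The name position and the keyword arguments length and exact are choices made for this sketch; in PPL the length is the right-most positional argument and the exact form is a separate function.<br />

```python
# Illustrative model only; keyword names are hypothetical.

def position(value, *targets, length=None, exact=False):
    """1-based position of the left-most match of any target, else 0."""
    if length is not None:
        pieces = []
        for t in targets:
            if len(t) % length != 0:
                raise ValueError("search string not evenly divisible by length")
            # Cut each search string into pieces of the given length.
            pieces.extend(t[i:i + length] for i in range(0, len(t), length))
        targets = pieces
    if not exact:
        value = value.upper()
        targets = [t.upper() for t in targets]
    hits = [value.find(t) + 1 for t in targets if t in value]
    return min(hits) if hits else 0


print(position("ABC", "B"), position("ABC", "X"))         # 2 0
print(position("ABab", "ab", exact=True))                 # 3
print(position("John Smith Jr.", "Jr.", "Sr.", "Esq."))   # 12
print(position("(609) 466-9200", " ()-/.", length=1))     # 1
```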

The VERIFY function returns the location of the first character in the initial argument that is not in any of<br />

the remaining arguments:<br />

[ GEN BAD = VERIFY ( 'ABCDE', 'EA', 'B' ) ]<br />

BAD is set <strong>to</strong> 3, since its third character, “C”, is not in any of <strong>the</strong> remaining arguments. Thus, <strong>the</strong> presence of only<br />

specified characters may be verified. Multiple character string arguments are permitted, although each character<br />

is considered as a separate string.<br />
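
A Python sketch (not PPL) of this check follows. Two points are assumptions rather than statements from the text: characters are compared exactly, and the result is 0 when every character is acceptable.<br />

```python
# Illustrative model only; exact comparison and the 0 result are assumptions.

def verify(value: str, *allowed: str) -> int:
    """1-based position of the first character of value not in allowed."""
    ok = set("".join(allowed))     # every character is its own search string
    for index, ch in enumerate(value, start=1):
        if ch not in ok:
            return index
    return 0


print(verify("ABCDE", "AEIOU"))     # 2
print(verify("ABCDE", "EA", "B"))   # 3
```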

9.16 Extracting Substrings and Words<br />

SUBSTRING ( 'ABCDE', 3, 2 ) = 'CD'<br />

TOKEN ( 'Ann Smith' ) = 'Ann'<br />

The SUBSTRING and TOKEN functions access portions of character strings. SUBSTRING accesses a string<br />

starting at a specified location and of a given length. TOKEN accesses “words” within a character string — that<br />

is, strings delimited by blanks or ano<strong>the</strong>r specified character.<br />

SUBSTRING requires an expression, a start location and a length as arguments:<br />

[ GEN Initial:C1 = SUBSTRING ( First.Name, 1, 1 ) ]<br />

The variable Initial is generated as a character variable with a defined size of 1. It is set equal <strong>to</strong> <strong>the</strong> substring of<br />

First.Name, beginning at <strong>the</strong> character in position 1 and having a length of 1. The third argument giving <strong>the</strong> length<br />

is optional. When it is omitted, <strong>the</strong> assumption is that <strong>the</strong> rest of <strong>the</strong> string is needed. If <strong>the</strong> third argument is<br />

omitted when <strong>the</strong> second argument is 1, <strong>the</strong> entire input string is <strong>the</strong> substring.<br />

An expression may be used as ei<strong>the</strong>r <strong>the</strong> location or length argument in SUBSTRING:<br />

[ GEN #Len = LENGTH ( Phone.No ) ]<br />

[ GEN Code:C2 = SUBSTRING ( Phone.No, #Len-1, 2 ) ]<br />

The variable Code is set equal <strong>to</strong> <strong>the</strong> two right-most digits in <strong>the</strong> telephone number. (The scratch variable #Len<br />

is <strong>the</strong> length of <strong>the</strong> phone number. #Len-1 identifies <strong>the</strong> start location as <strong>the</strong> next-<strong>to</strong>-last digit, and 2 is <strong>the</strong> length<br />

of <strong>the</strong> substring.)<br />
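
Because SUBSTRING counts positions from 1, a Python model (not PPL) has to shift the start by one. The function name substring is a stand-in, and start values below 1 are not handled in this sketch.<br />

```python
# Illustrative model only; assumes start >= 1.

def substring(value: str, start: int, length=None) -> str:
    """Characters of value from 1-based start, for length characters.

    When length is omitted, the rest of the string is returned.
    """
    begin = start - 1
    end = None if length is None else begin + length
    return value[begin:end]


print(substring("ABCDE", 3, 2))     # CD
print(substring("ABCDE", 1))        # ABCDE

phone = "609-466-9200"
n = len(phone.rstrip(" "))          # plays the role of #Len
print(substring(phone, n - 1, 2))   # 00
```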

The TOKEN function accesses words or strings of characters within a character variable. The strings are typically<br />

separated by blanks:<br />

[ GEN First.Name:C = TOKEN ( Name ) ]<br />

The variable First.Name is <strong>the</strong> first word in <strong>the</strong> variable Name.<br />

Optional arguments for <strong>the</strong> TOKEN function access specific words and specify <strong>the</strong> delimiter between words.<br />

These instructions,<br />

[ GEN First.Name:C = TOKEN ( Name, 1 ) ;<br />

GEN Middle.Name:C = TOKEN ( Name, 2 ) ;<br />

GEN Last.Name:C = TOKEN ( Name, 3 ) ;<br />

IF Last.Name MISSING,<br />

SET Last.Name = Middle.Name,<br />

SET Middle.Name = ' ' ]<br />

access <strong>the</strong> second and third <strong>to</strong>kens in Name, as well as <strong>the</strong> first. When <strong>the</strong> second argument is omitted, <strong>the</strong> first<br />

<strong>to</strong>ken is accessed. Note that accessing a <strong>to</strong>ken that is not present (<strong>the</strong> third <strong>to</strong>ken in a name that has only two <strong>to</strong>kens)<br />

yields a missing value.<br />

A delimiter o<strong>the</strong>r than <strong>the</strong> blank may be specified:<br />

[ GEN Year.of.Birth:C = TOKEN ( Birthdate, 3, '/' ) ]



In this example, <strong>the</strong> slash is specified as <strong>the</strong> <strong>to</strong>ken delimiter. Assuming Birthdate has values such as 11/19/68, <strong>the</strong><br />

value of <strong>the</strong> third <strong>to</strong>ken, 68, will be used as <strong>the</strong> value of Year.of.Birth.<br />

TOKEN accesses tokens counting from the left. TOKEN is a synonym for LTOKEN. RTOKEN accesses<br />

tokens counting from the right. In all other respects, it is the same as LTOKEN.<br />

The NTOKEN function yields a count of <strong>to</strong>kens within a character string. The result is a numeric value, not<br />

a character string. This instruction:<br />

[ GEN #Number = NTOKEN ( Address ) ]<br />

generates a scratch variable #Number that equals <strong>the</strong> number of words in <strong>the</strong> variable Address. The delimiter is<br />

assumed <strong>to</strong> be a blank unless a second argument specifies one or more alternate delimiters:<br />

[ GEN Number.Read = NTOKEN ( Magazines, ', ' ) ]<br />

A comma and a blank are specified as <strong>the</strong> <strong>to</strong>ken delimiters. The variable Number.Read is <strong>the</strong> number of strings<br />

separated by commas and/or blanks in <strong>the</strong> variable Magazines.<br />
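
The token functions can be sketched in Python (not PPL). The names token, rtoken and ntoken mirror the PPL names, None stands in for a missing value, and treating a run of adjacent delimiters as a single separator is an assumption of this sketch.<br />

```python
# Illustrative model only; None stands in for a missing value.

def split_tokens(value: str, delimiters: str = " "):
    """Split value on any delimiter character, dropping empty pieces."""
    tokens, current = [], ""
    for ch in value:
        if ch in delimiters:
            if current:
                tokens.append(current)
                current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


def token(value, n=1, delimiters=" "):
    """n-th token counting from the left, or None if there is no such token."""
    toks = split_tokens(value, delimiters)
    return toks[n - 1] if 1 <= n <= len(toks) else None


def rtoken(value, n=1, delimiters=" "):
    """n-th token counting from the right, or None if there is no such token."""
    toks = split_tokens(value, delimiters)
    return toks[-n] if 1 <= n <= len(toks) else None


def ntoken(value, delimiters=" "):
    """Number of tokens in value."""
    return len(split_tokens(value, delimiters))


print(token("Ann Smith"), token("11/19/68", 3, "/"))   # Ann 68
print(token("Ann Smith", 3))                           # None
print(ntoken("Time, Life, Wired", ", "))               # 3
```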

9.17 Blanking Out and Changing Strings<br />

BLANK ( 'abcde', 'bc' ) = 'a  de'<br />

BLANK ( 'abcde', 2, 2 ) = 'a  de'<br />

CHANGE ( 'abcde', 'bc', '999' ) = 'a999de'<br />

CHANGE ( 'abcde', 2, 3, '2222' ) = 'a2222e'<br />

The BLANK, XBLANK, CHANGE and XCHANGE functions alter character variables by replacing portions<br />

with ei<strong>the</strong>r blanks or specified new strings. This provides <strong>the</strong> ability <strong>to</strong> delete or replace substrings. BLANK and<br />

CHANGE ignore <strong>the</strong> case of character strings; XBLANK and XCHANGE respect <strong>the</strong> eXact case of character<br />

strings.<br />

The BLANK function has two usage modes. The simpler usage specifies an expression, a starting location<br />

for <strong>the</strong> blank string, and <strong>the</strong> length of <strong>the</strong> string. This instruction:<br />

[ GEN Birthday:C4 = BLANK ( Date.of.Birth, 5, 2 ) ]<br />

replaces <strong>the</strong> character string beginning with <strong>the</strong> fifth character and of length 2 with two blanks. The variable Birthday<br />

becomes '1215 ' instead of '121545'.<br />

The third argument, giving <strong>the</strong> length of <strong>the</strong> string being blanked out, may be omitted. The characters from<br />

<strong>the</strong> start location through <strong>the</strong> end will be replaced by blanks:<br />

LIST Vocab.Test<br />

[ KEEP Vocab.Words Definitions ;<br />

SET Vocab.Words = BLANK ( Vocab.Words, 2 ) ] $<br />

This listing will have only <strong>the</strong> initial letter of each vocabulary word and <strong>the</strong> definitions.<br />

The alternate usage of <strong>the</strong> BLANK function specifies a particular character string <strong>to</strong> be replaced by blanks,<br />

ra<strong>the</strong>r than a location and length. XBLANK may be used with this type of argument. The first occurrence of <strong>the</strong><br />

string is blanked out:<br />

BLANK ( 'abcde', 'CD' ) produces 'ab  e'<br />

XBLANK ( 'abcde', 'CD' ) produces 'abcde'<br />

Multiple occurrences of a character string may be blanked out. An optional third argument <strong>to</strong> <strong>the</strong> BLANK<br />

function specifies <strong>the</strong> maximum number of occurrences of <strong>the</strong> string <strong>to</strong> be replaced:<br />

[ SET Comments = BLANK ( Comments, 'damn', 10 ) ]<br />

Up <strong>to</strong> ten occurrences of <strong>the</strong> word “damn” in <strong>the</strong> variable Comments will be replaced with an equivalent number<br />

of blanks. The size of <strong>the</strong> resultant variable always remains <strong>the</strong> same when <strong>the</strong> BLANK function is used.
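
Both usage modes of BLANK can be modeled in Python (not PPL). The two function names and the exact keyword (standing in for XBLANK) are inventions of this sketch, which also assumes plain ASCII text so that folding case does not change string length.<br />

```python
# Illustrative model only; assumes ASCII text.

def blank_at(value: str, start: int, length=None) -> str:
    """Blank out length characters from 1-based start (to the end if omitted)."""
    begin = start - 1
    end = len(value) if length is None else min(len(value), begin + length)
    return value[:begin] + " " * max(0, end - begin) + value[end:]


def blank_string(value: str, target: str, max_count: int = 1,
                 exact: bool = False) -> str:
    """Replace up to max_count occurrences of target with blanks."""
    hay = value if exact else value.upper()
    needle = target if exact else target.upper()
    out = list(value)
    pos = 0
    for _ in range(max_count):
        hit = hay.find(needle, pos)
        if hit < 0 or not needle:
            break
        out[hit:hit + len(needle)] = " " * len(needle)   # same width as before
        pos = hit + len(needle)
    return "".join(out)


print(repr(blank_at("121545", 5, 2)))                   # '1215  '
print(repr(blank_string("abcde", "CD")))                # 'ab  e'
print(repr(blank_string("abcde", "CD", exact=True)))    # 'abcde'
```

In every case the result has the same number of characters as the input, which matches the statement that the size does not change.<br />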



The CHANGE and XCHANGE functions have two usage modes comparable <strong>to</strong> those of <strong>the</strong> BLANK function.<br />

The most common usage of CHANGE specifies an expression, an old string and a new string. The first<br />

occurrence of <strong>the</strong> old string is replaced with <strong>the</strong> new string:<br />

[ SET State = CHANGE ( State, 'TX', 'Texas' ) ]<br />

XCHANGE is <strong>the</strong> same as CHANGE, but <strong>the</strong> case of <strong>the</strong> old string must be exactly as specified or it is not replaced<br />

by <strong>the</strong> new string.<br />

The old and new strings may be specified with expressions, which are variable names, literals and o<strong>the</strong>r<br />

functions:<br />

[ IF New.Area.Code GOOD, SET Phone.No =<br />

CHANGE ( Phone.No, Area.Code, New.Area.Code ) ]<br />

The area code in <strong>the</strong> phone number is changed <strong>to</strong> <strong>the</strong> new area code, unless <strong>the</strong> new one is missing. The value of<br />

Area.Code is <strong>the</strong> old string and a good (non-missing) value of New.Area.Code is <strong>the</strong> new string. If <strong>the</strong> value of<br />

Area.Code is not found in Phone.No, no change is made.<br />

Multiple changes may be specified. An optional fourth argument gives <strong>the</strong> maximum number of changes:<br />

[ SET Title = CHANGE ( Title, ' ', '.', 999 ) ]<br />

All blanks are changed <strong>to</strong> periods, including leading or trailing blanks.<br />
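The string-based mode of CHANGE and XCHANGE can be mimicked in Python to make the rules concrete. This is only a sketch: the names `change` and `xchange` are hypothetical stand-ins, and the default of a single replacement and the case-insensitive matching of CHANGE are inferred from the description above, not taken from P-STAT itself.

```python
import re


def change(value, old, new, max_changes=1, exact_case=False):
    """Replace up to max_changes occurrences of old with new."""
    if old == "" or max_changes <= 0:
        return value  # assumption: nothing to look for, nothing changes
    flags = 0 if exact_case else re.IGNORECASE
    # re.escape makes `old` a literal; the lambda keeps `new` literal as well.
    return re.sub(re.escape(old), lambda m: new, value,
                  count=max_changes, flags=flags)


def xchange(value, old, new, max_changes=1):
    """Same as change, but the case of `old` must match exactly."""
    return change(value, old, new, max_changes, exact_case=True)


print(change("tx", "TX", "Texas"))         # case is ignored
print(xchange("tx", "TX", "Texas"))        # case differs, so no change
print(change(" A Title ", " ", ".", 999))  # every blank becomes a period
```

The fourth argument plays the same role as the optional maximum in the Title example: with 999, every blank in a short value is replaced.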

If a new character string is not specified, <strong>the</strong> old substring is removed, making <strong>the</strong> length of <strong>the</strong> result smaller.<br />

This instruction removes occurrences of <strong>the</strong> word “damn”:<br />

[ SET Comments = CHANGE ( Comments, 'damn', 10 ) ]<br />

The alternate usage of <strong>the</strong> CHANGE function specifies an expression, a starting location of a string, <strong>the</strong> length<br />

of <strong>the</strong> string, and a new string. The new string may or may not be <strong>the</strong> same length as <strong>the</strong> old one:<br />

[ IF SUBSTRING ( Telephone, 4, 3 ) = '897',<br />

SET Telephone = CHANGE ( Telephone, 4, 3, '807' ) ]<br />

The CHANGE and XCHANGE functions may make a value longer or shorter. This does not matter until <strong>the</strong><br />

final resulting value is moved across <strong>the</strong> equal sign. At that time, truncation or blank padding will occur as needed.<br />
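The position-based mode, and the truncation or padding that happens when a result is stored, can be sketched the same way. The helper names `change_at` and `store` are hypothetical, and the ten-digit telephone value is made-up sample data.

```python
def change_at(value, start, length, new):
    """Replace `length` characters beginning at 1-based position `start`."""
    begin = start - 1
    return value[:begin] + new + value[begin + length:]


def store(value, width):
    """Fit a result into a fixed-width character variable."""
    # Longer results are truncated, shorter ones are padded with blanks.
    return value[:width].ljust(width)


phone = "6098971234"
if phone[3:6] == "897":            # SUBSTRING ( Telephone, 4, 3 ) = '897'
    phone = change_at(phone, 4, 3, "807")
print(phone)
print(repr(store("Texas", 2)), repr(store("TX", 5)))
```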

9.18 Squeezing Out Specified Characters<br />

COMPRESS ( '12/25/95', '/' ) = '122595'<br />

COMPRESS ( '..AB...CD..', 1, '.' ) = '.AB.CD.'<br />

The COMPRESS function squeezes out either blanks or specified characters. This instruction:<br />

[ SET SS.Number = COMPRESS ( SS.Number ) ]<br />

squeezes out all leading, trailing and embedded blanks contained in <strong>the</strong> variable SS.Number. Only non-blank<br />

characters remain.<br />

An expanded usage mode of the COMPRESS function permits specification of the number of delimiters that<br />

may remain between tokens (strings or words), and the delimiter character or characters that separate tokens. This<br />

instruction will leave only a single blank between words wherever one or more blanks are found:<br />

[ SET Sentence = COMPRESS ( Sentence, 1 ) ]<br />

This next instruction will generate a numeric variable from a character one, after all the specified characters are<br />

squeezed out:<br />

[ GEN Income = NUMBER ( COMPRESS ( Income, ',$' ) ) ]<br />

The token delimiters are specified as the comma and the currency sign. Since the second argument, the number<br />

of delimiters that may remain, is missing, zero is assumed. All specified delimiters are removed, leaving only



numbers (it is hoped). The NUMBER function (discussed subsequently) converts the resultant string into a numeric<br />

value.<br />
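The squeezing rule can be written out as a short Python sketch. The function name `compress` is a hypothetical stand-in; the choice to keep the first characters of each run of delimiters (when some may remain) is an assumption, since the text does not say which ones survive in a run of mixed delimiters. Note that the argument order here puts the count second, as in the PPL examples.

```python
def compress(value, keep=0, delims=" "):
    """Squeeze runs of delimiter characters down to at most `keep` each."""
    out = []
    run = 0                      # length of the current run of delimiters
    for ch in value:
        if ch in delims:
            run += 1
            if run <= keep:      # assumption: the first `keep` are retained
                out.append(ch)
        else:
            run = 0
            out.append(ch)
    return "".join(out)


print(compress("12/25/95", 0, "/"))
print(compress("..AB...CD..", 1, "."))
print(compress("$1,234", delims=",$"))
```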

9.19 Trimming Strings<br />

TRIM ( ' abcd ' ) = ' abcd'<br />

LTRIM ( ' abcd ' ) = 'abcd '<br />

LRTRIM ( ' abcd ' ) = 'abcd'<br />

TRIM ( 'abc ***', '*' ) = 'abc '<br />

TRIM ( 'abc ***', 2, '*' ) = 'abc *'<br />

The TRIM functions remove ei<strong>the</strong>r blanks or one or more specified characters from one or both ends of a<br />

string. TRIM is a synonym for RTRIM — it trims blanks or characters from <strong>the</strong> right end of a string. LTRIM<br />

trims from <strong>the</strong> left end and LRTRIM trims from both ends.<br />

The TRIM functions have a character expression and a single character or a string of characters as arguments<br />

and an optional number that limits <strong>the</strong> number of characters <strong>to</strong> be trimmed. The string of trim characters is optional<br />

and, when it is not present, blank trim characters are assumed. In this example, blank characters are trimmed<br />

from <strong>the</strong> right end of <strong>the</strong> variable First.Name before concatenating it with a blank and variable Last.Name:<br />

[ SET Name = TRIM ( First.Name ) // ' ' // Last.Name ]<br />

Multiple trim characters may be specified:<br />

[ GEN Text:C = TRIM ( TRIM (Var1), '.,-' ) ]<br />

Any of <strong>the</strong> three specified punctuation marks that occur on <strong>the</strong> right end of values of Var1 will be trimmed off,<br />

yielding values of <strong>the</strong> new variable Text. Values of “hippo-”, “closing,” and “also...” will become “hippo”, “closing”<br />

and “also”. The resultant value may have a shorter length than it did prior <strong>to</strong> trimming. Notice that a simple<br />

TRIM is used <strong>to</strong> remove excess blanks first, so that <strong>the</strong> punctuation is right-most and <strong>the</strong>refore able <strong>to</strong> be trimmed.<br />

Note that:<br />

TRIM ( VAR1, '., -' )<br />

which adds a blank to the trim characters, is slightly different; it not only trims initial blanks on the right, it also<br />

trims blanks after other trim characters have been found. That is, 'ab. - , ' would become 'ab' instead of<br />

the 'ab. - ' which results when the simple TRIM of blanks is done before the TRIM of the punctuation<br />

characters.<br />
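The difference between the nested TRIM and the single TRIM with a blank among the trim characters is easy to check with a small Python sketch. The function names are hypothetical stand-ins, and the Python signature takes the limit last, whereas the PPL examples place the number before the trim characters.

```python
def trim(value, chars=" ", limit=None):
    """Remove trim characters from the right end, at most `limit` of them."""
    end = len(value)
    removed = 0
    while (end > 0 and value[end - 1] in chars
           and (limit is None or removed < limit)):
        end -= 1
        removed += 1
    return value[:end]


def ltrim(value, chars=" ", limit=None):
    # Trimming the left end is trimming the right end of the reversed string.
    return trim(value[::-1], chars, limit)[::-1]


def lrtrim(value, chars=" "):
    return ltrim(trim(value, chars), chars)


print(repr(trim(trim("ab. - , "), ".,-")))   # blanks first, then punctuation
print(repr(trim("ab. - , ", "., -")))        # blank is one of the trim characters
```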

9.20 Padding Strings<br />

PAD ( 'abcd', 6 ) = 'abcd  '<br />

LPAD ( 'abcd', 7, '-' ) = '---abcd'<br />

PAD ( 'abcd', 3 ) = 'abcd'<br />

LRPAD ( 'abcd', 9, '-' ) = '--abcd---'<br />

The PAD functions add blanks or a specified character <strong>to</strong> <strong>the</strong> right (PAD or RPAD), <strong>to</strong> <strong>the</strong> left (LPAD), or <strong>to</strong> both<br />

ends of a string (LRPAD).<br />

The PAD functions have a character expression, a minimum length, and a fill character as arguments. Only<br />

<strong>the</strong> character expression is required. When <strong>the</strong> third argument, <strong>the</strong> fill character, is omitted in any of <strong>the</strong> PAD<br />

functions, a blank character is assumed.<br />

[ GEN ABC:C = PAD ( TRIM ( V(5) ), 10, '-' ) ]<br />

The trimmed form of variable 5 is padded with dashes <strong>to</strong> a width of 10. If <strong>the</strong> trimmed form is already 10 or more,<br />

no dashes are added. Then, when <strong>the</strong> result is moved across <strong>the</strong> equal sign in<strong>to</strong> variable ABC, it will be fur<strong>the</strong>r<br />

padded with blanks if its length is less than 16.<br />

When <strong>the</strong> second argument, <strong>the</strong> length, is omitted, a length of 1 is assumed:



[ SET Heading = PAD ( TRIM (Heading), '.' ) ]<br />

Blanks are trimmed from <strong>the</strong> end of variable Heading. If necessary, <strong>the</strong> resultant value is padded with a dot <strong>to</strong><br />

bring it up <strong>to</strong> a length of 1. Therefore, only values of Heading that are completely blank, and thus become null<br />

strings when <strong>the</strong>y are trimmed, are padded. This usage of PAD may be useful in locating blank values or in avoiding<br />

production of null strings, which may interfere with o<strong>the</strong>r procedures.<br />

PAD is often used with TRIM. Consider:<br />

[ GEN aa:c16 = 'cow';<br />

GEN bb:c16 = LRPAD ( aa, 16, '-' ) ]<br />

This LRPAD has no effect because variable aa, being C16, literally contains 'cow' followed by 13 blanks. Variable<br />

bb will contain the same thing because the input to LRPAD already has 16 characters. However:<br />

[ GEN aa:c16 = 'cow';<br />

GEN bb:c16 = LRPAD ( LRTRIM ( aa ), 16, '-' ) ]<br />

gives just 'cow' to the LRPAD function, so it will set bb to '------cow-------'.<br />

In this vein, if LRPAD ( LRTRIM ( aa ), 40, '-' ) were used, LRPAD would cheerfully produce 18 dashes,<br />

'cow', and 19 dashes. Then, because bb is character 16, it is truncated to the first 16 of those 40 characters, i.e., a<br />

series of 16 dashes.<br />
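The padding arithmetic can be checked with a Python sketch. The function names are hypothetical stand-ins; placing the odd fill character on the right in `lrpad` is taken from the examples in this section (two dashes on the left and three on the right for 'abcd' padded to 9).

```python
def pad(value, length=1, fill=" "):
    """Pad on the right up to a minimum length."""
    return value + fill * max(0, length - len(value))


def lpad(value, length=1, fill=" "):
    """Pad on the left up to a minimum length."""
    return fill * max(0, length - len(value)) + value


def lrpad(value, length=1, fill=" "):
    """Pad both ends; an odd fill character goes on the right."""
    total = max(0, length - len(value))
    left = total // 2
    return fill * left + value + fill * (total - left)


aa = "cow".ljust(16)                         # a C16 variable holding 'cow'
print(repr(lrpad(aa, 16, "-")))              # already 16 long, so unchanged
print(repr(lrpad(aa.strip(), 16, "-")))      # trimmed first, so dashes appear
print(repr(lrpad(aa.strip(), 40, "-")[:16])) # truncated on storage into C16
```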

9.21 Converting Numbers <strong>to</strong> Characters and Vice Versa<br />

NUMBER ( '12' // '3' ) = 123<br />

CHARACTER ( 123 ) = '123'<br />

CHAREX ( 122596, '00XX00' ) = '25'<br />

The NUMBER function converts character strings containing digits into numbers. The CHARACTER and<br />

CHAREX functions convert numeric values into character representations. However, CHARACTER converts an entire<br />

numeric variable, whereas CHAREX extracts and converts only specified digits.<br />

The NUMBER function converts a character value containing digits in<strong>to</strong> numeric form. NUMBER requires<br />

a character expression as its argument:<br />

[ SET Year = NUMBER ( SUBSTRING ( Date, 7, 2 ) ) ]<br />

This instruction takes <strong>the</strong> seventh and eighth characters from <strong>the</strong> character variable Date and converts <strong>the</strong>m in<strong>to</strong><br />

numeric form. For example, when Date has <strong>the</strong> value “09/15/29”, Year has <strong>the</strong> value 29.<br />

If <strong>the</strong> character value of <strong>the</strong> argument is all blank, <strong>the</strong> number is set <strong>to</strong> missing type 1. If <strong>the</strong> result of <strong>the</strong><br />

character expression is not numeric, <strong>the</strong> number is set <strong>to</strong> missing type 2. If <strong>the</strong> input character value is missing,<br />

<strong>the</strong> number is set <strong>to</strong> missing type 3.<br />

There are three forms of <strong>the</strong> number function. They differ only when an invalid (missing 2) result occurs.<br />

NUMBER does not print any warnings when invalid values are found. NUMBER.W prints a diagnostic warning.<br />

NUMBER.E produces an error message, which ends <strong>the</strong> command at that moment.<br />
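The three outcomes for missing values, and the three forms of the function, can be summarized in a Python sketch. Everything named here is hypothetical: the markers `MISSING_1` to `MISSING_3` stand in for the three missing types, and the rule for what counts as numeric (an optional sign, digits, an optional decimal point, no blanks) is an assumption made for the illustration.

```python
import re
import warnings

# Hypothetical stand-ins for the three kinds of missing value.
MISSING_1, MISSING_2, MISSING_3 = "M1", "M2", "M3"
_NUMERIC = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)")


def number(text, on_invalid="quiet"):
    """Mimic NUMBER ("quiet"), NUMBER.W ("warn") and NUMBER.E ("error")."""
    if text is None:
        return MISSING_3              # the input value itself is missing
    if text.strip(" ") == "":
        return MISSING_1              # the value is all blank
    if _NUMERIC.fullmatch(text):
        return float(text)
    if on_invalid == "warn":
        warnings.warn(f"invalid numeric value: {text!r}")
    elif on_invalid == "error":
        raise ValueError(f"invalid numeric value: {text!r}")
    return MISSING_2                  # the value is not numeric


print(number("123"), number("   "), number("11-1968"), number(None))
```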

A numeric variable for a date, suitable as input <strong>to</strong> <strong>the</strong> DAYS function, may be generated. Assuming Date is<br />

a character string of <strong>the</strong> form 11/19/68, this instruction:<br />

[ GENERATE Date2 = NUMBER.W ( COMPRESS ( Date, '/' ) ) ]<br />

squeezes out all slashes in <strong>the</strong> values of Date and converts <strong>the</strong>m <strong>to</strong> numbers that can be input <strong>to</strong> <strong>the</strong> DAYS function<br />

<strong>to</strong> compute differences between dates. A warning message is issued if Date contains any invalid characters, such<br />

as “-” or “ ” (blank).<br />

The CHARACTER function converts numbers in<strong>to</strong> character strings. It requires a numeric expression as its<br />

argument. This instruction:



[ GEN ID:C11 =<br />

CHARACTER ( Class ) // CHARACTER ( SS.Num ) ]<br />

generates a character variable named ID whose value is <strong>the</strong> concatenation of <strong>the</strong> character representations of <strong>the</strong><br />

numeric variables Class and SS.Num.<br />

It is possible <strong>to</strong> use CHARACTER followed by a second argument indicating <strong>the</strong> number of decimal places<br />

<strong>to</strong> preserve in <strong>the</strong> expression. This is particularly useful if you have income values with decimal places carried up<br />

<strong>to</strong> four places and you wish <strong>to</strong> specify only two decimal places. The second argument indicates <strong>the</strong> number of<br />

places <strong>to</strong> carry in <strong>the</strong> expression:<br />

LIST Salary [ GEN Income:C = CHARACTER ( Salary, 2 ) ] $<br />

CHARACTER may also be followed by a third argument which indicates <strong>the</strong> maximum number of places <strong>to</strong> preserve.<br />

In <strong>the</strong> following example, all values have a minimum of two decimal places and those with sufficient digits<br />

in <strong>the</strong> decimal portion have a maximum of three places:<br />

LIST Salary [ GEN Income:C = CHARACTER ( Salary, 2, 3 ) ] $<br />

The CHAREX function extracts specific digits from a numeric value and yields a character representation of<br />

those digits. CHAREX operates only on <strong>the</strong> integer portion of <strong>the</strong> number — any fractional portion and sign are<br />

ignored. The two required arguments are a numeric expression and a character string selection mask enclosed in<br />

quotes:<br />

[ GEN Month:C2 = CHAREX ( Date, 'XX00' ) ]<br />

The selection mask is composed of X and 0 (zero) characters and may be up <strong>to</strong> ten characters in length. An<br />

X retains a digit and a 0 drops a digit. The selection mask is aligned with <strong>the</strong> right-most digit of <strong>the</strong> numeric value.<br />

Thus, <strong>the</strong> selection mask “X0X” applied <strong>to</strong> <strong>the</strong> numeric value 840921 yields <strong>the</strong> character representation “91”.<br />

The selection mask “XX00X” applied <strong>to</strong> “156” yields “006” because lead zeros pad <strong>the</strong> numeric value until it is<br />

<strong>the</strong> length of <strong>the</strong> mask. The numeric function NUMEX is similar <strong>to</strong> CHAREX, but it yields a numeric result.<br />
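The mask rules can be traced with a short Python sketch. The function name `charex` is a hypothetical stand-in that follows the description above: only the integer portion is used, the sign is ignored, the value is padded with lead zeros to the length of the mask, and the mask is aligned with the right-most digit.

```python
def charex(number, mask):
    """Keep the digits under each X of the mask, drop those under each 0."""
    digits = str(abs(int(number)))         # integer portion only, sign ignored
    width = len(mask)
    digits = digits.zfill(width)[-width:]  # pad with lead zeros, align on the right
    return "".join(d for d, m in zip(digits, mask) if m in "Xx")


print(charex(122596, "00XX00"))
print(charex(840921, "X0X"))
print(charex(156, "XX00X"))
```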

9.22 Character/Integer Translation<br />

CVAL ( 65 ) = 'A'<br />

IVAL ( 'A' ) = 65<br />

The PPL functions CVAL and IVAL, short for character value and integer value, translate an integer to a character<br />

and a character to an integer, respectively. This permits non-printing characters to be inserted into text strings output by the PUT instruction,<br />

the LIST command or the TITLES command. Also, all kinds of character values can be compared precisely by<br />

referring to their integer codes.<br />

The CVAL function requires an integer between 0 and 255 as its argument. It returns <strong>the</strong> character equivalent<br />

of that integer. The IVAL function requires a character value of any size as its argument. It returns <strong>the</strong> integer<br />

equivalent of <strong>the</strong> first character.<br />
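In Python the built-in `chr` and `ord` play the same roles, which gives a quick way to experiment with the codes. The wrappers `cval` and `ival` are hypothetical stand-ins; the range check mirrors the 0 to 255 limit stated above.

```python
def cval(code):
    """Return the character whose integer code is `code` (0 through 255)."""
    if not 0 <= code <= 255:
        raise ValueError("code must be between 0 and 255")
    return chr(code)


def ival(text):
    """Return the integer code of the first character of `text`."""
    return ord(text[0])


ESC = cval(27)  # the escape code used by many printers
bold_text = ESC + "G" + "Bold On" + ESC + "H" + "Bold Off"
print(ival("A"), cval(65), repr(bold_text))
```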

Figure 9.2 illustrates using the CVAL function to output non-printing codes with text to a printer. The example<br />

codes are appropriate for a personal computer and a Lexmark printer. Although the codes are<br />

specific to these machines, the example illustrates the general procedure of embedding codes in text strings. The<br />

character equivalent of 27:<br />

CVAL (27)<br />

is an escape code. Many printers require that an escape code, a non-printing signal, precede alphanumeric characters<br />

<strong>to</strong> specify various printer parameters. An escape code followed by <strong>the</strong> character “G”:<br />

(CVAL(27)) 'G' 'Bold On' (CVAL(27)) 'H' 'Bold Off'<br />

turns on the double-strike print mode; escape H turns it off. All text printed after escape G is double-struck until<br />

escape H is processed. Other printer instructions require an escape code followed by the character equivalent of<br />

an integer. This is a form feed:



(CVAL(27)) (CVAL(12))<br />

__________________________________________________________________________<br />

Figure 9.2 The CVAL Function for Bells and Whistles<br />

Enter a command:<br />

>> PPL (PUT<br />

@PAGE (CVAL(27)) 'G' 'Bold On' (CVAL(27)) 'H' 'Bold Off'<br />

@SKIP (CVAL(27)) 'M' 'Elite On' (CVAL(27)) 'P' 'Elite Off'<br />

@SKIP (CVAL(27)) (CVAL(12)) 'Form Feed'<br />

@SKIP (CVAL(27)) 'R' (CVAL(7)) 'International Character Set On'<br />

@SKIP (CVAL(221)) 'Hable Ud. Espa' (CVAL(252)) 'ol?'<br />

@SKIP (CVAL(27)) 'R' (CVAL(7)) 'International Character Set Off'<br />

@SKIP (CVAL(27)) (CVAL(14)) 'Enlarged Print On'<br />

@SKIP (CVAL(27)) '@' 'Initialize Printer'<br />

@SKIP (CVAL(27)) (CVAL(7)) 'Ring Bell' ),<br />

PR LPT1 $<br />

__________________________________________________________________________<br />

Still other instructions require an escape code, a character, and the character equivalent of an integer as an<br />

option. This selects the Spanish character set on the printer:<br />

(CVAL(27)) 'R' (CVAL(7)) 'International Character Set On'<br />

(CVAL(221)) 'Hable Ud. Espa' (CVAL(252)) 'ol?'<br />

(CVAL(27)) 'R' (CVAL(7)) 'International Character Set Off'<br />

The upside-down question mark and the Spanish “n” print in the subsequent text, and then the default character<br />

set is restored. This re-initializes the printer to the normal defaults,<br />

(CVAL(27)) '@'<br />

so that subsequent users are not unduly surprised.<br />

Notice that the PPL command is used to process PPL (P-STAT programming language) instructions. An input<br />

file is not required and an output file is not produced. The PPL command exists solely to process PPL<br />

instructions, such as the PUT instruction used in this example. PUT places text strings on the output device, which<br />

is the printer in this example:<br />

PR LPT1 $<br />

(“LPT1” is the default name for the printer on many personal computers.) The TEXT.WRITER or PROCESS<br />

commands could be used if the output text strings were to incorporate values from a P-STAT system file. Both<br />

TEXT.WRITER and PROCESS require an input file. Any P-STAT command could be used; the input filename<br />

is followed directly by PPL clauses containing PPL instructions. The instructions containing “@” control<br />

text placement. See the chapter on the TEXT.WRITER command for specifics.



__________________________________________________________________________<br />

Figure 9.3 Nesting Functions<br />

File People:<br />

Name Birthdate<br />

Susan 07/08/56<br />

Marc 01/26/52<br />

David 03/31/59<br />

>> SORT People<br />

[ GEN #Day =<br />

DAYS ( NUMBER ( COMPRESS ( Birthdate, '/' ) ), 'MMDDYY' ) ;<br />

GEN #Today = DAYS ( .NDATE., 'YYYYMMDD' ) ;<br />

GEN Age = INT ( ( 1 + #Today - #Day ) / 365.25 ) ],<br />

BY Age,<br />

OUT People.By.Age $<br />

>> LIST $<br />

Name Birthdate Age<br />

David 03/31/59 53<br />

Susan 07/08/56 55<br />

Marc 01/26/52 60<br />

__________________________________________________________________________<br />

9.23 Complex Character Expressions<br />

The arguments for character functions are expressions, which must be enclosed in parentheses and separated by<br />

commas. The simplest expression is a variable name or position. Complex expressions are nested functions, numeric<br />

constants, quoted character constants (literals or strings), or combinations of these. Combining character<br />

operators and functions in a series of instructions and procedures permits complex manipulation of character<br />

variables.<br />

Figure 9.3 illustrates how numeric and character functions can be used together to create a numeric variable,<br />

Age, given a character variable Birthdate. COMPRESS is used to squeeze out the slashes from Birthdate. The<br />

result from COMPRESS is the input to the NUMBER function. The result from the NUMBER function is the first<br />

argument for the DAYS function.<br />

Scratch variables are used for the intermediate computations, but they are not necessary. The entire series of<br />

nested functions can be placed in a single PPL phrase:<br />

[ GEN Age = INT ( ( 1 + DAYS ( .NDATE., 'YYYYMMDD' ) -<br />

DAYS ( NUMBER ( COMPRESS ( Birthdate, '/' )), 'MMDDYY' ))<br />

/ 365.25 ) ]<br />

When functions are nested this way, the possibility for an error in logic is greater than it is when the process is<br />

broken up into several smaller steps.



The example in Figure 9.3 was run on June 22, 2012. It should be noted that <strong>the</strong> DAYS function for .NDATE.<br />

uses 'YYYYMMDD'. This is needed because .NDATE. produces a 4-digit year.<br />
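The age arithmetic in Figure 9.3 can be reproduced outside P-STAT as a check. The following Python sketch uses the standard `datetime` module in place of the DAYS function; the helper names are hypothetical, and reading the two-digit year as 19YY is an assumption that suits the sample data.

```python
from datetime import date


def parse_mmddyy(text, century=1900):
    """Parse 'MM/DD/YY'; the century is an assumption for this sketch."""
    digits = text.replace("/", "")                  # the COMPRESS step
    month, day, year = int(digits[0:2]), int(digits[2:4]), int(digits[4:6])
    return date(century + year, month, day)


def age_on(birthdate, today):
    """INT ( ( 1 + #Today - #Day ) / 365.25 )"""
    elapsed = (today - parse_mmddyy(birthdate)).days
    return int((1 + elapsed) / 365.25)


run_date = date(2012, 6, 22)   # the date on which Figure 9.3 was run
people = [("Susan", "07/08/56"), ("Marc", "01/26/52"), ("David", "03/31/59")]
for name, born in sorted(people, key=lambda p: age_on(p[1], run_date)):
    print(name, born, age_on(born, run_date))
```

Sorting by the computed age gives the same order as the listing in the figure: David, Susan, Marc.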

9.24 Using <strong>the</strong> Name of a Variable as a Character Value<br />

The VARNAME function provides <strong>the</strong> name of a variable:<br />

[ GEN Last.Missing:C = .M. ;<br />

DO #L USING Test.1 TO Test.8 ;<br />

IF V(#L) MISSING,<br />

SET Last.Missing = VARNAME ( #L ) ;<br />

ENDDO ]<br />

The character variable Last.Missing is generated and set equal <strong>to</strong> missing. The values of Test.1 through Test.8 are<br />

tested — each one that is missing causes <strong>the</strong> recoding of Last.Missing <strong>to</strong> <strong>the</strong> name of <strong>the</strong> variable with <strong>the</strong> missing<br />

value. Last.Missing would have values of “Test.4”, “Test.7”, “-” (missing), and so on.<br />
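The logic of the loop can be restated in Python. This sketch is an analogy only: a case is modeled as a dictionary, a missing value as `None`, and the dictionary key plays the part of VARNAME. The sample scores are invented.

```python
def last_missing(case, names):
    """Return the name of the last variable in `names` whose value is missing."""
    result = None                  # GEN Last.Missing:C = .M.
    for name in names:             # DO #L USING Test.1 TO Test.8
        if case[name] is None:     # IF V(#L) MISSING
            result = name          # SET Last.Missing = VARNAME ( #L )
    return result


tests = [f"Test.{i}" for i in range(1, 9)]
case = dict.fromkeys(tests, 80)    # made-up scores
case["Test.4"] = None
case["Test.7"] = None
print(last_missing(case, tests))
```

Because every missing value overwrites the previous result, the name that survives is that of the last missing variable in the list.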

Figure 9.4 illustrates a more complex usage of <strong>the</strong> VARNAME function. Here, a more compact and informative<br />

listing of <strong>the</strong> file Patients is desired. Five new variables named d.Heart, d.Liver, and so on, are created,<br />

each one set <strong>to</strong> <strong>the</strong> name of <strong>the</strong> corresponding variable in <strong>the</strong> DO loop:<br />

DO #J USING Heart TO Back;<br />

GEN ?( 'd.' & ):C = VARNAME (#J) ;<br />

ENDDO ;<br />

At <strong>the</strong> end of this step, <strong>the</strong> file has twelve variables, Id, Name, Heart through Back, and d.Heart through d.Back.<br />

The values of <strong>the</strong> first case are:<br />

1001 Jones 0 1 0 1 0 Heart Liver Kidney Brain Back<br />

The next step tests <strong>the</strong> five 0/1 variables Heart through Back, and, if any are equal <strong>to</strong> zero, sets <strong>the</strong> corresponding<br />

d.variable <strong>to</strong> missing:<br />

DO #J USING Heart TO Back ;<br />

IF V(#J) EQ 0, SET V( #J+5 ) = .M1. ;<br />

ENDDO ]<br />

At <strong>the</strong> end of <strong>the</strong> second step, <strong>the</strong> first case contains:<br />

1001 Jones 0 1 0 1 0 - Liver - Brain -<br />

Variables Heart through Back are no longer needed, so <strong>the</strong>y are dropped from <strong>the</strong> file.<br />

SPLIT is <strong>the</strong>n used <strong>to</strong> break each patient case in<strong>to</strong> five cases, one for each of <strong>the</strong> disease situations. After <strong>the</strong><br />

SPLIT, <strong>the</strong> cases pertaining <strong>to</strong> patient Jones are:<br />

Id Name Disease<br />

1001 Jones -<br />

1001 Jones Liver<br />

1001 Jones -<br />

1001 Jones Brain<br />

1001 Jones -



__________________________________________________________________________<br />

Figure 9.4 Using VARNAME, SPLIT and COLLECT<br />

File Patients:<br />

Id Name Heart Liver Kidney Brain Back<br />

1001 Jones 0 1 0 1 0<br />

1002 Brown 1 0 0 0 0<br />

1003 Davis 0 1 1 0 1<br />

1004 Mason 0 1 1 0 0<br />

1009 Smith 1 0 0 1 0<br />

LIST Patients<br />

[ DO #J USING Heart TO Back;<br />

GEN ?( 'd.' & ):C = VARNAME (#J);<br />

ENDDO;<br />

DO #J USING *;<br />

IF V(#J) EQ 0, SET V( #J+5 ) = .M1. ;<br />

ENDDO ]<br />

[ DROP Heart TO Back;<br />

SPLIT INTO 5, CARRY ( Id Name ), CREATE Disease d.?;<br />

IF Disease MISSING, DELETE ]<br />

[ COLLECT 3, BY Id, CARRY Name ] $<br />

Disease Disease Disease<br />

ID Name .1 .2 .3<br />

1001 Jones Liver Brain -<br />

1002 Brown Heart - -<br />

1003 Davis Liver Kidney Back<br />

1004 Mason Liver Kidney -<br />

1009 Smith Heart Brain -<br />

__________________________________________________________________________<br />

Next any case that has a missing value for Disease is deleted, leaving only two cases for Jones and no more than<br />

three cases for any patient (<strong>the</strong> maximum number of diseases observed for any patient). The final step:<br />

[ COLLECT 3, BY Id, CARRY Name ]<br />

collects <strong>the</strong> maximum of three cases for each patient back in<strong>to</strong> a single case. Since Jones had only two medical<br />

problems, he has a missing value for Disease.3.<br />
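The whole sequence in Figure 9.4 (name each flag, blank out the zeros, split, delete, collect) can be followed in a Python sketch. It is an analogy and not a description of how P-STAT carries out SPLIT or COLLECT; cases are modeled as dictionaries, missing values as `None`, and only three of the patients are included.

```python
DISEASES = ["Heart", "Liver", "Kidney", "Brain", "Back"]

patients = [
    {"Id": 1001, "Name": "Jones", "Heart": 0, "Liver": 1, "Kidney": 0, "Brain": 1, "Back": 0},
    {"Id": 1002, "Name": "Brown", "Heart": 1, "Liver": 0, "Kidney": 0, "Brain": 0, "Back": 0},
    {"Id": 1003, "Name": "Davis", "Heart": 0, "Liver": 1, "Kidney": 1, "Brain": 0, "Back": 1},
]


def collect_diseases(cases, width=3):
    collected = []
    for case in cases:
        # VARNAME step plus the test for zero: the variable name, or None for missing.
        labels = [name if case[name] != 0 else None for name in DISEASES]
        # SPLIT gives one value per disease; cases with a missing Disease are deleted.
        present = [label for label in labels if label is not None]
        # COLLECT gathers up to `width` values into one case, padded with missing.
        present = (present + [None] * width)[:width]
        row = {"Id": case["Id"], "Name": case["Name"]}
        for i, label in enumerate(present, start=1):
            row[f"Disease.{i}"] = label
        collected.append(row)
    return collected


for row in collect_diseases(patients):
    print(row)
```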

9.25 The MATCHES and XMATCHES Operators<br />

The operator MATCHES tests if a pattern matches the value of a character variable. The pattern is composed of<br />

characters and symbols, such as the wildcard character “*”. The pattern is supplied within single or double quotes:<br />

[ IF Company MATCHES 'Con* Ed*', RETAIN ]



This instruction selects cases from <strong>the</strong> file shown in Figure 9.5 with <strong>the</strong> following values of Company:<br />

Con Ed coned Consolidated Education, Inc.<br />

Consulted, Inc. Connie Edward Conway Medics<br />

__________________________________________________________________________<br />

Figure 9.5 File of Character Data for MATCHES and XMATCHES<br />

Company<br />

Con Ed<br />

coned<br />

Super Coned<br />

Corn Fed<br />

Ed Con<br />

Consolidated Education, Inc.<br />

Consulted, Inc.<br />

Con Ed (with leading blanks)<br />

Connie Edward<br />

Conway Medics<br />

*CON ED*<br />

Connie E. Dean, Assoc.<br />

__________________________________________________________________________<br />

It does not select:<br />

Super Coned Corn Fed Ed Con<br />

Con Ed (with leading blanks) *CON ED* Connie E. Dean, Assoc.<br />

This simple, common usage of MATCHES, with <strong>the</strong> wildcard character “*” in <strong>the</strong> pattern, illustrates several<br />

basic rules of matching:<br />

1. The pattern is anchored — that is, a match of <strong>the</strong> first character in <strong>the</strong> pattern is sought in <strong>the</strong> first<br />

character of <strong>the</strong> character value. (This means that lead or left-most blanks count.)<br />

2. The wildcard “*” matches zero or more occurrences of any character, including blanks.<br />

3. Spaces or blanks inside <strong>the</strong> pattern are ignored, unless <strong>the</strong>y are escaped — enclosed in < >.<br />

4. Case is not considered, unless XMATCHES is used.<br />

The pattern may be unanchored by using <strong>the</strong> wildcard “*” as <strong>the</strong> first character in <strong>the</strong> pattern:<br />

[ IF Company MATCHES '*Con* Ed*', RETAIN ]<br />

This instruction selects:<br />

Con Ed coned Consolidated Education, Inc.<br />

Consulted, Inc. Connie Edward Conway Medics<br />

Super Coned Con Ed (with leading blanks) *CON ED*<br />

Trailing or right-most blanks are ignored. This instruction, without the “*” as the final character in the pattern:<br />

[ IF Company MATCHES '*Con* Ed', RETAIN ]<br />

selects:<br />

Con Ed coned Super Coned Con Ed (with leading blanks)<br />
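The four basic rules can be tested against the data of Figure 9.5 with a Python sketch that translates a pattern into a regular expression. The function `matches` is a hypothetical stand-in that handles only the “*” wildcard, and the three leading blanks given to the second “Con Ed” value are an assumption about the data.

```python
import re


def matches(value, pattern, exact_case=False):
    """Anchored match; blanks in the pattern are ignored, "*" is a wildcard."""
    parts = []
    for ch in pattern:
        if ch == " ":
            continue                            # rule 3: pattern blanks are ignored
        parts.append(".*" if ch == "*" else re.escape(ch))
    flags = re.DOTALL | (0 if exact_case else re.IGNORECASE)   # rule 4
    # Trailing blanks in the value are ignored; fullmatch anchors both ends.
    return re.fullmatch("".join(parts), value.rstrip(" "), flags) is not None


companies = [
    "Con Ed", "coned", "Super Coned", "Corn Fed", "Ed Con",
    "Consolidated Education, Inc.", "Consulted, Inc.", "   Con Ed",
    "Connie Edward", "Conway Medics", "*CON ED*", "Connie E. Dean, Assoc.",
]

for pattern in ("Con* Ed*", "*Con* Ed*", "*Con* Ed"):
    print(pattern, [c for c in companies if matches(c, pattern)])
```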

A complete set of pattern symbols (meta-characters) and syntax exists for MATCHES and XMATCHES,<br />

making possible any arbitrary selections. Additional PPL may be used with these operators to provide consequences<br />

after selections, recode data values and further refine the selection criteria. The next section discusses the<br />

MATCHES meta-characters and syntax, and Figure 9.6 summarizes them.<br />

9.26 MATCHES: Meta-Characters and Syntax<br />

The asterisk “*”, which matches zero or more occurrences of any character, is the most useful and general wildcard<br />

character. Additional wildcard characters further limit the pattern to be matched. The at-sign “@” matches<br />

zero or more blanks. It is useful if there may be lead blanks that should be ignored. For example, this instruction:<br />

[ IF Company MATCHES '@Con* Ed', RETAIN ]<br />

selects:<br />

Con Ed coned<br />

Con Ed (with leading blanks)<br />

The question mark “?” matches any single character. This instruction:<br />

[ IF Company MATCHES '?Con* Ed?', RETAIN ]<br />

selects:<br />

*CON ED*<br />

Other wildcards match specific single characters. The crosshatch or number sign “#” matches any single digit, the<br />

dollar sign “$” matches any single letter, and the underscore “_” matches a single blank. This instruction:<br />

[ IF Company MATCHES 'Con_Ed', RETAIN ]<br />

selects this case:<br />

Con Ed<br />

The underscore matches the single blank in the center. Strings with lead blanks are not selected because the pattern<br />

is anchored on the left. Trailing blanks are ignored.<br />

Character strings that contain meta-characters may be matched by escaping the meta-characters. Escaping<br />

removes the special meaning of a meta-character. The backslash “\” and the angle signs “< >” are escape characters.<br />

Any character directly after the backslash or enclosed between the angle signs is treated as a literal character:<br />

[ IF Company MATCHES '\* * <*>', RETAIN ]<br />

This instruction selects:<br />

*CON ED*<br />

The first and third asterisks in <strong>the</strong> pattern are literal characters. They match only asterisks. The middle asterisk<br />

is a meta-character that matches zero or more of any characters. Thus, a string of two or more characters that begins<br />

and ends with an asterisk is selected.<br />
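The single-character wildcards and the two escape forms can be added to a pattern translator in a few lines of Python. This is a sketch under stated assumptions: `to_regex` and `matches` are hypothetical names, enclosures and repetition counts are not handled, escapes are treated only as literal characters (not as numeric codes), and the second “Con Ed” value is assumed to carry three leading blanks.

```python
import re

# Meta-characters outside enclosures, mapped to regular-expression fragments.
WILDCARDS = {"*": ".*", "@": " *", "?": ".", "#": "[0-9]", "$": "[A-Za-z]", "_": " "}


def to_regex(pattern):
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))        # \c is a literal c
            i += 2
        elif ch == "<" and ">" in pattern[i:]:
            end = pattern.index(">", i)
            out.append(re.escape(pattern[i + 1:end]))    # <c> is a literal c
            i = end + 1
        elif ch == " ":
            i += 1                                       # unescaped blanks are ignored
        else:
            out.append(WILDCARDS.get(ch, re.escape(ch)))
            i += 1
    return "".join(out)


def matches(value, pattern):
    flags = re.DOTALL | re.IGNORECASE
    return re.fullmatch(to_regex(pattern), value.rstrip(" "), flags) is not None


companies = [
    "Con Ed", "coned", "Super Coned", "Corn Fed", "Ed Con",
    "Consolidated Education, Inc.", "Consulted, Inc.", "   Con Ed",
    "Connie Edward", "Conway Medics", "*CON ED*", "Connie E. Dean, Assoc.",
]

for pattern in ("@Con* Ed", "?Con* Ed?", "Con_Ed", r"\* * <*>"):
    print(pattern, [c for c in companies if matches(c, pattern)])
```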

The escape characters are also used to match characters that may not print. The characters are referenced by<br />

their decimal or octal (base 8) integer equivalents. The backslash is used for octal numbers and the angle signs are used<br />

for decimal numbers. This instruction:<br />

[ IF Company MATCHES '* <009> *', RETAIN ]<br />

selects character strings that contain a tab character. (009 is the decimal equivalent of the tab character in<br />

the ASCII character codes.)<br />
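The wildcard and escape rules above can be illustrated outside P-STAT. The following is a minimal Python sketch, not P-STAT code: it translates a pattern without enclosures into a regular expression. The names `to_regex` and `matches` are hypothetical, and the sketch assumes that MATCHES ignores case, that the whole value must match, and that several digits between angle signs denote a decimal character code. Octal escapes after the backslash are not handled.<br />

```python
import re

# Assumed regex equivalents of the single-character wildcards.
WILDCARDS = {
    "*": ".*",        # zero or more of any character
    "?": ".",         # any single character
    "#": "[0-9]",     # any single digit
    "$": "[A-Za-z]",  # any single letter
    "_": " ",         # a single blank
    "@": " *",        # zero or more blanks
}


def to_regex(pattern):
    """Translate a wildcard pattern (no enclosures) into a Python regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == " ":
            i += 1  # unescaped blanks in the pattern are ignored
        elif ch == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # \c is a literal c
            i += 2
        elif ch == "<" and ">" in pattern[i + 1:]:
            end = pattern.index(">", i + 1)
            body = pattern[i + 1:end]
            # Assumption: several digits mean a decimal character code.
            literal = chr(int(body)) if len(body) > 1 and body.isdigit() else body
            parts.append(re.escape(literal))
            i = end + 1
        else:
            parts.append(WILDCARDS.get(ch, re.escape(ch)))
            i += 1
    return "".join(parts)


def matches(value, pattern):
    """Case-insensitive whole-value match; trailing blanks are ignored."""
    regex = to_regex(pattern)
    flags = re.IGNORECASE | re.DOTALL
    return re.fullmatch(regex, value.rstrip(" "), flags) is not None
```

Under these assumptions, `matches("Con Ed", "Con_Ed")` is true, while a value with a leading blank is rejected because the match starts at the first character.<br />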

Parentheses and square brackets are enclosures that specify, respectively, a literal string of characters and a<br />

single character to match. A literal string of characters may be specified with or without parentheses. These two<br />

instructions are equivalent:



__________________________________________________________________________<br />

Figure 9.6 MATCHES and XMATCHES: Meta-Characters<br />

In General:<br />

< > [ ] * @ _ ? # $ 0 1 +<br />

Within [ ]:<br />

^ - _ # $ ]<br />

Within ( ):<br />

| _ )<br />

Escape Characters:<br />

\ < ><br />

Pattern Syntax:<br />

@ zero or more blanks<br />

* zero or more of any character<br />

? a single character<br />

# a single digit<br />

$ a single letter<br />

_ a single blank<br />

<*> a literal character (an asterisk)<br />

\# a literal character (a crosshatch)<br />

<009> a decimal number<br />

\009 an octal number<br />

Enclosures:<br />

( abc ) a literal string of characters<br />

abc same as ( abc )<br />

( abc | xyz ) abc or xyz<br />

[ abc ] a single character: a or b or c<br />

[ a-z ] a single letter in the range a through z<br />

[ $ ] same as [ a-z ]<br />

[ 0-9 ] a single number in <strong>the</strong> range 0 through 9<br />

[ # ] same as [ 0-9 ]<br />

[ _ ] a single blank<br />

[ ^$ ] a single character that is NOT a letter<br />

Repetitions after Enclosures:<br />

1 1 a single match (the default)<br />

0 1 zero or one matches<br />

0 + zero or more matches<br />

1 + one or more matches<br />

0 same as 0 1<br />

+ same as 1 +<br />

__________________________________________________________________________



[ IF Company MATCHES '(Con) * (Ed) *', DELETE ]<br />

[ IF Company MATCHES ' Con * Ed *', DELETE ]<br />

Notice that the blanks in the pattern contribute to its readability. Blanks are ignored unless they are escaped, and<br />

they may be omitted if desired.<br />

Parentheses are typically used when one character string or another is to be matched:<br />

[ IF Company MATCHES '(Con | Corn) * Ed', RETAIN ]<br />

This instruction selects:<br />

Con Ed coned Corn Fed<br />

The vertical bar character “|” means “or” and the parentheses limit the character strings that are to be “or-ed”. Note<br />

that merely the juxtaposition of character strings means “and”. That is, this pattern:<br />

[ IF Company MATCHES 'Con Corn * Ed', RETAIN ]<br />

matches only character values that have “ConCorn” followed by zero or more of any character followed by “Ed”.<br />

If a blank is sought between “Con” and “Corn”, the pattern should be specified with an underscore:<br />

[ IF Company MATCHES 'Con_Corn * Ed', RETAIN ]<br />

Square bracket enclosures specify a single character to match. Typically, that character may be one of several<br />

in the enclosure that is repeated a specified number of times. For example, this instruction:<br />

[ IF Company MATCHES 'Con [ _ n s ] * Ed *', RETAIN ]<br />

selects these values:<br />

Con Ed Consolidated Education, Inc.<br />

Consulted, Inc. Connie Edward<br />

The string “Con” is followed by a blank, an “n” or an “s” and that is followed by zero or more of any characters,<br />

the string “Ed” and zero or more of any characters.<br />

A repetition specification may follow either type of enclosure. Possible repetitions are: 1 1, 0 1, 0 + and 1 +.<br />

The repetition 1 1 is the default that is assumed when nothing follows an enclosure: one match. The repetition<br />

0 1 means zero or one matches, 0 + means zero or more matches and 1 + means one or more matches. This<br />

instruction:<br />

[ IF Company MATCHES 'Co (n)1 + *', RETAIN ]<br />

selects:<br />

Con Ed coned Consolidated Education, Inc.<br />

Consulted, Inc. Connie Edward Conway Medics<br />

Connie E. Dean, Assoc.<br />

This:<br />

[ IF Company MATCHES 'Co (n)0 1 *', RETAIN ]<br />

selects:<br />

Con Ed coned Consolidated Education, Inc.<br />

Consulted, Inc. Connie Edward Conway Medics<br />

Corn Fed Connie E. Dean, Assoc.<br />

Notice that the repetition 1 + matches one or more occurrences of “n” and that 0 1 matches zero or one occurrence.<br />

Thus, Corn Fed is included in the second group of matches (it has zero occurrences of “n” after the “Co”). The<br />

strings with two occurrences of “n” are also included because of the wildcard “*” that makes any character valid<br />

after the zero or one “n”.<br />

It is also possible to specify a range from which a single character is valid as a match and characters that are<br />

not valid as matches. The square brackets are used. This instruction:<br />

[ IF Company MATCHES '[ ^$ ^# ] *', RETAIN ]<br />

selects:<br />

*CON ED*<br />

Con Ed<br />

The meta-character “^” means not. Thus, the enclosure above specifies a single character that is not a letter and<br />

not a number as the first character. (The “$” means any letter and the “#” means any number.) This instruction<br />

means the same thing, but uses ranges in the pattern instead of the meta-characters “$” and “#”:<br />

[ IF Company MATCHES '[ ^a-z ^A-Z ^0-9 ] *', RETAIN ]<br />

The hyphen “-” is used in ranges. When the “^” is omitted, then any character in the specified range is a valid<br />

match. Figure 9.6 summarizes the meta-characters.<br />
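For readers who know regular expressions, the enclosure and repetition examples above can be cross-checked with a short Python sketch. The regexes are hand translations, assumed to be equivalent, and are not produced by P-STAT. The sketch also assumes case-insensitive matching of the whole value, which is inferred from the values listed as selected. The list `COMPANIES` collects the values used in the examples, and `select` is a hypothetical helper.<br />

```python
import re

COMPANIES = [
    "Con Ed", "coned", "Corn Fed", "Consolidated Education, Inc.",
    "Consulted, Inc.", "Connie Edward", "Conway Medics",
    "Connie E. Dean, Assoc.", "*CON ED*",
]

# Hand-translated regexes (assumed equivalents of the MATCHES patterns).
TRANSLATIONS = {
    "(Con | Corn) * Ed":    r"(?:Con|Corn).*Ed",
    "Con [ _ n s ] * Ed *": r"Con[ ns].*Ed.*",
    "Co (n)1 + *":          r"Co(?:n)+.*",
    "Co (n)0 1 *":          r"Co(?:n)?.*",
    "[ ^$ ^# ] *":          r"[^A-Za-z0-9].*",
}


def select(pattern):
    """Return the companies whose whole value matches the translated pattern."""
    regex = re.compile(TRANSLATIONS[pattern], re.IGNORECASE)
    return [company for company in COMPANIES if regex.fullmatch(company)]


if __name__ == "__main__":
    for pattern in TRANSLATIONS:
        print(pattern, "->", select(pattern))
```

With these translations, each pattern selects the same values from `COMPANIES` that the text lists for it.<br />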

9.27 CLAG: A Lag using a character argument<br />

CLAG is a function that performs a lag on a character argument, which can be an expression.<br />

GEN PREVIOUS.TITLE:c30 = CLAG( JOB.TITLE, 12 )<br />

This would take the JOB.TITLE value from 12 cases ago and copy it into PREVIOUS.TITLE of the current case.<br />

The second argument, the lag depth, must be an integer constant from 1 to 500.<br />
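The lag can be pictured as a sliding window over the cases. The following Python sketch is an illustration only: the function name `clag` is reused for readability, and returning `None` for the first cases, where no earlier value exists, is an assumption standing in for a missing value.<br />

```python
from collections import deque


def clag(values, depth):
    """Return, for each case, the value from `depth` cases earlier."""
    if not 1 <= depth <= 500:
        raise ValueError("lag depth must be an integer from 1 to 500")
    history = deque(maxlen=depth)  # holds the most recent `depth` values
    lagged = []
    for value in values:
        # Until `depth` earlier cases exist, there is nothing to copy.
        lagged.append(history[0] if len(history) == depth else None)
        history.append(value)
    return lagged


if __name__ == "__main__":
    titles = ["Clerk", "Analyst", "Manager", "Director"]
    print(clag(titles, 2))
```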

9.28 CONCATENATION OF CHARACTER CONSTANTS<br />

There is a special operator (&&) that permits dynamic concatenation of character constants in a command or in<br />

PPL. It is most useful in situations such as macros where there is an 80 character limit on record size.<br />

MAKE Myfile, FILE<br />

<br />

&&<br />

;<br />

There is no particular limit on the number of pieces and they can be enclosed in either angle brackets or quotation<br />

marks.<br />

[ GEN text:c200 =<br />

<br />

&&<br />

"The sentence in the text contains a single >. "<br />

&&<br />

'A third piece is needed to complete the variable.' ]<br />

In command text and in PPL the && structure may be used anywhere that a character constant<br />

may be used.



PPL<br />

SUMMARY<br />

Character variables are modified by functions and operators. Some are specifically for character variables<br />

and others may be used with either character or numeric variables. Functions and operators are<br />

grouped below according to their usages.<br />

PPL Functions: Character<br />

The arguments for character functions are expressions. The simplest expression is a variable name or<br />

position (vnp). Complex expressions are nested functions, numeric constants (nn), quoted character constants<br />

or strings ('cs'), or combinations of these.<br />

Abbreviations following the functions indicate the type of argument that should result from the evaluation<br />

of the expression.<br />

BLANK (exp, loc, len)<br />

specifies a character value that is to have blank characters replace existing characters. The second argument<br />

in the BLANK function gives the start location. The third argument, which is optional, gives the<br />

length of the area to be made blank. This example:<br />

[ GEN New.Tel:C8 = BLANK ( Tel, 5, 4 ) ]<br />

will replace characters five through eight of a telephone number with blanks. When the third argument<br />

is omitted, the expression is filled with blanks from the start location through the end.<br />

This example will blank out all but the first letter of Last.Name:<br />

LIST Diet.Clients<br />

[ KEEP Last.Name Weight Pounds.Lost ;<br />

SET Last.Name = BLANK ( Last.Name, 2 ) ] $<br />

An alternate usage mode exists for the BLANK function. Its general format is:<br />

BLANK (exp, old, nn)<br />

The first argument specifies a character value that may contain a specified substring; if so, it is to be<br />

blanked. The second argument yields the character string that may be present in the first expression.<br />

When it is present, it is replaced by blank characters. The optional third argument is the number of times<br />

to find and replace the character string; one change is assumed. This PPL phrase:<br />

[ SET Comments = BLANK ( Comments, 'damn', 10 ) ]<br />

replaces up to ten occurrences of the word “damn” in the variable Comments with an equivalent number<br />

of blanks.<br />

XBLANK (exp, old, nn)<br />

blanks out specified characters. It functions just like BLANK's alternate mode of operation, but respects<br />

the case (upper, lower or mixed) of the “old” string:<br />

[ SET Symptom = XBLANK ( Symptom, 'D', 9 ) ]<br />

Upper-case “D” is blanked out in values of Symptom; lower-case “d” is ignored.<br />
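The two usage modes of BLANK, and the case sensitivity of XBLANK, can be sketched in Python. This is an illustration of the behavior described above, not P-STAT code. The names `blank_at` and `blank_string` and the `respect_case` flag are hypothetical, and the sketch assumes that BLANK ignores case when searching for the “old” string.<br />

```python
import re


def blank_at(value, loc, length=None):
    """Positional mode: blank `length` characters starting at 1-based `loc`."""
    start = loc - 1
    end = len(value) if length is None else min(len(value), start + length)
    return value[:start] + " " * (end - start) + value[end:]


def blank_string(value, old, nn=1, respect_case=False):
    """Substring mode: blank up to `nn` occurrences of `old`.

    respect_case=False sketches BLANK (assumed to ignore case);
    respect_case=True sketches XBLANK.
    """
    flags = 0 if respect_case else re.IGNORECASE
    return re.sub(re.escape(old), " " * len(old), value, count=nn, flags=flags)


if __name__ == "__main__":
    print(repr(blank_at("555-1234", 5, 4)))
    print(repr(blank_at("Smith", 2)))
    print(repr(blank_string("Dd", "D", 9, respect_case=True)))
```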

loc=location len=length lim=delimiter exp=expression nn=number cs=char string



CAPS (exp)<br />

capitalizes the first letter of each token (word) in a character value:<br />

[ GEN Name:C = CAPS ( 'JOHN paul JoNeS' ) ]<br />

Letters other than the first are changed to lower case. The default is that tokens are separated by blanks.<br />

The output appears like this:<br />

John Paul Jones<br />

In a more complex usage, an additional or replacement token delimiter is supplied in quotes as a second<br />

argument:<br />

[ GEN Name:C = CAPS ( 'ann hayden-jones', '- ' ) ]<br />
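The capitalization rule can be sketched in Python as follows. The function `caps` is an illustration only. It assumes that every character in the second argument acts as a token delimiter and that delimiters are copied to the result unchanged.<br />

```python
def caps(value, delims=" "):
    """Capitalize the first letter of each token; lower-case the rest."""
    out = []
    start_of_token = True
    for ch in value:
        if ch in delims:
            out.append(ch)          # delimiters are copied unchanged
            start_of_token = True
        else:
            out.append(ch.upper() if start_of_token else ch.lower())
            start_of_token = False
    return "".join(out)


if __name__ == "__main__":
    print(caps("JOHN paul JoNeS"))
    print(caps("ann hayden-jones", "- "))
```

Under these assumptions the second call capitalizes the letter after the hyphen as well as the letter after the blank.<br />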

CENTER (exp)<br />

centers a character value in its field:<br />

[ SET Surname = CENTER ( Surname ) ]<br />

CHANGE (exp, old, new, nn)<br />

specifies a character value possibly containing an “old” character string that is to be changed to a “new”<br />

character string. The first argument to the CHANGE function is a character expression. The second argument<br />

is the old string and the third is the new string. An optional fourth argument is the maximum<br />

number of changes to make per value; one change is assumed.<br />

In this example:<br />

[ SET College = CHANGE ( College, 'University', 'Univ', 3 ) ]<br />

the character variable College will have old values of “University” changed to new values of “Univ”. A<br />

maximum of three such changes per value of College is specified. The resultant values of College will<br />

be shorter wherever this change is made, and the listing may be more attractive with the abbreviation.<br />

CHANGE, without a new argument, removes the old string:<br />

[ SET College = CHANGE ( College, 'ersity', 3 ) ]<br />

The old string “ersity” is changed to a null string. This (probably) achieves the same result as the previous<br />

example.<br />

An alternate usage of the CHANGE function has the format:<br />

CHANGE ( exp, loc, len, new )<br />

The first argument is a character expression, the second is the start location of the old string, the third is<br />

the length of the old string, and the fourth is the new character string:<br />

[ SET Date = CHANGE ( Date, 7, 2, '85' ) ]<br />

Values of Date in the form 11/08/84 are changed to 11/08/85.<br />

XCHANGE (exp, old, new, nn)<br />

changes character strings. It functions just like CHANGE, but respects the case of the “old” string:<br />

[ SET Sal = XCHANGE ( Sal, 'm', 'Ms.', 1 ) ]<br />

Only lower-case “m” would be changed.<br />
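Both usage modes of CHANGE can be sketched in Python. The names `change` and `change_at` and the `respect_case` flag are hypothetical. The sketch assumes that CHANGE ignores case when searching, which is what distinguishes it from XCHANGE in the description above.<br />

```python
import re


def change(value, old, new="", nn=1, respect_case=False):
    """Replace up to `nn` occurrences of `old` with `new`.

    respect_case=False sketches CHANGE (assumed to ignore case);
    respect_case=True sketches XCHANGE.
    """
    flags = 0 if respect_case else re.IGNORECASE
    # A function is used as the replacement so that `new` is taken literally.
    return re.sub(re.escape(old), lambda match: new, value, count=nn, flags=flags)


def change_at(value, loc, length, new):
    """Positional mode: replace `length` characters at 1-based `loc` with `new`."""
    start = loc - 1
    return value[:start] + new + value[start + length:]


if __name__ == "__main__":
    print(change("Princeton University", "University", "Univ", 3))
    print(change("Princeton University", "ersity", nn=3))
    print(change_at("11/08/84", 7, 2, "85"))
```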

CHARACTER (exp)<br />

converts a number into its character equivalent:<br />

[ GEN Code:C3 = CHARACTER ( Area.Code ) ]<br />

It is possible to use CHARACTER followed by a second argument indicating the number of decimal<br />

places to preserve in the expression. This might be useful if you have income values with decimal places<br />

carried up to four places and you wish to specify only two decimal places. The second argument indicates<br />

the number of places to carry in the expression:<br />

LIST Salary [ GEN Income:C = CHARACTER ( Salary, 2 ) ] $<br />

CHARACTER may also be followed by a third argument that indicates the maximum number of places<br />

to print:<br />

LIST Salary [ GEN Income:C = CHARACTER ( Salary, 2, 3 ) ] $<br />

MAKE.CHARACTER, described in the manual “P-STAT: File Management”, can be used to change a<br />

numeric variable into a character variable or to resize a character variable.<br />

CHAREX (exp, 'XX00')<br />

extracts specific digits from a numeric value and yields a character representation of those digits.<br />

CHAREX operates only on the integer portion of the number. Any fractional portion and sign are ignored.<br />

The two required arguments are a numeric expression and a character string mask enclosed in quotes:<br />

[ GEN Month:C2 = CHAREX ( Date, 'XX00' ) ]<br />

The selection mask is composed of X and 0 (zero) characters and may be up to twelve characters in<br />

length. An X retains a digit and a 0 drops a digit. The selection mask is aligned with the right-most digit<br />

of the numeric value. The numeric function NUMEX does much the same thing but yields a numeric<br />

result without the lead zeros.<br />
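The mask alignment can be sketched in Python. The function `charex` below is an illustration only. It assumes that a value with fewer digits than the mask is filled with lead zeros on the left, and that digits to the left of the mask are dropped.<br />

```python
def charex(number, mask):
    """Keep the digits under each X of a right-aligned mask of X and 0."""
    if len(mask) > 12 or set(mask.upper()) - {"X", "0"}:
        raise ValueError("mask must be up to twelve X or 0 characters")
    digits = str(abs(int(number)))                     # integer part, sign ignored
    digits = digits.zfill(len(mask))[-len(mask):]      # align with right-most digit
    return "".join(d for d, m in zip(digits, mask.upper()) if m == "X")


if __name__ == "__main__":
    print(charex(1108, "XX00"))      # a date stored as the number 1108 (month, day)
    print(charex(-308.75, "XX00"))
```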

CLAG (exp, nn )<br />

CLAG is a function that performs a lag on a character argument, which can be an expression.<br />

COMPRESS (exp)<br />

squeezes all blanks out of a character value:<br />

[ SET Text = COMPRESS ( Text ) ]<br />

Leading, trailing and embedded blanks are removed.<br />

An expanded mode of usage has optional arguments:<br />

COMPRESS ( exp, nn, lim )<br />

The second argument is the number of delimiter characters that should remain between tokens (“words”).<br />

The third argument is an alternate delimiter character or characters other than the blank, that is, a token separator.<br />

This example:<br />

[ SET Text = COMPRESS ( Text, 1 ) ]<br />

squeezes out all blanks but one from between words. This:<br />

[ GEN Amount = NUMBER ( COMPRESS ( Money, ', $' ) ) ]<br />

generates a numeric variable, Amount, equal to the character variable Money after all commas, blanks<br />

and currency signs are compressed out.<br />
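The effect of COMPRESS can be sketched in Python. The function `compress` below is an illustration only. It assumes that each run of delimiter characters between tokens is cut down to at most `keep` characters and that leading and trailing delimiters are always removed. In the sketch the arguments are given by position or keyword rather than recognized by type.<br />

```python
import re


def compress(value, keep=0, delims=" "):
    """Squeeze runs of delimiter characters down to at most `keep` characters."""
    run = "[" + re.escape(delims) + "]+"
    stripped = value.strip(delims)          # leading and trailing delimiters go
    return re.sub(run, lambda match: match.group(0)[:keep], stripped)


if __name__ == "__main__":
    print(repr(compress("  New   York  ")))
    print(repr(compress("  New   York  ", 1)))
    print(float(compress("$1,234.50 ", 0, ", $")))
```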

CVAL (nn)<br />

gives the character equivalent of the specified decimal number. This is used when unusual characters<br />

that cannot be entered or printed on the terminal screen are desired. Often these characters can be produced<br />

on a printer. An example of this is:<br />

[ PUT (CVAL(27)) 'R' (CVAL(7))<br />

(CVAL(221)) "Hable Ud. Espa" (CVAL(252)) "ol?" ]<br />

where the CVAL of the number 27 followed by “R” and the CVAL of 7 specifies the Spanish international<br />

character set. The CVAL of the numbers 221 and 252 yields the upside-down question mark and<br />

the Spanish “n”, respectively.<br />

IVAL ('c')<br />

gives the integer equivalent of the first character of a character value. This is the opposite of CVAL, described<br />

above. Since the integer equivalents of uppercase and lowercase characters are different, IVAL<br />

can be used in tests of equality of character values that respect case. (See the XEQ operator also.)<br />

LEFT (exp)<br />

left-justifies a character value in its field:<br />

[ SET Street.Address = LEFT ( Street.Address ) ]<br />

LENGTH (exp)<br />

yields a numeric length, which is the location of the right-most non-blank character:<br />

[ GEN HS.Length = LENGTH ( High.School ) ]<br />

(Leading and embedded blanks are included in the count.)<br />

LOWER (exp)<br />

converts a character value to lowercase characters:<br />

[ SET Region = LOWER ( Region ) ]<br />

NUMBER (exp)<br />

converts a character value into a number:<br />

[ GEN Year = NUMBER ( SUBSTRING ( Date, 7, 2 ) ) ]<br />

If the character value is all blank, the result is set to missing type 1. If the character value contains characters<br />

other than numbers, the result is set to missing type 2. If the character value is missing, the result<br />

is set to missing type 3.<br />

NUMBER may be suffixed with either “.W” or “.E”. NUMBER.W issues a warning and NUMBER.E<br />

stops the command with an error message when the character value contains characters other than<br />

numbers.<br />
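The three missing-value outcomes of NUMBER can be sketched in Python. This is an illustration only: a missing input is represented by `None`, a missing result by the hypothetical tuple `("missing", n)`, and Python's `float` stands in for whatever P-STAT accepts as a number, which is an assumption.<br />

```python
def number(value):
    """Convert a character value to a number, or report a missing type."""
    if value is None:
        return ("missing", 3)      # the character value itself is missing
    text = value.strip()
    if not text:
        return ("missing", 1)      # the value is all blank
    try:
        return float(text)
    except ValueError:
        return ("missing", 2)      # characters other than numbers


if __name__ == "__main__":
    for sample in ["84", "   ", "8a", None]:
        print(repr(sample), "->", number(sample))
```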

If you wish to change the type of a character variable to numeric, you may use the MAKE.NUMERIC<br />

command, which is described in the manual “P-STAT: File Management”.<br />

PAD (exp, len, fill)<br />

specifies a character value which is to be “padded” on the right side. The first argument to PAD is the<br />

character expression to be padded, the second is the minimum length, and the third is an optional fill character<br />

to be used for padding. If only one argument is supplied, a minimum length of 1 and a blank fill<br />

character are assumed. This example:<br />

[ GEN Zip:C10 = PAD ( Zipcode, 10, '-' ) ]<br />

pads the variable Zipcode with dashes on the right side. If Zipcode initially has a length of five, it is padded<br />

until its length is ten. If Zipcode initially has a length of ten or more characters, it is not padded.<br />

RPAD is a synonym for PAD.<br />

LPAD (exp, len, fill)<br />

specifies a character value which is to be “padded” with blanks or a supplied fill character on the left side.<br />

This example:<br />

[ SET Message = LPAD ( LRTRIM ( Message ), 16, '>' ) ]<br />

pads the trimmed variable Message with the “>” character on the left side. See also PAD and LRPAD.<br />

LRPAD (exp, len, fill)<br />

specifies a character value, a minimum length, and an optional fill character to be used for padding the<br />

character value. Padding will occur evenly on both the left and right sides of the expression. The right<br />

side will be padded first. If the character expression is already equal to or greater than the specified<br />

length, no padding will take place. A length of 1 and a blank fill character are assumed when none are<br />

specified. See also LPAD and PAD. This is often used on an LRTRIM result.<br />
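The three padding functions can be compared in a short Python sketch. The functions below are illustrations only and require a single fill character. For LRPAD the sketch assumes that “the right side will be padded first” means the right side receives the extra character when the padding cannot be split evenly.<br />

```python
def pad(value, length=1, fill=" "):
    """Pad on the right up to a minimum length (PAD, RPAD)."""
    return value.ljust(length, fill)


def lpad(value, length=1, fill=" "):
    """Pad on the left up to a minimum length (LPAD)."""
    return value.rjust(length, fill)


def lrpad(value, length=1, fill=" "):
    """Pad both sides evenly; an odd extra character goes on the right."""
    missing = max(0, length - len(value))
    left = missing // 2
    right = missing - left
    return fill * left + value + fill * right


if __name__ == "__main__":
    print(pad("08525", 10, "-"))
    print(lpad("Hi", 5, ">"))
    print(lrpad("ab", 5, "*"))
```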

POSITION (exp, 'cs')<br />

yields a numeric value which is the position of the character string within the character value:<br />

[ GEN Blank.Location = POSITION ( Name, ' ' ) ]<br />

Values match regardless of whether they are uppercase or lowercase. If the second value is not found,<br />

the result is zero.<br />

A more complex usage of POSITION is also possible. The general form is:<br />

POSITION ( exp, 'cs', 'cs', 'cs',..., len )<br />

Additional optional arguments are multiple character strings whose positions in the character variable are<br />

sought. Only the left-most position of any successfully located string is given.<br />

An integer between 1 and 50,000 may be supplied as the right-most argument giving a length. It permits<br />

the contents of the character strings, whose positions are being sought, to be divided into strings of the<br />

specified length. Each portion of the divided string is treated as a separate argument and its position is<br />

located. The character values must be evenly divisible by the length.<br />

Examples of usages include:<br />

POSITION ( City.State, ',' , '.' )<br />

POSITION ( 'ABCDEF', 'AC', 'BC', 'DE', 'DF' )<br />

POSITION ( City.State, 'NYNJ', 2 )<br />

POSITION ( 'ABCDEF', 'AEIOU', 1 )<br />

XPOSITION (exp, 'cs')<br />

gives the position of the specified character string in the value. It works just like POSITION, but respects<br />

the case of the character string in searching.<br />
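The search rules of POSITION and XPOSITION can be sketched in Python. The function `position` is an illustration only. The keyword arguments `length` and `respect_case` are hypothetical stand-ins for the trailing length argument and for the XPOSITION variant.<br />

```python
def position(value, *strings, length=None, respect_case=False):
    """Return the left-most 1-based position of any search string, or 0."""
    pieces = []
    for text in strings:
        if length is None:
            pieces.append(text)
        else:
            if len(text) % length:
                raise ValueError("string is not evenly divisible by the length")
            # Each chunk of `length` characters is searched for separately.
            pieces.extend(text[i:i + length] for i in range(0, len(text), length))
    haystack = value if respect_case else value.lower()
    hits = []
    for piece in pieces:
        found = haystack.find(piece if respect_case else piece.lower())
        if found >= 0:
            hits.append(found + 1)
    return min(hits) if hits else 0


if __name__ == "__main__":
    print(position("ABCDEF", "AC", "BC", "DE", "DF"))
    print(position("Trenton, NJ", "NYNJ", length=2))
    print(position("ABCDEF", "AEIOU", length=1))
```

In the first call only “BC” and “DE” occur in the value, so the left-most of their positions is returned.<br />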

RIGHT (exp)<br />

right-justifies a character value in its field:<br />

[ SET Zip = RIGHT ( Zip) ]<br />

SIZE (exp)<br />

yields a numeric value giving the size of the specified character value. The size includes any blanks, embedded<br />

or otherwise. It is typically either the defined size or the size resulting after various character<br />

function procedures or operations.<br />

SUBSTRING (exp, loc, len)<br />

yields a character value which is the string beginning in the location specified by the second argument<br />

and of the length specified by the third argument. If the optional starting location is not given, it is assumed<br />

to be 1. If the optional length is not given, it is assumed to be the remainder of the string. For<br />

example:<br />

[ GEN Initial:C1 = SUBSTRING ( LEFT ( Name ), 1, 1 ) ]<br />

yields the substring of Name beginning at the first character and one character long.<br />

TOKEN (exp, nn, lim)<br />

accesses a portion of a longer character string:<br />

[ GEN First.Name:C = TOKEN ( Name ) ]<br />

The first token starts with the first non-delimiter on the left and continues until a subsequent delimiter is<br />

found. TOKEN accesses the first token unless the optional second argument specifies a token in another<br />

position. The assumed delimiter is a blank unless the optional third argument specifies another delimiter<br />

or delimiters. LTOKEN is a synonym for TOKEN.<br />

RTOKEN (exp, nn, lim)<br />

accesses tokens counting from the right:<br />

[ GEN Last.Name:C = RTOKEN ( Name ) ;<br />

IF RTOKEN ( Name ) CONTAINS '.',<br />

SET Last.Name = RTOKEN ( Name, 2 ) ]<br />

NTOKEN (exp, lim)<br />

yields a numeric value that is a count of the tokens in the first character value. If the optional second<br />

argument is not provided, the delimiter between tokens is assumed to be the blank:<br />

[ GEN #Middle.Name:C;<br />

GEN #Number = NTOKEN ( Name ) ;<br />

IF #Number GT 2,<br />

SET Middle.Name = TOKEN ( Name, 2 ) ]<br />
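The three token functions can be sketched together in Python. The helper `tokens` and the functions below are illustrations only. Returning an empty string when the requested token does not exist is an assumption, since the text does not say what P-STAT returns in that case.<br />

```python
import re


def tokens(value, delims=" "):
    """Split on any delimiter character and drop empty pieces."""
    return [t for t in re.split("[" + re.escape(delims) + "]", value) if t]


def token(value, n=1, delims=" "):
    """The n-th token counting from the left (TOKEN, LTOKEN)."""
    found = tokens(value, delims)
    return found[n - 1] if 1 <= n <= len(found) else ""


def rtoken(value, n=1, delims=" "):
    """The n-th token counting from the right (RTOKEN)."""
    found = tokens(value, delims)
    return found[-n] if 1 <= n <= len(found) else ""


def ntoken(value, delims=" "):
    """The number of tokens (NTOKEN)."""
    return len(tokens(value, delims))


if __name__ == "__main__":
    name = "  Mary Ann Smith Jr. "
    last = rtoken(name)
    if "." in last:                 # mirrors the RTOKEN example above
        last = rtoken(name, 2)
    print(token(name), last, ntoken(name))
```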

TRIM (exp, nn, 'cs')<br />

specifies a character value that may have characters trimmed from the right side, and the characters to<br />

trim. An optional number limits the number of characters to be trimmed. The resultant character value<br />

will have a shorter size if the specified characters exist on the right and trimming occurs. If a trim character<br />

is not specified, the blank character is assumed:<br />

[ SET Name = TRIM ( Name ) ]<br />

In this example, the variable Name will be set equal to Name with all blank characters trimmed off from<br />

the right side. Multiple trim characters may be specified:<br />

[ GEN Text:C = TRIM ( Var1, '.,-' ) ]<br />

Any of the specified characters occurring on the right end of the variable Var1 will be removed. RTRIM<br />

is a synonym for TRIM. See also LTRIM and LRTRIM.<br />

LTRIM (exp, nn, 'cs')<br />

specifies a character value that may have characters trimmed from the left side, and the characters to trim.<br />

All matching characters will be trimmed unless the optional second argument specifies a limit. See also<br />

RTRIM and LRTRIM.<br />

LRTRIM (exp, nn, 'cs')<br />

specifies a character value that may have characters trimmed from both the left and right sides. The characters<br />

to be trimmed are an optional argument. If no characters are specified, blank characters will be<br />

trimmed. When trimming takes place, the resultant variable size may be shorter than it was initially.<br />

LRPAD is often done on an LRTRIM result. The second argument is optional. If it is used, it limits the<br />

number of characters that are trimmed.<br />
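The trimming functions can be sketched in Python. The functions below are illustrations only. They take the limit and the trim characters as separate named parameters, and the sketch assumes that the limit of LRTRIM applies to each side separately.<br />

```python
def rtrim(value, limit=None, chars=" "):
    """Trim listed characters from the right, at most `limit` of them."""
    count = len(value) - len(value.rstrip(chars))
    if limit is not None:
        count = min(count, limit)
    return value[:len(value) - count]


def ltrim(value, limit=None, chars=" "):
    """Trim listed characters from the left, at most `limit` of them."""
    count = len(value) - len(value.lstrip(chars))
    if limit is not None:
        count = min(count, limit)
    return value[count:]


def lrtrim(value, limit=None, chars=" "):
    """Trim both sides; the limit is assumed to apply per side."""
    return ltrim(rtrim(value, limit, chars), limit, chars)


if __name__ == "__main__":
    print(repr(rtrim("Smith   ")))
    print(repr(rtrim("etc.,-", chars=".,-")))
    print(repr(lrtrim("  Smith  ")))
```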

UPPER (exp)<br />

converts a character value to uppercase characters:<br />

[ SET State = UPPER ( State ) ]<br />

VARNAME (exp)<br />

yields a character value that is the name of the variable in the expression:<br />

[ GEN Primary.Disease:C = .M. ;<br />

DO #J USING Heart TO Skin ;<br />

IF V(#J) EQ 1 AND Primary.Disease EQ .M. ,<br />

SET Primary.Disease = VARNAME ( #J );<br />

ENDDO ]<br />

The new character variable, Primary.Disease, will have values of missing, unless any of the variables<br />

Heart through Skin has a value of 1. Then Primary.Disease will have the name of the first of those variables<br />

as its value.<br />

VERIFY (exp, 'cs', 'cs')<br />

yields a numeric value which is the location of the first character in the initial argument which is NOT<br />

found in any of the remaining arguments. Thus, the presence of only specified characters may be<br />

verified:<br />

[ GEN Error =<br />

VERIFY ( Char.Income, '0123456789', ' $.,' ) ]<br />

Arguments 2 and 3 could have been combined into a single argument.<br />
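The check performed by VERIFY can be sketched in Python. The function `verify` is an illustration only. Returning zero when every character is allowed is an assumption, since the text does not state the result for that case.<br />

```python
def verify(value, *allowed):
    """Return the 1-based location of the first character not in `allowed`."""
    permitted = set("".join(allowed))   # several lists behave like one combined list
    for index, ch in enumerate(value, start=1):
        if ch not in permitted:
            return index
    return 0                            # assumed result when nothing is wrong


if __name__ == "__main__":
    print(verify("$1,234.50", "0123456789", " $.,"))
    print(verify("12a4", "0123456789"))
```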

PPL Functions: Character and Numeric<br />

The following functions operate on either character or numeric variable lists. However, numeric and<br />

character variables may not be combined in one list. The list may reference variables by name or position.<br />

The functions operate on character variables in the same manner that they operate on numeric<br />

variables.<br />

COUNT.GOOD (vnp, vnp)<br />

gives the number of non-missing values of the variables specified in the list. Only variable names or positions<br />

may be included in the list.<br />

FIRST.GOOD (vnp, vnp)<br />

gives the first good (non-missing) value of the variables specified in the list. Only variable names or positions<br />

may be included in the list.<br />

LAST.GOOD (vnp, vnp)<br />

gives the last good (non-missing) value of the variables specified in the list. Only variable names or positions<br />

may be included in the list.<br />

<strong>PPL</strong> Opera<strong>to</strong>rs: Character<br />

Concatenation opera<strong>to</strong>rs are used <strong>to</strong> combine character values or expressions.<br />

loc=location len=length lim=delimiter exp=expression nn=number cs=char string


<strong>PPL</strong>: Modification of Character Variables 9.31<br />

MODIFY List<br />

[ GEN Mail.Name:C36;<br />

IF Sex EQ 1, SET Mail.Name =<br />

'Mr. ' // First.Name /// Last.Name ;<br />

IF Sex EQ 2, SET Mail.Name =<br />

'Ms. ' // First.Name /// Last.Name ], OUT Mail $<br />

// concatenation<br />

connects <strong>the</strong> character strings before and after <strong>the</strong> double slashes:<br />

[ GEN Name:C32 = First.Name // ' ' // Last.Name ]<br />

The double slash opera<strong>to</strong>r abuts <strong>the</strong> character strings end-<strong>to</strong>-end. Blank portions of each field are<br />

included:<br />

Jennifer Smith<br />

/// trim concatenation<br />

connects the two character strings after trimming leading and trailing blanks.<br />

[ GEN Name:C32 = First.Name /// Last.Name ]<br />

The triple slash opera<strong>to</strong>r abuts <strong>the</strong> trimmed character strings end-<strong>to</strong>-end and <strong>the</strong>n inserts a blank between<br />

<strong>the</strong> strings:<br />

Jennifer Smith<br />
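The difference between the two operators is easiest to see with padded fields. The following Python sketch imitates the behavior described above; the function names and the 12-character field width are hypothetical and chosen only for illustration.<br />

```python
def concat(left, right):
    """Analogue of //: abut the values exactly as stored, blanks included."""
    return left + right


def trim_concat(left, right):
    """Analogue of ///: trim both values, then join them with one blank."""
    return left.strip() + " " + right.strip()


# Hypothetical fixed-width character fields, padded with blanks.
first_name = "Jennifer".ljust(12)
last_name = "Smith".ljust(12)

print(repr(concat(concat(first_name, " "), last_name)))
print(repr(trim_concat(first_name, last_name)))
```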

&& dynamic concatenation of character constants<br />

character constants can be dynamically concatenated in <strong>the</strong> command language and <strong>PPL</strong> by using <strong>the</strong> &&<br />

opera<strong>to</strong>r.<br />

[ GEN Cvar:c130 = && "bbb" && 'ccc' ]<br />

<strong>PPL</strong> Opera<strong>to</strong>rs: Logical<br />

In general, <strong>the</strong> following logical opera<strong>to</strong>rs evaluate two expressions. The expressions may be variables,<br />

values and functions, except for:<br />

• AMONG and NOTAMONG, whose arguments are lists of values and variables,<br />

• GOOD and MISSING, which do not have arguments, and<br />

• MATCHES, which has a character string argument.<br />

All of <strong>the</strong>se logical opera<strong>to</strong>rs are appropriate for character data. AMONG, NOTAMONG, GOOD and<br />

MISSING are also appropriate for numeric data. Numeric and character expressions may not be mixed<br />

in one argument list. Character constants must be enclosed in quotes.<br />

AMONG (list of values and variables)<br />

is true, false or missing depending on whe<strong>the</strong>r a value is one of <strong>the</strong> specified values:<br />

[ IF State AMONG ( 'NJ', 'N.J.', 'New Jersey' ), RETAIN ]<br />

or in <strong>the</strong> specified range:<br />

[ IF Name AMONG ('A' TO 'FZZ' ), RETAIN ]<br />


<strong>Inc</strong>lusion in <strong>the</strong> range is based on <strong>the</strong> sort order of <strong>the</strong> character strings, which may differ among<br />

computers.<br />

XAMONG (list of values and variables)<br />

specifies case-respecting comparisons — like AMONG in all o<strong>the</strong>r aspects.<br />

CONTAINS 'cs' or exp<br />

is true, false or missing, depending on whe<strong>the</strong>r <strong>the</strong> character value argument is present:<br />

[ IF Address CONTAINS '08540' , RETAIN ]<br />

[ IF Address CONTAINS TRIM( Zip ), RETAIN ]<br />

In <strong>the</strong> first example, cases with values of Address containing “08540” are retained; in <strong>the</strong> second, cases<br />

in which <strong>the</strong> Zip characters are also present in Address are retained.<br />

XCONTAINS 'cs' or exp<br />

specifies case-respecting evaluations — like CONTAINS in all o<strong>the</strong>r aspects.<br />

XEQ exactly EQ<br />

tests whether two character expressions are exactly equal in both specific characters and case.<br />

[ IF Initials XEQ 'JW', SET Name = 'Jim Wolf, Sr.' ]<br />

[ IF Initials XEQ 'jw', SET Name = 'Jim Wolf, Jr.' ]<br />

The operators XNE, XLT, XLE, XGT and XGE are similar: case and characters must be identical in<br />

string comparisons.<br />

GOOD<br />

is true or false depending on whether the value is present (good) or missing. GOOD combines = and .G. :<br />

[ IF Address GOOD , RETAIN ] or<br />

[ IF Address = .G. , RETAIN ]<br />

MATCHES 'cs'<br />

is true, false or missing, depending on whe<strong>the</strong>r <strong>the</strong> character string argument matches <strong>the</strong> value of a character<br />

variable. The case of <strong>the</strong> characters is not significant. The character string argument may include<br />

meta-characters that define or limit matches:<br />

[ IF Food MATCHES '*beef*', RETAIN ]<br />

In this example, <strong>the</strong> meta-character “*” is a wildcard that matches zero or more occurrences of any character.<br />

Thus, any cases in which the variable Food contains the string “beef” are retained. Values of<br />

Food such as <strong>the</strong> following are considered matches:<br />

Beef Roast Beef Beefsteak<br />

Some of <strong>the</strong> meta-characters that may be used in <strong>the</strong> character string argument are:<br />

* zero or more of any character @ zero or more blanks<br />

? a single character # a single digit<br />

_ a single blank $ a single letter<br />

\# a literal character (the #) \* a literal character (the *)<br />

(abc) a literal string (ab|bc) ab or bc<br />

abc same as (abc) [abc] a or b or c<br />

[a-z] a single letter in this range [$] same as [a-z]<br />


[0-9] a single number in this range [#] same as [0-9]<br />

[ _ ] a single blank [^$] a single character that is not a letter<br />

[#]11 a single match [#]01 zero or one matches<br />

[#]0+ zero or more matches [#]1+ one or more matches<br />
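To make the simpler meta-characters concrete, the sketch below translates a small subset of them (*, ?, #, $, _, @ and the backslash escape) into a Python regular expression. It is a rough illustration under stated assumptions, not P-STAT's matcher: the helper names are hypothetical, the bracket, group and repeat-count forms are not handled, and it assumes the pattern must cover the whole value, as the '*beef*' example suggests.<br />

```python
import re

# Subset of the meta-characters listed above, mapped to regex fragments.
_META = {
    "*": ".*",        # zero or more of any character
    "?": ".",         # a single character
    "#": "[0-9]",     # a single digit
    "$": "[A-Za-z]",  # a single letter
    "_": " ",         # a single blank
    "@": " *",        # zero or more blanks
}


def to_regex(pattern):
    """Translate the supported subset of a MATCHES pattern to a regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Backslash makes the next character literal.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        parts.append(_META.get(ch, re.escape(ch)))
        i += 1
    return "".join(parts)


def matches(value, pattern):
    """Case-insensitive whole-value match (whole-value is an assumption)."""
    return re.fullmatch(to_regex(pattern), value, re.IGNORECASE) is not None


for food in ("Beef", "Roast Beef", "Beefsteak", "Chicken"):
    print(food, matches(food, "*beef*"))
```

Dropping the re.IGNORECASE flag would give the case-respecting behavior described for XMATCHES.<br />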

XMATCHES 'cs'<br />

specifies case-respecting matches, and is like MATCHES in all other aspects.<br />

MISSING<br />

is true or false depending on whether the value is present (good) or missing. MISSING combines = and<br />

.M. :<br />

[ IF Address MISSING , DELETE ] or<br />

[ IF Address EQ .M. , DELETE ]<br />

NOTAMONG (list of values and variables)<br />

is true, false or missing depending on whe<strong>the</strong>r a value is not among <strong>the</strong> specified values:<br />

[ IF State NOTAMONG<br />

( 'NJ', 'N.J.', 'New Jersey' ), DELETE ]<br />

or not in <strong>the</strong> specified range:<br />

[ IF Name NOTAMONG ( 'A' TO 'FZZ' ), DELETE ]<br />

XNOTAMONG (list of values and variables)<br />

specifies case-respecting comparisons — like NOTAMONG in all o<strong>the</strong>r aspects.<br />



10<br />

<strong>PPL</strong>: Date and Time<br />

Commands and Functions<br />

The first section of this chapter describes <strong>the</strong> default format of date values, and describes <strong>the</strong> extensive set of date<br />

and time functions such as ADD.DAYS. The second section describes eight commands that may be used <strong>to</strong><br />

change <strong>the</strong> default ordering and appearance of new date values. The third section describes <strong>the</strong> six date-related<br />

logical opera<strong>to</strong>rs in <strong>PPL</strong> that compare dates. The final section contains complete details on <strong>the</strong> FORMAT.DATE<br />

function which is used <strong>to</strong> provide templates that describe exactly how a date should appear in <strong>the</strong> prin<strong>to</strong>ut.<br />

10.1 DATE AND TIME FUNCTIONS<br />

A date value is not a special datatype; it is simply a P-STAT character value that contains a 4-digit year from 1753<br />

<strong>to</strong> 2999, a month value and a day from 1 <strong>to</strong> 31. It may have time in hh:mm:ss or hh:mm form. The seconds may<br />

have up <strong>to</strong> 3 places, like 12:13:14.567 . It may also have <strong>the</strong> day of <strong>the</strong> week.<br />

Most date functions read an input date, do something <strong>to</strong> it, and write a date result, formatted in <strong>the</strong> same way<br />

as <strong>the</strong> input value. By formatting, we mean <strong>the</strong> ordering of <strong>the</strong> fields within <strong>the</strong> date value, such as ‘jan 1 1992’<br />

or ‘1992 January 1’ or such.<br />

A function like CURRENT.DATE has no input <strong>to</strong> serve as a format for <strong>the</strong> output, so it uses <strong>the</strong> default format.<br />

A P-<strong>STAT</strong> run begins with <strong>the</strong> default format looking like<br />

'Tues Jan 1, 2002 18:52:04' .<br />

Note <strong>the</strong> size: <strong>the</strong>se formats can use 30 or more characters.<br />

This default appearance can be changed by <strong>the</strong> DATE.ORDER command, and by several o<strong>the</strong>r commands<br />

that control things like a month name appearing as Jan or jan or January, etc. FORMAT.DATE, described in the<br />

final section, is a general and powerful function for specifying the exact appearance to be used when including<br />

dates in <strong>the</strong> printed output.<br />

10.2 Functions Which Create or Use Dates<br />

1. DAY.MONTH.YEAR creates date from integer or character argument.<br />

2. DAY.YEAR.MONTH creates date from integer or character argument.<br />

3. MONTH.DAY.YEAR creates date from integer or character argument.<br />

4. MONTH.YEAR.DAY creates date from integer or character argument.<br />

5. YEAR.DAY.MONTH creates date from integer or character argument.<br />

6. YEAR.MONTH.DAY creates date from integer or character argument.<br />

7. MAKE.DATE creates a date from numeric input.<br />

8. CURRENT.DATE provides <strong>to</strong>day’s date and time.<br />

9. REFORMAT.DATE changes <strong>the</strong> format of a date value.<br />

10. <strong>STAT</strong>US.DATE shows if a date is valid, if it has time, etc.



11. DAYS returns days since 1/1/1753 for a date.<br />

12. SECONDS returns seconds since 1/1/1753 for a date.<br />

13. SECONDS.MIDNIGHT returns seconds since midnight for a date.<br />

14. UNDO.DAYS reverses <strong>the</strong> DAYS function.<br />

15. UNDO.SECONDS reverses <strong>the</strong> SECONDS function.<br />

16. FISCAL.YEAR returns <strong>the</strong> fiscal year of a date.<br />

17. FISCAL.QUARTER returns <strong>the</strong> fiscal quarter of a date.<br />

18. QUARTER returns <strong>the</strong> calendar quarter of a date.<br />

19. DAY.WITHIN.WEEK returns 1 to 7, the day within a week.<br />

20. DAY.WITHIN.YEAR returns 1 to 366, the day within a year.<br />

21. WEEK.WITHIN.YEAR returns 0 to 53, the week within a year.<br />

22. ADD.MONTHS add some months <strong>to</strong> a date.<br />

23. ADD.DAYS add some days <strong>to</strong> a date.<br />

24. ADD.HOURS add some hours <strong>to</strong> a date.<br />

25. ADD.MINUTES add some minutes <strong>to</strong> a date.<br />

26. ADD.SECONDS add some seconds <strong>to</strong> a date.<br />

27. SUBTRACT.YEARS subtract some years from a date.<br />

28. SUBTRACT.MONTHS subtract some months from a date.<br />

29. SUBTRACT.DAYS subtract some days from a date.<br />

30. SUBTRACT.HOURS subtract some hours from a date.<br />

31. SUBTRACT.MINUTES subtract some minutes from a date.<br />

32. SUBTRACT.SECONDS subtract some seconds from a date.<br />

33. EXTRACT.YEARS return numeric years from a date.<br />

34. EXTRACT.MONTHS return numeric months from a date.<br />

35. EXTRACT.DAYS return numeric days from a date.<br />

36. EXTRACT.HOURS return numeric hours from a date.<br />

37. EXTRACT.MINUTES return numeric minutes from a date.<br />

38. EXTRACT.SECONDS return numeric seconds from a date.<br />

39. EXTRACT.CC return 2-digit numeric century from a date. (19 from 1983)<br />

40. EXTRACT.YY return 2-digit numeric year within century. (83 from 1983)<br />

41. EXTRACT.DATE return a copy of <strong>the</strong> input, dropping time.<br />

42. EXTRACT.TIME return a copy of <strong>the</strong> input, dropping date.



43. EXTRACT.WEEKDAY return the weekday name.<br />

44. CHANGE.YEARS change <strong>the</strong> years field in a date.<br />

45. CHANGE.MONTHS change <strong>the</strong> months field in a date.<br />

46. CHANGE.DAYS change <strong>the</strong> days field in a date.<br />

47. CHANGE.HOURS change <strong>the</strong> hours field in a date.<br />

48. CHANGE.MINUTES change <strong>the</strong> minutes field in a date.<br />

49. CHANGE.SECONDS change <strong>the</strong> seconds field in a date.<br />

50. DIF.YEARS difference between 2 dates in years.<br />

51. DIF.MONTHS difference between 2 dates in months.<br />

52. DIF.DAYS difference between 2 dates in days.<br />

53. DIF.HOURS difference between 2 dates in hours.<br />

54. DIF.MINUTES difference between 2 dates in minutes.<br />

55. DIF.SECONDS difference between 2 dates in seconds.<br />

10.3 Six Simple Date Functions<br />

The 6 simple date functions make a character date value from ei<strong>the</strong>r numeric input like 12252005 or<br />

20051225, or from character input like ’12/25/2005’. The order of <strong>the</strong> three segments should be consistent<br />

with <strong>the</strong> function name. In o<strong>the</strong>r words, if 12252005 is meant <strong>to</strong> be month 12, day 25, and year<br />

2005, <strong>the</strong> MONTH.DAY.YEAR function should be chosen.<br />

For example:<br />

MONTH.DAY.YEAR ( 12252005 ) = ’Sun Dec 25, 2005’<br />

MONTH.DAY.YEAR ( ’12--25--2005’ ) = ’Sun Dec 25, 2005’<br />

A numeric argument can be an integer of 3 <strong>to</strong> 8 digits. If 3, 4 or 5 digits, lead zeros are assumed <strong>to</strong><br />

bring it up <strong>to</strong> 6 <strong>to</strong>tal digits, and <strong>the</strong> yy form of year is assumed. In that case, <strong>the</strong> default is <strong>to</strong> assume<br />

20yy. Thus, year.month.day( 225) produces Friday Feb 25, 2000.<br />

If 7 digits, one lead zero is assumed, and <strong>the</strong> yyyy form of year is assumed.<br />

A character argument should contain ei<strong>the</strong>r one or three integers, like ’12252005’ or ’12/25/2005’.<br />

If just one integer, it is treated like <strong>the</strong> numeric input.<br />

Non-digits are treated as separa<strong>to</strong>rs. Thus, ’***12abc25///2005 ’ will produce 12, 25 and 2005.<br />

A second argument may be supplied <strong>to</strong> specify <strong>the</strong> century <strong>to</strong> be used for 2-digit years. Centuries from<br />

1700 through 2900 are allowed, shown by values of 17 through 29, or by 1700, 1800, 1900, etc.<br />

PUT ( DAY.MONTH.YEAR ( 10042006 )) $ or<br />

PUT ( DAY.MONTH.YEAR ( "10/4/2006" )) $<br />

produces “Mon April 10, 2006”<br />

PUT ( DAY.YEAR.MONTH ( 10200604 )) $<br />

also produces Mon April 10, 2006.<br />

PUT ( MONTH.YEAR.DAY ( 4200610 )) $



You can omit the lead zero and the function can figure out what you mean. But note that in the numeric<br />

DAY.MONTH.YEAR and DAY.YEAR.MONTH examples above the 0 in 04 for the month is necessary. The following 3 examples also produce the same<br />

result.<br />

PUT ( MONTH.DAY.YEAR ( 4102006 )) $<br />

PUT ( YEAR.DAY.MONTH ( 20061004 )) $<br />

PUT ( YEAR.MONTH.DAY ( 20060410 )) $<br />
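The parsing rules for the six simple date functions can be summarized in a short Python sketch. It is an approximation built from the rules stated in this section, not P-STAT's implementation: the function name simple_date, the three-letter order argument such as 'mdy', and the application of the century to 2-digit years in three-part character input are all assumptions.<br />

```python
import re
from datetime import date


def simple_date(arg, order, century=20):
    """Build a date from numeric or character input.

    `order` is a hypothetical three-letter code such as 'mdy' or 'ymd'
    standing in for the function name (MONTH.DAY.YEAR, YEAR.MONTH.DAY, ...).
    `century` may be given as 17..29 or as 1700..2900.
    """
    if century >= 100:
        century //= 100
    groups = re.findall(r"\d+", str(arg))   # non-digits act as separators
    if len(groups) == 1:
        digits = groups[0]
        if len(digits) <= 6:
            digits = digits.zfill(6)        # 3 to 5 digits: pad to 6, yy year
            widths = {"y": 2, "m": 2, "d": 2}
        else:
            digits = digits.zfill(8)        # 7 digits: pad to 8, yyyy year
            widths = {"y": 4, "m": 2, "d": 2}
        fields, pos = {}, 0
        for key in order:
            fields[key] = int(digits[pos:pos + widths[key]])
            pos += widths[key]
        if widths["y"] == 2:
            fields["y"] += century * 100
    elif len(groups) == 3:
        fields = {key: int(text) for key, text in zip(order, groups)}
        if fields["y"] < 100:               # assumption: century applies here too
            fields["y"] += century * 100
    else:
        raise ValueError("expected one or three integers")
    return date(fields["y"], fields["m"], fields["d"])


print(simple_date(12252005, "mdy"))
print(simple_date("***12abc25///2005 ", "mdy"))
print(simple_date(225, "ymd"))
```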

10.4 DATE and TIME function details.<br />

1. MAKE.DATE (year, month, day ) >>> date<br />

MAKE.DATE (year, month, day, hour, minute, second) >>> date<br />

MAKE.DATE (year, month, day, hms, 'mask' ) >>> date<br />

MAKE.DATE (ymd, 'mask' ) >>> date<br />

MAKE.DATE (ymd, 'mask', hour, minute, second) >>> date<br />

MAKE.DATE (ymd, 'mask', hms, 'mask' ) >>> date<br />

This makes a character date value from numeric values. The function must have 2 or 3 date arguments,<br />

and may also have 2 or 3 time arguments. These examples show <strong>the</strong> input as constants, but<br />

<strong>the</strong>y can be variables or expressions of any complexity.<br />

The result is a character date, formatted in <strong>the</strong> current default format. P-<strong>STAT</strong> starts a run with <strong>the</strong><br />

default format set <strong>to</strong> <strong>the</strong> following template:<br />

'Tues Jan 1, 2002 18:52:04' .<br />

The date can be provided using separate arguments for year, month and day. Alternatively, those three<br />

values can be compressed in<strong>to</strong> one integer, like 19971225, followed by a mask or template in quotes<br />

<strong>to</strong> show how <strong>to</strong> parse <strong>the</strong> compressed integer.<br />

Time is provided similarly. However, seconds may have a fractional part of up <strong>to</strong> three places; in that<br />

case <strong>the</strong> three argument form must be used. Some examples:<br />

MAKE.DATE ( 1997, 12, 25 ) = 'Thurs Dec 25, 1997'<br />

MAKE.DATE ( 1997, 12, 25,<br />

23, 59, 59 ) = 'Thurs Dec 25, 1997 23:59:59'<br />

A mask like ‘yyyymmdd’ is used when year, month and day are combined in<strong>to</strong> one integer, like<br />

19971225.<br />

A mask like ‘hhmmss’ is used when hour, minute and second are combined in<strong>to</strong> one integer, like<br />

235959.<br />

MAKE.DATE ( 12251997, 'mmddyyyy',<br />

235959, 'hhmmss' )<br />

= 'Thurs Dec 25, 1997 23:59:59'<br />

If <strong>the</strong> year mask in a template has yy (ra<strong>the</strong>r than yyyy), <strong>the</strong> yy may be preceded by a 2-digit century<br />

from 17 <strong>to</strong> 29; if not, 20 is assumed. For example:<br />

MAKE.DATE ( 122595, 'mmdd19yy' )<br />

MAKE.DATE ( 122595, 'mmddyy' )<br />

The first would produce 'Mon Dec 25, 1995'.<br />

The second would produce 'Sun Dec 25, 2095'.<br />
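How a mask splits a compressed integer, including the optional century digits before yy, can be sketched in Python as follows. The helper names split_by_mask and make_date are hypothetical, and the sketch assumes a mask contains only field letters and century digits; it illustrates the rule described above and is not P-STAT's parser.<br />

```python
from datetime import datetime


def split_by_mask(number, mask):
    """Split a compressed integer into fields named by the mask letters.

    Digits that appear in the mask (as in 'mmdd19yy') are taken as the
    century for a 2-digit year; without them, 20 is assumed.
    """
    letters = [c for c in mask if c.isalpha()]
    digits = str(number).zfill(len(letters))
    fields, century, pos = {}, "", 0
    for c in mask:
        if c.isdigit():
            century += c                 # literal century, consumes no input
        else:
            fields[c] = fields.get(c, "") + digits[pos]
            pos += 1
    values = {key: int(text) for key, text in fields.items()}
    if len(fields.get("y", "")) == 2:
        values["y"] += 100 * int(century or "20")
    return values


def make_date(ymd, date_mask, hms=0, time_mask="hhmmss"):
    """Combine a masked date integer and an optional masked time integer."""
    d = split_by_mask(ymd, date_mask)
    t = split_by_mask(hms, time_mask)
    return datetime(d["y"], d["m"], d["d"], t["h"], t["m"], t["s"])


print(make_date(12251997, "mmddyyyy", 235959, "hhmmss"))
print(make_date(122595, "mmdd19yy"))
print(make_date(122595, "mmddyy"))
```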

2. CURRENT.DATE () >>> date<br />

Make a character date value from <strong>the</strong> current date and time. The current default format is used <strong>to</strong> construct<br />

<strong>the</strong> result.



CURRENT.DATE () = ‘Sun June 23, 2002 12:26:49’<br />

The empty paren<strong>the</strong>ses are needed <strong>to</strong> persuade <strong>the</strong> <strong>PPL</strong> processor that this is indeed a function.<br />

3. REFORMAT.DATE ( date1 ) >>> date<br />

REFORMAT.DATE ( date1, date2 ) >>> date<br />

Take a date input and reformat it. If a second argument is supplied, it will be used as a formatting template.<br />

If not, <strong>the</strong> current default format is used.<br />

REFORMAT.DATE ( 'June 23, 2002' ) = 'Sun June 23, 2002'<br />

Several points about <strong>the</strong> above example:<br />

(1) There was no second argument, <strong>the</strong>refore <strong>the</strong> current default format would be used which, unless<br />

changed earlier in <strong>the</strong> run, would produce <strong>the</strong> above result.<br />

(2) The default has day-of-week, <strong>the</strong>refore this result gets day-of-week also.<br />

(3) The default has time, but this input does not have time, so none is put in<strong>to</strong> <strong>the</strong> result.<br />

REFORMAT.DATE ( 'JUNE 23, 2002', '1997, dec, 25' ) =<br />

'2002, June, 23'<br />

Note in the above that ‘dec’ in the template determined only the ordering of elements; it did not change the<br />

naming style. The MONTH.LENGTH and similar commands can be used to change how names are<br />

written.<br />

4. <strong>STAT</strong>US.DATE ( date ) >>> integer<br />

Take a date input and produce a numeric result, from -3 <strong>to</strong> 2, which indicates <strong>the</strong> usability of <strong>the</strong> input.<br />

<strong>STAT</strong>US.DATE ( 'jan 1, 1997 12:13:14' ) = 2<br />

<strong>STAT</strong>US.DATE ( 'jan 1, 1997 ' ) = 1<br />

<strong>STAT</strong>US.DATE ( 'jan 1 ' ) = 0<br />

<strong>STAT</strong>US.DATE ( .M1. ) = -1<br />

<strong>STAT</strong>US.DATE ( .M2. ) = -2<br />

<strong>STAT</strong>US.DATE ( .M3. ) = -3<br />

A result of 2 means <strong>the</strong> input is a valid date value which also contains time.<br />

A result of 1 means <strong>the</strong> input is a valid date value but does not contain time.<br />

A result of 0 means <strong>the</strong> input is not missing, but is none<strong>the</strong>less not a valid date value (missing year).<br />

A result of -1, -2 or -3 means the input is missing: -1 is missing 1, etc.<br />

5. DAYS ( date ) >>> integer<br />

The DAYS function takes an input date value and produces <strong>the</strong> number of days in that value since Jan<br />

1, 1753. Time, if <strong>the</strong>re, is ignored. The result of <strong>the</strong> DAYS function can be used <strong>to</strong> sort on <strong>the</strong> date,<br />

with no concern about time within date.<br />

DAYS ( 'jan 1, 1753' ) = 1<br />

DAYS ( 'jan 1, 2002 12:13:14' ) = 90,946<br />

There is an older form of DAYS function that has two arguments, a 6 or 8 digit year-month-day integer<br />

like 19981225, and a mask like ‘yyyymmdd’. This form, which returned days since jan 1,1900, is<br />

being de-documented but will be supported for some years.<br />

The new form is recognized by its having just 1 argument.



6. SECONDS ( date ) >>> number<br />

The SECONDS function takes an input date value and produces <strong>the</strong> number of seconds since 00:00:00<br />

on Jan 1, 1753. If <strong>the</strong> input lacks a time field, <strong>the</strong> result is missing. The result of <strong>the</strong> SECONDS function<br />

can be used <strong>to</strong> sort on time within date.<br />

SECONDS ( 'jan 1, 2002 12:13:14' ) = 7,857,691,994<br />

7. SECONDS.MIDNIGHT ( date ) >>> number<br />

The SECONDS.MIDNIGHT function takes <strong>the</strong> time element of an input date value and produces <strong>the</strong><br />

number of seconds since midnight. The date element is ignored. If time is not <strong>the</strong>re, <strong>the</strong> result is missing.<br />

The result of <strong>the</strong> SECONDS.MIDNIGHT function can be used <strong>to</strong> sort on <strong>the</strong> time, with no<br />

concern about <strong>the</strong> date.<br />

SECONDS.MIDNIGHT ( 'jan 1, 2002 00:00:00' ) = 0<br />

SECONDS.MIDNIGHT ( 'jan 1, 2002 12:13:14' ) = 43,994<br />

SECONDS.MIDNIGHT ( 'jan 1, 2002 23:59:59.12' ) = 86,399.12<br />

8. UNDO.DAYS ( number ) >>> date<br />

The UNDO.DAYS function takes <strong>the</strong> result of <strong>the</strong> DAYS function and re-creates <strong>the</strong> date. Note, only<br />

<strong>the</strong> date is recovered, since time information is not carried in <strong>the</strong> DAYS result.<br />

UNDO.DAYS ( 90946 ) = 'Tues Jan 1, 2002'<br />

UNDO.DAYS ( DAYS ( 'jan 1 2002 12:13:14' ) ) = 'Tues Jan 1, 2002'<br />

9. UNDO.SECONDS ( number ) >>> date<br />

The UNDO.SECONDS function takes <strong>the</strong> result of <strong>the</strong> SECONDS function and re-creates <strong>the</strong> date<br />

and time.<br />

UNDO.SECONDS ( 7857691994 ) = 'Tues Jan 1, 2002 12:13:14'<br />
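The arithmetic behind DAYS, SECONDS, SECONDS.MIDNIGHT and their UNDO counterparts can be checked with Python's datetime module. This is a sketch with hypothetical function names, not P-STAT code. Note the offsets implied by the examples: DAYS counts Jan 1, 1753 as day 1, while SECONDS starts from 0 at 00:00:00 on that day.<br />

```python
from datetime import datetime, timedelta

EPOCH = datetime(1753, 1, 1)


def days(dt):
    """Day number with Jan 1, 1753 as day 1; any time part is ignored."""
    return (dt.date() - EPOCH.date()).days + 1


def seconds(dt):
    """Seconds elapsed since 00:00:00 on Jan 1, 1753."""
    return (dt - EPOCH).total_seconds()


def seconds_midnight(dt):
    """Seconds since midnight; the date part is ignored."""
    return dt.hour * 3600 + dt.minute * 60 + dt.second + dt.microsecond / 1e6


def undo_days(day_number):
    """Reverse of days(): only the date can be recovered."""
    return (EPOCH + timedelta(days=day_number - 1)).date()


def undo_seconds(second_count):
    """Reverse of seconds(): recovers date and time."""
    return EPOCH + timedelta(seconds=second_count)


stamp = datetime(2002, 1, 1, 12, 13, 14)
print(days(stamp), seconds(stamp), seconds_midnight(stamp))
print(undo_days(days(stamp)), undo_seconds(seconds(stamp)))
```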

10. FISCAL.YEAR ( date, integer ) >>> integer<br />

The FISCAL.YEAR function requires a second argument: <strong>the</strong> ending month of <strong>the</strong> fiscal year. This<br />

can be 1 through 12, but is usually 6, 9 or 12:<br />

6 for fiscal years ending on June 30.<br />

9 for <strong>the</strong> Sept 30 fiscal year end (U.S.Government).<br />

12 for a Dec 31 ending of a calendar year.<br />

FISCAL.YEAR ( 'sept 15,2001', 9 ) = 2001<br />

FISCAL.YEAR ( 'oct 15,2001', 9 ) = 2002<br />

FISCAL.YEAR ( 'dec 31,2001', 6 ) = 2002<br />

FISCAL.YEAR ( 'dec 31,2001', 12 ) = 2001<br />

11. FISCAL.QUARTER ( date, integer ) >>> integer<br />

The FISCAL.QUARTER function requires a second argument: <strong>the</strong> ending month of <strong>the</strong> fiscal year.<br />

This can be 1 through 12, but is usually 6, 9 or 12:<br />

6 for fiscal years ending on June 30.<br />

9 for <strong>the</strong> Sept 30 fiscal year end (U.S.Government).<br />

12 for a Dec 31 ending of a calendar year.<br />

FISCAL.QUARTER ( 'jan 10,2001', 6 ) = 3<br />

FISCAL.QUARTER ( 'jan 10,2001', 9 ) = 2<br />

FISCAL.QUARTER ( 'jan 10,2001', 12 ) = 1
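The fiscal-year and fiscal-quarter rules reduce to simple month arithmetic. The Python sketch below reproduces the examples above; the function names are hypothetical and it assumes the fiscal year is named for the calendar year in which it ends, which is what the examples show.<br />

```python
from datetime import date


def fiscal_year(d, end_month):
    """Fiscal year of `d` when the fiscal year ends in `end_month` (1-12)."""
    return d.year + 1 if d.month > end_month else d.year


def fiscal_quarter(d, end_month):
    """Fiscal quarter (1-4) of `d` for a fiscal year ending in `end_month`."""
    months_into_year = (d.month - end_month - 1) % 12   # 0 = first fiscal month
    return months_into_year // 3 + 1


print(fiscal_year(date(2001, 10, 15), 9))
print(fiscal_quarter(date(2001, 1, 10), 6))
```

With an ending month of 12, fiscal_quarter gives the ordinary calendar quarter.<br />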



12. QUARTER ( date ) >>> integer<br />

The QUARTER function returns <strong>the</strong> calendar year quarter; it is <strong>the</strong> same as FISCAL.QUARTER with<br />

a second argument of twelve.<br />

QUARTER ( 'jan 10,2001' ) = 1<br />

QUARTER ( 'dec 10,2001' ) = 4<br />

13. DAY.WITHIN.WEEK ( date ) >>> integer<br />

The DAY.WITHIN.WEEK function returns an integer from 1 <strong>to</strong> 7. The default is for <strong>the</strong> week <strong>to</strong> begin<br />

on Monday, so that a Monday returns 1, Tuesday 2, and Sunday 7. If a weekday name in quotes<br />

is given as a second argument, that day will be treated as day 1 in <strong>the</strong> function.<br />

Note, Dec 25,2002 is a Wednesday.<br />

DAY.WITHIN.WEEK ( 'dec 25, 2002' ) = 3<br />

DAY.WITHIN.WEEK ( 'dec 25, 2002', 'Sunday' ) = 4<br />

DAY.WITHIN.WEEK ( 'dec 25, 2002', 'sat' ) = 5<br />
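The effect of the optional starting day is a rotation of the weekday numbering. A minimal Python sketch, with a hypothetical function name and the assumption that only the first three letters of the day name matter:<br />

```python
from datetime import date

WEEKDAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def day_within_week(d, first_day="monday"):
    """Return 1-7, counting from `first_day` as day 1."""
    start = WEEKDAYS.index(first_day.lower()[:3])
    return (d.weekday() - start) % 7 + 1


christmas = date(2002, 12, 25)   # a Wednesday
print(day_within_week(christmas))
print(day_within_week(christmas, "Sunday"))
print(day_within_week(christmas, "sat"))
```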

14. DAY.WITHIN.YEAR ( date ) >>> integer<br />

The DAY.WITHIN.YEAR function returns an integer from 1 <strong>to</strong> 366. January 1 is always 1, and December<br />

31 will return 365 in non-leap years, and 366 in leap years.<br />

DAY.WITHIN.YEAR ( 'jan 11, 2001' ) = 11<br />

DAY.WITHIN.YEAR ( 'feb 11, 2001' ) = 42<br />

DAY.WITHIN.YEAR ( 'dec 25, 2001' ) = 359<br />

DAY.WITHIN.YEAR ( 'dec 25, 2004' ) = 360<br />

15. WEEK.WITHIN.YEAR ( date, integer ) >>> integer<br />

This function returns <strong>the</strong> week number within <strong>the</strong> year for <strong>the</strong> supplied date. The range can be 0 <strong>to</strong> 53,<br />

depending on <strong>the</strong> date and on <strong>the</strong> calculation method.<br />

There are two methods for determining what constitutes week one of a given year.<br />

The first method is simple: the first week goes from Jan 1 through Jan 7. This can be called an<br />

ABSOLUTE week.<br />

The second method makes use of a calendar week, which is defined by ISO 8601 as going from Monday<br />

through Sunday.<br />

The first week is the first CALENDAR week that contains a sufficient number of days within the current<br />

year. Sufficient can be set to 1 through 7; the ISO standard is 4. This function assumes a<br />

Mon-Sun calendar week; a different calendar week can be given in the function. The arguments are:<br />

1. A character date value, variable or expression like ’Jan 4,2004’.<br />

2. An integer constant from 0 <strong>to</strong> 7. This selects <strong>the</strong> method <strong>to</strong> be used <strong>to</strong> define <strong>the</strong> first week of <strong>the</strong><br />

year.<br />

0: This uses <strong>the</strong> absolute week. The first week is Jan 1 through Jan 7. The result can be from<br />

1 <strong>to</strong> 53. The third argument, if provided, is ignored.<br />

1-7: These use <strong>the</strong> calendar week. The 1 <strong>to</strong> 7 specify <strong>the</strong> minimum number of days needed <strong>to</strong><br />

constitute an acceptable first week.<br />

For example, suppose <strong>the</strong> calendar week is Mon-Sun and Jan 4 is a Sunday. Is an initial 4-day<br />

week sufficient <strong>to</strong> be used as week 1 ? If this argument is 1 <strong>to</strong> 4, yes. If insufficient, <strong>the</strong> partial<br />

week becomes week 0, and <strong>the</strong> next calendar week is week 1.



The ISO 8601 standard is 4, which therefore accepts the first calendar week that has a majority<br />

of its days in <strong>the</strong> current year.<br />

Using 7 would cause <strong>the</strong> first full calendar week <strong>to</strong> be week 1.<br />

3. An optional character constant which contains <strong>the</strong> starting day of <strong>the</strong> calendar week <strong>to</strong> be used<br />

instead of <strong>the</strong> default Monday <strong>to</strong> Sunday week. This can be a full name like ’Tuesday’, or an<br />

abbreviation like ’Wed’.<br />

***********************************<br />

* examples using ABSOLUTE weeks *<br />

***********************************<br />

Week.within.year ( ’jan 3 2004’, 0 ) = 1<br />

Week.within.year ( ’jan 5 2004’, 0 ) = 1<br />

Week.within.year ( ’jan 8 2004’, 0 ) = 2<br />

Week.within.year ( ’dec 31 2004’, 0 ) = 53<br />

************************************<br />

* examples using <strong>the</strong> default *<br />

* Monday <strong>to</strong> Sunday calendar week *<br />

************************************<br />

mon tue wed thu fri sat sun<br />

1 2 3 4<br />

5 6 7 8 9 10 11<br />

12 13 14 15 16 17 18<br />

Week.within.year ( ’jan 5 2004’, 1 ) = 2<br />

Week.within.year ( ’jan 3 2004’, 4 ) = 1<br />

Week.within.year ( ’jan 5 2004’, 4 ) = 2<br />


Week.within.year ( ’jan 5 2004’, 7 ) = 1<br />

**************************************<br />

* examples using an alternative *<br />

* Sunday <strong>to</strong> Saturday calendar week *<br />

**************************************<br />

sun mon tue wed thu fri sat<br />

1 2 3<br />

4 5 6 7 8 9 10<br />

11 12 13 14 15 16 17<br />

Week.within.year ( ’jan 3 2004’, 1, ’sun’ ) = 1<br />

Week.within.year ( ’jan 3 2004’, 4, ’sun’ ) = 0<br />

Week.within.year ( ’jan 5 2004’, 4, ’sun’ ) = 1<br />

Week.within.year ( ’jan 1 2004’, 7, ’sun’ ) = 0<br />

Week.within.year ( ’jan 2 2004’, 7, ’sun’ ) = 0<br />

Week.within.year ( ’jan 3 2004’, 7, ’sun’ ) = 0<br />

Week.within.year ( ’jan 4 2004’, 7, ’sun’ ) = 1
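The two week-numbering methods can be expressed compactly in Python. The sketch below reproduces the January 2004 examples above; the function name and the handling of day names by their first three letters are assumptions, and it is not P-STAT's implementation. Unlike a strict ISO 8601 week number, days before week 1 are reported as week 0, as this section describes.<br />

```python
from datetime import date

WEEKDAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def week_within_year(d, method, first_day="monday"):
    """Week number of `d` within its year.

    method 0   : absolute weeks, Jan 1-7 is week 1.
    method 1-7 : calendar weeks; the week containing Jan 1 is week 1 only
                 if at least `method` of its days fall in the year,
                 otherwise it is week 0.
    """
    day_of_year = d.timetuple().tm_yday
    if method == 0:
        return (day_of_year - 1) // 7 + 1
    start = WEEKDAYS.index(first_day.lower()[:3])
    jan1 = date(d.year, 1, 1)
    offset = (jan1.weekday() - start) % 7     # days of that week before Jan 1
    days_in_first = 7 - offset                # days of that week inside the year
    week_index = (day_of_year - 1 + offset) // 7
    return week_index + 1 if days_in_first >= method else week_index


print(week_within_year(date(2004, 1, 5), 4))
print(week_within_year(date(2004, 1, 3), 4, "sun"))
print(week_within_year(date(2004, 12, 31), 0))
```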



16. ADD.YEARS ( date, 1 to 6 numbers ) >>> date<br />

17. ADD.MONTHS ( date, 1 to 5 numbers ) >>> date<br />

18. ADD.DAYS ( date, 1 to 4 numbers ) >>> date<br />

19. ADD.HOURS ( date, 1 to 3 numbers ) >>> date<br />

20. ADD.MINUTES ( date, 1 to 2 numbers ) >>> date<br />

21. ADD.SECONDS ( date, 1 number ) >>> date<br />

22. SUBTRACT.YEARS ( date, 1 to 6 numbers ) >>> date<br />

23. SUBTRACT.MONTHS ( date, 1 to 5 numbers ) >>> date<br />

24. SUBTRACT.DAYS ( date, 1 to 4 numbers ) >>> date<br />

25. SUBTRACT.HOURS ( date, 1 to 3 numbers ) >>> date<br />

26. SUBTRACT.MINUTES ( date, 1 to 2 numbers ) >>> date<br />

27. SUBTRACT.SECONDS ( date, 1 number ) >>> date<br />

Each of these has a date value as its first argument, followed by one or more date/time amounts to be<br />

added or subtracted.<br />

The ADD.YEARS function, for example, treats the required second argument as the number of years<br />

to be added; months, days, hours, minutes and seconds can also be supplied. For example:<br />

ADD.YEARS ( 'jan 1, 1991', 3 ) = 'Jan 1, 1994'<br />

ADD.YEARS ( 'jan 1, 1991', 3, 1 ) = 'Feb 1, 1994'<br />

ADD.YEARS ( 'jan 1, 1991', 3, 1,10 ) = 'Feb 11, 1994'<br />

Since the function in the above 3 examples was ADD.YEARS, the initial element (i.e., argument two)<br />

is years. If yet another argument follows, it is treated as months, and so on.<br />

ADD.YEARS ( 'jan 1, 1991 10:10:10', 1,2,3,4,5,6)<br />

= 'March 4, 1992 14:15:16'<br />

The above adds 1 year, 2 months, 3 days, 4 hours, 5 minutes and 6 seconds to jan 1,1991 at 10:10:10.<br />

The same amounts can then be subtracted again:<br />

SUBTRACT.YEARS ( 'march 4, 1992 14:15:16', 1,2,3,4,5,6)<br />

= 'Jan 1, 1991 10:10:10'<br />

Some additional examples using scratch variable ##d, which is set to ‘jan 1 1991 10:10:10’ for the<br />

function input:<br />

ADD.DAYS ( ##d, 100 ) = 'April 11 1991 10:10:10'<br />

ADD.DAYS ( ##d, 1000 ) = 'Sept 27 1993 10:10:10'<br />

ADD.DAYS ( ##d, 1000,0,0,1 ) = 'Sept 27 1993 10:10:11'<br />

ADD.DAYS ( ##d, 1000,0,0,1.5) = 'Sept 27 1993 10:10:11.5'<br />

ADD.MINUTES ( ##d, 1000 ) = 'Jan 2 1991 02:50:10'<br />

ADD.MINUTES ( ##d, -20 ) = missing, invalid argument<br />

ADD.MINUTES ( ##d, 1,2,3 ) = error, too many arguments<br />

These functions process the years field first, then the months field (which could further change the<br />

years field), and so on.
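
The way trailing arguments cascade to smaller units can be sketched in Python for the day, hour, minute and second cases. This is an illustration only, not the P-STAT implementation: the helper name add_from is hypothetical, None stands in for a missing result, and a Python TypeError stands in for the "too many arguments" error.

```python
from datetime import datetime, timedelta

# Units handled by this sketch, largest first.
UNITS = ["days", "hours", "minutes", "seconds"]

def add_from(start, first_unit, *amounts):
    """Add amounts beginning at first_unit and cascading to smaller units."""
    units = UNITS[UNITS.index(first_unit):]
    if not amounts or len(amounts) > len(units):
        raise TypeError("wrong number of arguments")
    if any(a < 0 for a in amounts):
        return None  # stands in for a missing result
    # Pair each amount with its unit, e.g. days=1000, hours=0, ...
    return start + timedelta(**dict(zip(units, amounts)))

d = datetime(1991, 1, 1, 10, 10, 10)
print(add_from(d, "days", 100))      # 1991-04-11 10:10:10
print(add_from(d, "minutes", 1000))  # 1991-01-02 02:50:10
```

Years and months are left out of this sketch on purpose, because they do not have a fixed length.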


*********************************<br />

* limitations in using *<br />

* ADD.YEARS SUBTRACT.YEARS *<br />

* ADD.MONTHS SUBTRACT.MONTHS *<br />

*********************************<br />

These 4 functions are of limited usefulness because they can quite easily produce an invalid date,<br />

which causes the function to issue a missing result.<br />

For example, adding one year to feb 29,1992 would produce feb 29 in 1993, which is invalid because<br />

1993 was not a leap year. Similarly, adding 1 month to aug 31,2001 produces sept 31,2001, invalid<br />

because september hath but 30 days.<br />

These functions first check the date validity after processing year and month; it is checked again after<br />

any additional elements have been processed.<br />

The other 8 functions in this group (ADD.DAYS and such) all produce sensible, reversible results.<br />
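
A short Python sketch shows why shifting only the year and month fields can fail. The function name add_years_months is hypothetical, and None stands in for the missing result; the sketch covers only the year and month fields, not the further elements the real functions accept.

```python
from datetime import date

def add_years_months(d, years=0, months=0):
    """Shift the year and month fields, keeping the day of month unchanged."""
    total = d.month - 1 + months
    year = d.year + years + total // 12
    month = total % 12 + 1
    try:
        return date(year, month, d.day)
    except ValueError:
        # The day does not exist in the target month (e.g. Feb 29, Sept 31).
        return None

print(add_years_months(date(1992, 2, 29), years=1))   # None
print(add_years_months(date(2001, 8, 31), months=1))  # None
```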

28. EXTRACT.YEARS ( date ) >>> integer, 1753 to 2999<br />

29. EXTRACT.MONTHS ( date ) >>> integer, 1 to 12<br />

30. EXTRACT.DAYS ( date ) >>> integer, 1 to 31<br />

31. EXTRACT.HOURS ( date ) >>> integer, 0 to 23<br />

32. EXTRACT.MINUTES ( date ) >>> integer, 0 to 59<br />

33. EXTRACT.SECONDS ( date ) >>> integer, 0 to 59<br />

34. EXTRACT.CC ( date ) >>> integer, 17 to 29<br />

35. EXTRACT.YY ( date ) >>> integer, 0 to 99<br />

36. EXTRACT.DATE ( date ) >>> character date value<br />

37. EXTRACT.TIME ( date ) >>> character time value<br />

38. EXTRACT.WEEKDAY ( date ) >>> character weekday name<br />

EXTRACT.YEARS ( 'jan 5 1991 10:11:12' ) = 1991<br />

EXTRACT.MONTHS ( 'jan 5 1991 10:11:12' ) = 1<br />

EXTRACT.DAYS ( 'jan 5 1991 10:11:12' ) = 5<br />

EXTRACT.HOURS ( 'jan 5 1991 10:11:12' ) = 10<br />

EXTRACT.MINUTES( 'jan 5 1991 10:11:12' ) = 11<br />

EXTRACT.SECONDS( 'jan 5 1991 10:11:12' ) = 12<br />

EXTRACT.CC ( 'jan 5 1991 10:11:12' ) = 19 (century)<br />

EXTRACT.YY ( 'jan 5 1991 10:11:12' ) = 91<br />

EXTRACT.DATE ( 'jan 5 1991 10:11:12' ) = 'jan 5 1991'<br />

EXTRACT.TIME ( 'jan 5 1991 10:11:12' ) = '10:11:12'<br />

EXTRACT.WEEKDAY( 'jan 5 1991 10:11:12' ) = 'Sat'<br />

The result of extract.date will contain the day of week only if the input argument has day-of-week.<br />
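
For readers more familiar with general-purpose languages, the EXTRACT results for 'jan 5 1991 10:11:12' correspond to the following Python expressions. This is only a comparison, not P-STAT code; the dictionary keys and the WEEKDAY_ABBREV list (which copies the default English weekday abbreviations) are names chosen for the illustration.

```python
from datetime import datetime

# Default English weekday abbreviations, Monday first.
WEEKDAY_ABBREV = ["Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun"]

d = datetime(1991, 1, 5, 10, 11, 12)

extracted = {
    "years": d.year,
    "months": d.month,
    "days": d.day,
    "hours": d.hour,
    "minutes": d.minute,
    "seconds": d.second,
    "cc": d.year // 100,   # century digits
    "yy": d.year % 100,    # two-digit year
    "weekday": WEEKDAY_ABBREV[d.weekday()],
}
print(extracted)
```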

39. CHANGE.YEARS ( date, 1 to 6 numbers ) >>> date<br />

40. CHANGE.MONTHS ( date, 1 to 5 numbers ) >>> date<br />

41. CHANGE.DAYS ( date, 1 to 4 numbers ) >>> date<br />

42. CHANGE.HOURS ( date, 1 to 3 numbers ) >>> date<br />

43. CHANGE.MINUTES ( date, 1 to 2 numbers ) >>> date<br />

44. CHANGE.SECONDS ( date, 1 number ) >>> date<br />

These six functions are used to change specific elements within a date-time value without affecting<br />

the other elements of the value.<br />

Each of these has a date value as its first argument, and then one or more date or time elements as<br />

subsequent arguments.<br />

In the CHANGE.MONTHS function, for example, the argument after the input date must be an integer<br />

from 1 to 12. This provides the changed month element to be placed into the function result. A<br />

third argument, if given, would then be treated as a days element, and so forth.<br />

In these examples, we assume that character scratch variable ##d has been set to:<br />

'jan 1 1991 10:10:10'.<br />

CHANGE.YEARS ( ##d, 1992 ) = 'Jan 1 1992 10:10:10'<br />

CHANGE.MONTHS ( ##d, 2 ) = 'Feb 1 1991 10:10:10'<br />

CHANGE.DAYS ( ##d, 8 ) = 'Jan 8 1991 10:10:10'<br />

CHANGE.HOURS ( ##d, 11 ) = 'Jan 1 1991 11:10:10'<br />

CHANGE.MINUTES( ##d, 12 ) = 'Jan 1 1991 10:12:10'<br />

CHANGE.SECONDS( ##d, 13 ) = 'Jan 1 1991 10:10:13'<br />

As with functions like ADD.YEARS, additional arguments can be supplied to change several fields<br />

at once.<br />

CHANGE.YEARS ( ##d, 1992, 2, 8, 11, 12, 13 ) =<br />

'Feb 8 1992 11:12:13'<br />

CHANGE.HOURS (##d, 11, 12, 13) = 'Jan 1 1991 11:12:13'.<br />

CHANGE.DAYS (##d, 8, 11 ) = 'Jan 8 1991 11:10:10'.<br />

The above change.hours example has three values after the input date. Since the function is<br />

change.hours,<br />

argument 2 (11) is treated as an HOURS change,<br />

argument 3 (12) is treated as a MINUTES change, and<br />

argument 4 (13) is treated as a SECONDS change.<br />

The above change.days has two arguments after the input date; these are treated as days (because of<br />

the function name) and hours (the next time element after days).<br />

CHANGE.DAYS ( 'Jan 1 1991', 3, 21, 22, 23 ) =<br />

'Jan 3 1991 21:22:23'.<br />

If the function has values for hours, minutes and seconds, they are placed in the result even when the<br />

input did not have any time fields.
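
The cascading of CHANGE arguments can be sketched in Python with datetime.replace. The helper name change_from and the field names are hypothetical, and a Python datetime always carries a time, so the sketch does not model date values that lack time fields.

```python
from datetime import datetime

# Field order shared by the CHANGE.* family, largest first.
FIELDS = ["year", "month", "day", "hour", "minute", "second"]

def change_from(d, first_field, *values):
    """Replace first_field and, for extra values, the fields that follow it."""
    start = FIELDS.index(first_field)
    if not values or len(values) > len(FIELDS) - start:
        raise TypeError("wrong number of arguments")
    names = FIELDS[start:start + len(values)]
    return d.replace(**dict(zip(names, values)))

d = datetime(1991, 1, 1, 10, 10, 10)
print(change_from(d, "hour", 11, 12, 13))  # 1991-01-01 11:12:13
print(change_from(d, "day", 8, 11))        # 1991-01-08 11:10:10
```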


45. DIF.YEARS ( date, date ) >>> number<br />

46. DIF.MONTHS ( date, date ) >>> number<br />

47. DIF.DAYS ( date, date ) >>> number<br />

48. DIF.HOURS ( date, date ) >>> number<br />

49. DIF.MINUTES ( date, date ) >>> number<br />

50. DIF.SECONDS ( date, date ) >>> number<br />

The first two arguments are the date values being compared. It does not matter which is the first argument,<br />

i.e.,<br />

DIF.DAYS( date1, date2 ) = DIF.DAYS( date2, date1 ).<br />

An optional third argument can be used to limit the calculation; using 2, for example, causes only the<br />

first two elements, years and months, to be looked at.<br />

DIF.YEARS ( 'jan 1,1992', 'feb 3,1993' ) = 1.090411<br />

DIF.YEARS ( 'jan 1,1992', 'feb 3,1993', 1 ) = 1.<br />

DIF.MONTHS( 'jan 1,1992', 'feb 3,1993' ) = 13.071429<br />

DIF.MONTHS( 'jan 1,1992', 'feb 3,1993', 2 ) = 13.<br />

DIF.DAYS ( 'jan 1,1992', 'feb 3,1993' ) = 399.<br />

DIF.DAYS ( 'jan 1,1992 12:00:00',<br />

'feb 3,1993 15:00:00' ) = 399.125<br />

DIF.HOURS ( 'jan 1,1992 12:00:00',<br />

'feb 3,1993 15:00:00' ) = 9,579.<br />

DIF.MINUTES('jan 1,1992 12:00:00',<br />

'feb 3,1993 15:00:00' ) = 574,740.<br />

DIF.SECONDS('jan 1,1992 12:00:00',<br />

'feb 3,1993 15:00:00' ) = 34,484,400.<br />

DIF.SECONDS('jan 1,1992 12:00:00.2',<br />

'feb 3,1993 15:00:00' ) = 34,484,399.8<br />

***********************************************<br />

* NOTE: DIF.YEARS and DIF.MONTHS are both *<br />

* counting time elements of varying lengths *<br />

***********************************************<br />

Since years can have differing lengths ( 365 or 366 days), and months are even worse ( 28 or 29 or 30<br />

or 31 days), the dif.years and dif.months functions produce results which reflect the somewhat arbitrary<br />

choices on how to compute them.<br />

DIF.YEARS ( 'feb 4,1992', 'mar 7,1993' ) = 1.0849315<br />

DIF.MONTHS( 'feb 4,1992', 'mar 7,1993' ) = 13.0967742<br />

In the above dif.years example, there is one full year from feb 4,1992 to feb 4,1993, and then 31 more<br />

days to march 7,1993. The fractional year is then 31/365, which is 0.0849315. The 365 is the distance<br />

from feb 4,1993 to the end of the next full year, feb 4,1994.<br />

If the earlier date is a feb 29, one day is subtracted from both dates to simplify the calculations.<br />

In the above dif.months example, there are 13 full months from feb 4,1992 to march 4,1993. The fractional<br />

part is 3/31, or 0.0967742. The 3 is the distance from march 4 to march 7, and the 31 is the<br />

distance from march 4 to april 4, the end of the next full month.<br />

If the day of the month of the earlier date is more than 28, from one to three days are subtracted from<br />

both dates to simplify the calculations.<br />

These two functions could well be coded in a different manner that gives slightly different results in<br />

the fractional part. The coding and results of DIF.DAYS, DIF.HOURS, DIF.MINUTES and<br />

DIF.SECONDS, on the other hand, are straightforward.<br />
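
The fractional-year and fractional-month rules described above can be written out as a Python sketch. It follows the description in this section for dates without time fields and is not taken from the P-STAT source: the names dif_years, dif_months and _add_months are hypothetical.

```python
from datetime import date, timedelta

def _add_months(d, n):
    """Move d forward n months; callers guarantee d.day <= 28."""
    m = d.month - 1 + n
    return date(d.year + m // 12, m % 12 + 1, d.day)

def dif_years(a, b):
    early, late = sorted((a, b))
    if (early.month, early.day) == (2, 29):
        # Slide both dates back one day so the anniversary always exists.
        early -= timedelta(days=1)
        late -= timedelta(days=1)
    full = late.year - early.year
    if early.replace(year=early.year + full) > late:
        full -= 1
    last = early.replace(year=early.year + full)       # last anniversary
    nxt = early.replace(year=early.year + full + 1)    # next anniversary
    return full + (late - last).days / (nxt - last).days

def dif_months(a, b):
    early, late = sorted((a, b))
    shift = max(0, early.day - 28)   # 1 to 3 days when day is 29, 30 or 31
    early -= timedelta(days=shift)
    late -= timedelta(days=shift)
    full = (late.year - early.year) * 12 + (late.month - early.month)
    if _add_months(early, full) > late:
        full -= 1
    last = _add_months(early, full)
    nxt = _add_months(early, full + 1)
    return full + (late - last).days / (nxt - last).days

print(round(dif_years(date(1992, 2, 4), date(1993, 3, 7)), 7))   # 1.0849315
print(round(dif_months(date(1992, 2, 4), date(1993, 3, 7)), 7))  # 13.0967742
```

Sorting the two arguments first is what makes the result independent of argument order.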

*******************************************<br />

* using the third argument: *<br />

* doing DIF.YEARS, DIF.DAYS, etc. *<br />

* on just the initial parts of the date *<br />

*******************************************<br />

A third argument may be supplied: it is the extent of the year-month-day-hour-minute-second fields<br />

that should be used in computing the difference. The fields beyond that level will be ignored.<br />

DIF.YEARS ( 'feb 4,1992', 'mar 7,1993', 1 ) = 1.<br />

DIF.YEARS ( 'feb 4,1992', 'mar 7,1993', 2 ) = 1.0833333<br />

DIF.YEARS ( 'feb 4,1992', 'mar 7,1993', 3 ) = 1.0849315<br />

The limit of 1 in DIF.YEARS nullifies all but the year field in the two arguments. Therefore, the difference<br />

between 1992 and 1993 is simply 1.<br />

The limit of 2 in DIF.YEARS nullifies all but the year and month fields in the two arguments. Therefore,<br />

the difference is calculated between feb 1992 and mar 1993. The result is 1 1/12 years.<br />

The limit of 3 in DIF.YEARS nullifies all but the year, month and day fields in the two arguments. As<br />

it happens, the arguments did not have time fields so using the limit had no effect.<br />

The result is 1 plus 31/365 (the 31 days from feb 4,1993 to mar 7,1993), because the days field is being used.<br />

*********************************************<br />

* doing DIF.YEARS, DIF.MONTHS or DIF.DAYS *<br />

* while ignoring the time fields *<br />

*********************************************<br />

Suppose you are using DIF.YEARS, DIF.MONTHS or DIF.DAYS and have no interest in the time<br />

fields of the arguments. If the arguments lack time fields anyhow, there is obviously no problem, but<br />

suppose some do and some don’t?<br />

The default for a function like DIF.DAYS is to use all available fields, so if one argument has a time<br />

field and the other does not, the result will be set to missing. You could use<br />

DIF.DAYS ( extract.date( arg1 ), extract.date( arg2) ).<br />

However, using a limit of 3 does <strong>the</strong> same thing.<br />

DIF.DAYS ( arg1, arg2, 3).<br />
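
The effect of mixed time fields on DIF.DAYS, and of a limit of 3, can be illustrated in Python. In this sketch a datetime.date stands for a value without a time field and a datetime.datetime for a value with one; None stands in for a missing result. The name dif_days and this representation are assumptions made for the example, and only the limit value 3 is modelled.

```python
from datetime import date, datetime

def has_time(v):
    """True when the value carries a time field (a datetime, not a plain date)."""
    return isinstance(v, datetime)

def dif_days(a, b, limit=None):
    """Absolute difference in days; limit=3 compares year-month-day only."""
    if limit == 3:
        a = a.date() if has_time(a) else a
        b = b.date() if has_time(b) else b
    if has_time(a) != has_time(b):
        return None  # one value has time and the other does not
    return abs((b - a).total_seconds()) / 86400

with_time = datetime(1992, 1, 1, 12, 0, 0)
without_time = date(1993, 2, 3)
print(dif_days(with_time, without_time))     # None
print(dif_days(with_time, without_time, 3))  # 399.0
```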

10.5 DATE AND TIME COMMANDS<br />

A P-STAT run begins with the date language set to English. Therefore, date values being read are expected to have<br />

English month and weekday names, and date values being created will be given English names. Also, names being<br />

written will be capitalized and abbreviated, like Jan or Tues.<br />

If a function takes an input date value and creates a resulting date value, the result will be ordered in the same way<br />

as the input. In other words, if the input starts with the month name, so will the output.<br />

If there is no input to be used as a format, a default ordering that looks like ‘Wed Aug 7, 2002 10:15:22’ is used.<br />

The following eight commands may be used to change the default language, ordering and name style of dates.<br />


DATE.LANGUAGE   changes the language of month and weekday names. English and German are supported.<br />

DATE.ORDER      changes the default format ordering.<br />

MONTH.CASE      changes the case of month names. Uppercase, lowercase and capitalized are supported.<br />

WEEKDAY.CASE    same for weekday names.<br />

MONTH.LENGTH    changes the length of month names. Full length or abbreviated are supported.<br />

WEEKDAY.LENGTH  same for weekday names.<br />

MONTH.NAMES     provides month name abbreviations to be used.<br />

WEEKDAY.NAMES   provides weekday name abbreviations to be used.<br />

10.6 The DATE.LANGUAGE Command<br />

P-STAT carries full and abbreviated month and weekday names in both English and German. The default language<br />

is English.<br />

DATE.LANGUAGE GERMAN $<br />

would switch the active language to German.<br />

A function like ADD.DAYS ( ‘Oct 10, 1992’, 1 ) will compare OCT to the full month names of the currently<br />

active language, and accept a full match or the best partial match. It will use the abbreviated names or, if requested,<br />

the full names of the current language to construct a result date.<br />

The default English month name abbreviations are:<br />

jan feb march april may june july aug sept oct nov dec.<br />

The default English weekday name abbreviations are:<br />

mon tues wed thurs fri sat sun.<br />

The default German month name abbreviations are:<br />

jan feb marz apr mai juni juli aug sept okt nov dez.<br />

The default German weekday name abbreviations are:<br />

mo di mi do fr sa so.<br />

When reading a date in German, both SAMSTAG and SONNABEND are recognized as Saturday, but what<br />

of abbreviations like SO, SON or SONN? They are all accepted as SONNTAG (Sunday).<br />

10.7 The DATE.ORDER Command<br />

DATE.ORDER '2 june 2002 (sun) 12:12:12' $<br />

The DATE.ORDER command changes the default ordering for a date to the order shown in the command.<br />

Blanks, dashes, commas, slashes and parentheses may be freely used to create a particular date appearance.<br />

When one of the date functions writes a date value, the components of the value will be written in a certain<br />

ORDER. The order determines things like where the year should be, if the weekday name should be included,<br />

and if time should be included.<br />

Also, the value will be written in a certain style. Style consists of language (English or German), case (MAY<br />

or may or May), and length (Jan or January). For example,<br />

’Wed Aug 7, 2002 10:05:26’<br />

is an ordering that consists of weekday, month, day, comma, year and time, with blanks as shown. The names are<br />

abbreviated and capitalized (first letter uppercase, the rest lowercase). This is, in fact, the default date format.<br />

The default date order is used only when there is nothing else to use. If a date function has an input date, like<br />

ADD.DAYS, the result will have the same ordering as the input. However, the naming style of the input can be<br />

ambiguous: is May an abbreviation or a full month name? Therefore, the default style (case, length and language)<br />

is used for names. If a date function does not have an input date, like CURRENT.DATE(), default ordering and<br />

style are used.<br />

Therefore, given that<br />

1. The default ordering is: 'Tues Jan 1, 2002 12:34:56'.<br />

2. The default style is: English, abbreviated, capitalized.<br />

the command<br />

PUT (CURRENT.DATE())$<br />

would produce something like:<br />

Sun June 2, 2002 14:01:19.<br />

The time field can be omitted from dates, as can the weekday name.<br />

DATE.ORDER 'june 2 (mon) 1999' $<br />

Here, since time is not included, functions that do not have a character date input to use as an output template will<br />

write a date output that does not include time.<br />

10.8 Changing the Case and Length of names<br />

The default for both month names and weekday names is capitalized and abbreviated (i.e., Jan or Tues). There<br />

are 2 commands which affect the case of names as they are written. MONTH.CASE affects month names,<br />

WEEKDAY.CASE affects weekday names.<br />

MONTH.CASE upper $<br />

WEEKDAY.CASE capitalized $<br />

UPPER causes names to be entirely in upper case. LOWER causes names to be entirely in lower case.<br />

CAPITALIZED causes names to have the initial letter in upper case, and the rest in lower case.<br />

The following commands affect the length of names as they are written. FULL causes names to be written<br />

in their entirety: January. ABBREVIATED causes names to be written in a short form: Jan.<br />

MONTH.LENGTH FULL $<br />

WEEKDAY.LENGTH ABBREVIATED $<br />

10.9 Month and Weekday Names<br />

There are 2 commands which can be used to alter the default abbreviations: MONTH.NAMES and<br />

WEEKDAY.NAMES. These commands override the default abbreviations. They must, however, themselves be<br />

abbreviations of the current full names. MONTH.NAMES requires 12 arguments and WEEKDAY.NAMES requires<br />

7 arguments.<br />

MONTH.NAMES jan feb mar apr may jun jul aug sep oct nov dec $<br />

WEEKDAY.NAMES mon tue wed thu fri sat sun $<br />

The names can each be quoted or unquoted, or the entire set of names can be in one quoted string.<br />

WEEKDAY.NAMES mon 'tue' wed thu fri 'sat' sun $<br />

WEEKDAY.NAMES 'mon tue wed thu fri sat sun' $


__________________________________________________________________________<br />

Figure 10.1 DATE Logical Operators<br />

Test Values<br />

date1 = ’jan 12,1991 12:01:00’<br />

date2 = ’may 23,1991 12:08:00’<br />

date3 = ’may 23,1991 12:08:00’<br />

date4 = ’may 23,1991 22:08:00’<br />

date5 = ’may 23,1991 ’<br />

Tests using logical operators<br />

The tests                The Result<br />

[ if date1 DATE.GT date2, false ]<br />

[ if date1 AFTER date2, false, same as DATE.GT ]<br />

[ if date3 DATE.GE date2, true ]<br />

[ if date1 DATE.EQ date2, false ]<br />

[ if date1 DATE.LE date2, true ]<br />

[ if date1 DATE.LT date2, true ]<br />

[ if date1 BEFORE date2, true, same as DATE.LT ]<br />

[ if date1 DATE.EQ date5, false ]<br />

[ if date4 DATE.EQ date5, missing ]<br />

[ if extract.date(date4) DATE.EQ date5, true ]<br />

__________________________________________________________________________<br />

10.10 DATE LOGICAL OPERATORS<br />

There are 6 logical operators that can be used to compare date values. They are:<br />

1. DATE.GT (AFTER can also be used)<br />

2. DATE.GE<br />

3. DATE.EQ<br />

4. DATE.NE<br />

5. DATE.LE<br />

6. DATE.LT (BEFORE can also be used)<br />

Each examines two date values, which can be expressions. A date value MUST have a date field (year-month-day)<br />

and MAY have a time field (hour-minute-second). These date and time fields are treated in the date compares<br />

as if they were two separate BY variables in a sort.<br />

If the two date values differ at the year-month-day level, there is no need to look at time, so it doesn’t matter<br />

if one value has a time field and the other does not. However, if the two year-month-day fields are the same, what<br />

happens if one value has a time field and the other does not?<br />

1. If neither has time, the result is equal.<br />

2. If both have time, the times are compared, yielding a result.<br />

3. If one has time and the other doesn’t, the result is missing.<br />

The final three examples in Figure 10.1 deal with TIME issues.<br />

[ if date1 DATE.EQ date5, false ]<br />

[ if date4 DATE.EQ date5, missing ]<br />

[ if extract.date(date4) DATE.EQ date5, true ]<br />

When we compare date1 with date5, the year-month-day values differ, so we can get a FALSE result even though<br />

one has time and the other does not. Date4 and date5, however, do not differ on year-month-day. If neither had<br />

time, the result would be equal, but since one has time and the other does not, the result is missing.<br />

Using EXTRACT.DATE gets rid of the time field in date4, so the compare with timeless date5 produces a<br />

non-missing result.<br />
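
The three-way outcome (true, false or missing) can be sketched in Python by treating each date value as a pair of a date and an optional time. The names date_compare, date_eq and date_gt are hypothetical, and None stands in for a missing result.

```python
from datetime import date, time

def date_compare(a, b):
    """Return -1, 0 or 1, or None when only one value has a time field."""
    (da, ta), (db, tb) = a, b
    if da != db:
        return -1 if da < db else 1   # time is never consulted
    if ta is None and tb is None:
        return 0
    if ta is None or tb is None:
        return None                   # missing
    return (ta > tb) - (ta < tb)

def date_eq(a, b):
    c = date_compare(a, b)
    return None if c is None else c == 0

def date_gt(a, b):
    c = date_compare(a, b)
    return None if c is None else c > 0

date1 = (date(1991, 1, 12), time(12, 1, 0))
date4 = (date(1991, 5, 23), time(22, 8, 0))
date5 = (date(1991, 5, 23), None)

print(date_eq(date1, date5))             # False
print(date_eq(date4, date5))             # None
print(date_eq((date4[0], None), date5))  # True
```

The last call drops the time from date4 before comparing, which is the role EXTRACT.DATE plays in Figure 10.1.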

10.11 FORMAT.DATE<br />

FORMAT.DATE is a date/time function that provides considerable flexibility in formatting a date-time value.<br />

It has two arguments: the character value to be formatted, and the format to be used for it. A P-STAT date/time<br />

value is an ordinary variable, often sized character*40, that holds date-time information. Creating date-time<br />

variables was covered in considerable detail in the early parts of this chapter. This section describes how to print<br />

this information in the formats that you prefer.<br />

PUT ( Current.date ( ) )$ results in something like<br />

Mon Oct 24, 2011 11:21:36<br />

The current.date function has no arguments. It produces the current date and time in the default form.<br />

FORMAT.DATE expects the initial argument to hold a value in a similar format. A format consists of format specifiers<br />

(like dd for days) and separator characters (like :). The format determines which date/time elements appear in the<br />

result and which separator characters are placed between them.<br />

The format will often be provided by a character constant within the function. It can, however, be placed in<br />

a permanent character scratch variable, as in FORMAT.DATE ( ddd, ##someformat ) . Blanks are significant.<br />

Given aug 28, 2011,<br />

'yyyymmdd' produces 20110828 .<br />

'yyyy mm dd' produces 2011 08 28<br />

The caret (^) will not be placed in the result, and can therefore be used to make a format more readable.<br />

'yyyy^mm^dd' does the same thing as 'yyyymmdd'.<br />

Any other character is copied as is, such as the : in hh:mm:ss or the / in mm/dd/yyyy.<br />
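
The scanning rule (replace specifiers, drop carets, copy everything else) can be sketched in Python for a small subset of the specifiers. The function name format_ymd is hypothetical, and the sketch only knows lower-case yyyy, yy, mm (always read as month here) and dd; it is not a re-implementation of FORMAT.DATE.

```python
from datetime import date

def format_ymd(d, fmt):
    """Expand yyyy, yy, mm and dd; drop carets; copy other characters as is."""
    specs = [
        ("yyyy", f"{d.year:04d}"),   # checked before "yy" so it wins
        ("yy", f"{d.year % 100:02d}"),
        ("mm", f"{d.month:02d}"),
        ("dd", f"{d.day:02d}"),
    ]
    out, i = [], 0
    while i < len(fmt):
        for spec, text in specs:
            if fmt.startswith(spec, i):
                out.append(text)
                i += len(spec)
                break
        else:
            if fmt[i] != "^":        # the caret is never copied
                out.append(fmt[i])
            i += 1
    return "".join(out)

d = date(2011, 8, 28)
print(format_ymd(d, "yyyymmdd"))    # 20110828
print(format_ymd(d, "yyyy^mm^dd"))  # 20110828
print(format_ymd(d, "yyyy mm dd"))  # 2011 08 28
```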

yyyy         year, in 4-digit form, like 2011.<br />

yy           year, in 2-digit form, like 11.<br />

month        month, full name, like september.<br />

mon          month, abbreviated name, like sept.<br />

n.month      month, 1 to 12, i.e., numeric month.<br />

mm           month, 1 to 12 if usage is clear, like yy/mm/dd. same as n.month .<br />

dd           day, 1 to 31.<br />

hh           hour, 0 to 23.<br />

n.minute     minute, 0 to 59, i.e., numeric minute.<br />

mm           minute, 0 to 59 if usage is clear, like hh:mm:ss. same as n.minute .<br />

ss           second, 0 to 59, can have up to 3 places, like 34.178 .<br />

ord          ordinal, the day within year, 1 to 366.<br />

jjj          (for julian) does the same. Ordinal has become the accepted name.<br />

day.of.week  weekday, full name, like monday.<br />

dow          weekday, abbreviation, like mon.<br />

am           puts hours in 1-12 form, and then uses am, pm, noon, midnight to clarify. These are placed where the 'am' was found.<br />

a.m.         same thing, but uses a.m. and p.m. .<br />

date         causes mm/dd/yyyy to be used.<br />

time         causes hh:mm:ss to be used.<br />

The default is to show hours in 0 to 23 form, which is sometimes called military time. The format specifier 'am'<br />

causes hours to appear in 1 to 12 form, along with one of am, pm, noon, and midnight. The am (or pm, etc) is<br />

placed where the specifier was. Using a specifier of a.m. causes a.m. (or p.m.) to be used instead.<br />

Examples of converting 24-hour to 12-hour mode.<br />

00:00:00 becomes 12:00:00 midnight.<br />

00:00:01 becomes 12:00:01 am.<br />

01:00:00 becomes 01:00:00 am.<br />

12:00:00 becomes 12:00:00 noon.<br />

12:00:01 becomes 12:00:01 pm.<br />

13:00:00 becomes 01:00:00 pm.<br />
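
The conversion table above can be reproduced with a few lines of Python. The function name to_12_hour and the dotted flag (standing for the a.m. specifier) are hypothetical; whether the a.m. form also uses noon and midnight is not stated in this section, so the sketch simply assumes it does.

```python
def to_12_hour(hh, mm, ss, dotted=False):
    """Convert a 24-hour time to the 12-hour form with am/pm/noon/midnight."""
    if (hh, mm, ss) == (0, 0, 0):
        suffix = "midnight"
    elif (hh, mm, ss) == (12, 0, 0):
        suffix = "noon"
    elif hh < 12:
        suffix = "a.m." if dotted else "am"
    else:
        suffix = "p.m." if dotted else "pm"
    hour12 = hh % 12 or 12   # 0 and 12 both display as 12
    return f"{hour12:02d}:{mm:02d}:{ss:02d} {suffix}"

print(to_12_hour(0, 0, 0))    # 12:00:00 midnight
print(to_12_hour(13, 0, 0))   # 01:00:00 pm
```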

The case used for names like Monday in the result is controlled by the case of the format word that was used.<br />

Using day.of.week will get 'monday'. Using Day.of.week will get 'Monday'. Using DAY.OF.WEEK will get<br />

'MONDAY'. This is done for full and abbreviated month names, full and abbreviated weekday names, and for a.m.<br />

and am. Leading zeros are printed, except for days when month is a name. Consider Jan 2, 1995 5:06:07.<br />

'date time' gets 01/02/1995 05:06:07 .<br />

However<br />

'Month dd, yyyy' gets January 2, 1995.<br />

__________________________________________________________________________<br />

Figure 10.2 FORMAT.DATE<br />

MAKE work1, VARS year month day hour min sec;<br />

1995 3 1 10 13 15<br />

2004 2 9 21 22 23 $<br />

LIST work1 [ GENERATE dt1:c40 TO MAKE.DATE<br />

(year, month, day, hour, min, sec ) ]<br />

[ GENERATE dt2:c40 TO FORMAT.DATE<br />

( dt1, 'yyyy-mm-dd time a.m. dow' ) ]<br />

[ KEEP dt1 dt2 ] $<br />

dt1                         dt2<br />

Wed March 1, 1995 10:13:15  1995-03-01 10:13:15 a.m. wed<br />

Mon Feb 9, 2004 21:22:23    2004-02-09 09:22:23 p.m. mon<br />

__________________________________________________________________________<br />

The first step in using P-STAT’s date routines is to store the date in date variable format.<br />

The second step is to provide one or more date templates to use when the date is printed. Here are four different<br />

date templates, and the resulting character string stored in variable “this.date” for ##FMT1 and ##FMT3.<br />

__________________________________________________________________________<br />

Figure 10.3 FORMAT.DATE Example<br />

GEN ##DAT1:c = DAY.MONTH.YEAR ( 13042012 )<br />

##FMT1:c = 'Month-dd-yyyy'     April-13-2012<br />

##FMT2:c = 'mon dd yy'         april 13 12<br />

##FMT3:c = 'dd/n.month/yy'     13/04/12<br />

##FMT4:c = 'Dow Mon dd yyyy'   Fri April 13 2012<br />

GEN ##this.date:c = FORMAT.DATE ( ##dat1, ##FMT1 ) $<br />

PUT ##this.date $<br />

April-13-2012<br />

GEN ##this.date:c = FORMAT.DATE ( ##dat1, ##FMT3 ) $<br />

PUT ##this.date $<br />

13/04/12<br />

__________________________________________________________________________
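The two steps described above (store the value as a date, then render it through a template) have a close analogue in most languages. The sketch below uses Python's standard `datetime` module rather than P-STAT. The `TEMPLATES` table, which maps two of the P-STAT templates from Figure 10.3 to `strftime` patterns, and the helper names `day_month_year` and `format_date` are assumptions made purely for illustration.

```python
from datetime import date

# Hypothetical mapping from two P-STAT templates in Figure 10.3 to Python
# strftime patterns. This is an illustration, not P-STAT's own template table.
TEMPLATES = {
    "Month-dd-yyyy": "%B-%d-%Y",   # full month name, 2-digit day, 4-digit year
    "dd/n.month/yy": "%d/%m/%y",   # numeric day, numeric month, 2-digit year
}


def day_month_year(value: int) -> date:
    """Step 1: parse an integer in day.month.year order, such as 13042012."""
    day, rest = divmod(value, 1_000_000)
    month, year = divmod(rest, 10_000)
    return date(year, month, day)


def format_date(d: date, template: str) -> str:
    """Step 2: render the stored date through one of the known templates."""
    return d.strftime(TEMPLATES[template])


dat1 = day_month_year(13042012)
print(format_date(dat1, "Month-dd-yyyy"))
print(format_date(dat1, "dd/n.month/yy"))
```

Keeping the stored date separate from its printed form is what allows one value to be shown through several templates without re-parsing it.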


DATE AND TIME FUNCTIONS<br />

DAY.MONTH.YEAR nn or “cs”<br />

converts an integer or character argument in day.month.year order to a character date.<br />

DAY.YEAR.MONTH nn or “cs”<br />

converts an integer or character argument in day.year.month order <strong>to</strong> a character date.<br />

MONTH.DAY.YEAR nn or “cs”<br />

converts an integer or character argument in month.day.year order <strong>to</strong> a character date.<br />

MONTH.YEAR.DAY nn or “cs”<br />

converts an integer or character argument in month.year.day order <strong>to</strong> a character date.<br />

YEAR.DAY.MONTH nn or “cs”<br />

converts an integer or character argument in year.day.month order <strong>to</strong> a character date.<br />

YEAR.MONTH.DAY nn or “cs”<br />

converts an integer or character argument in year.month.day order to a character date.<br />

MAKE.DATE<br />

creates a date from numeric input.<br />

SUMMARY<br />

MAKE.DATE (year, month, day ) >>> date<br />

MAKE.DATE (year, month, day, hour, minute, second) >>> date<br />

MAKE.DATE (year, month, day, hms, ‘mask’ ) >>> date<br />

MAKE.DATE (ymd, ‘mask’ ) >>> date<br />

MAKE.DATE (ymd, ‘mask’, hour, minute, second) >>> date<br />

MAKE.DATE (ymd, ‘mask’, hms, ‘mask’ ) >>> date<br />

CURRENT.DATE<br />

provides <strong>to</strong>day’s date and time.<br />

REFORMAT.DATE ( ddd, ddd )<br />

changes <strong>the</strong> format of a date value. If <strong>the</strong> second argument is supplied it is used as a formatting template.<br />

REFORMAT.DATE ( date1, date2 ) >>> date<br />

REFORMAT.DATE ( ‘June 23, 2002’ ) >>> date<br />

STATUS.DATE ( ddd )<br />

shows if a date is valid, if it has time, etc. Produces a number from 2 to -3 which indicates the usability<br />

of the date. 2 indicates both date and time. 1 indicates date only. 0 indicates invalid date value. -1, -2,<br />

and -3 indicate missing values.<br />

DAYS ( ddd )<br />

returns days since 1/1/1753 for a date.<br />

nn=number nopt=optional number ddd=date variable copt=optional char constant


SECONDS ( ddd )<br />

returns seconds since 1/1/1753 for a date.<br />

SECONDS.MIDNIGHT ( ddd )<br />

returns seconds since midnight for a date.<br />

UNDO.DAYS ( nn )<br />

reverses <strong>the</strong> DAYS function.<br />

UNDO.SECONDS ( nn )<br />

reverses <strong>the</strong> SECONDS function.<br />

FISCAL.YEAR ( ddd, nn )<br />

returns <strong>the</strong> fiscal year of a date.<br />

FISCAL.QUARTER ( ddd, nn )<br />

returns <strong>the</strong> fiscal quarter of a date.<br />

QUARTER ( ddd )<br />

returns <strong>the</strong> calendar quarter of a date.<br />

DAY.WITHIN.WEEK ( ddd, 'name' )<br />

returns an integer from 1 to 7. Name is an optional weekday name such as ‘Sunday’.<br />

DAY.WITHIN.YEAR ( ddd )<br />

returns 1 <strong>to</strong> 366, <strong>the</strong> day within a year.<br />

WEEK.WITHIN.YEAR ( ddd, nn, occ )<br />

ADD.YEARS ( ddd, nn, nopt, nopt, nopt, nopt, nopt )<br />

add some years <strong>to</strong> a date.<br />

ADD.MONTHS ( ddd, nn, nopt, nopt, nopt, nopt )<br />

add some months <strong>to</strong> a date.<br />

ADD.DAYS ( ddd, nn, nopt, nopt, nopt )<br />

add some days <strong>to</strong> a date.<br />

ADD.HOURS ( ddd, nn, nopt, nopt )<br />

add some hours <strong>to</strong> a date.<br />

ADD.MINUTES ( ddd, nn, nopt )<br />

add some minutes <strong>to</strong> a date.<br />

ADD.SECONDS ( ddd, nn )<br />

add some seconds <strong>to</strong> a date.<br />

SUBTRACT.YEARS ( ddd, nn, nopt, nopt, nopt, nopt, nopt )<br />

subtract some years from a date.<br />

SUBTRACT.MONTHS ( ddd, nn, nopt, nopt, nopt, nopt )<br />

subtract some months from a date.<br />

SUBTRACT.DAYS ( ddd, nn, nopt, nopt, nopt )<br />

subtract some days from a date.<br />

SUBTRACT.HOURS ( ddd, nn, nopt, nopt )<br />

subtract some hours from a date.<br />

SUBTRACT.MINUTES ( ddd, nn, nopt )<br />

subtract some minutes from a date.<br />

SUBTRACT.SECONDS ( ddd, nn )<br />

subtract some seconds from a date.<br />

EXTRACT.YEARS ( ddd )<br />

return numeric years from a date.<br />

EXTRACT.MONTHS ( ddd )<br />

return numeric months from a date.<br />

EXTRACT.DAYS ( ddd )<br />

return numeric days from a date.<br />

EXTRACT.HOURS ( ddd )<br />

return numeric hours from a date.<br />

EXTRACT.MINUTES ( ddd )<br />

return numeric minutes from a date.<br />

EXTRACT.SECONDS ( ddd )<br />

return numeric seconds from a date.<br />

EXTRACT.CC ( ddd )<br />

return 2-digit numeric century from a date.<br />

EXTRACT.YY ( ddd )<br />

return 2-digit numeric year from a date.<br />

EXTRACT.DATE ( ddd )<br />

make a copy of <strong>the</strong> input, dropping time.<br />

EXTRACT.TIME ( ddd )<br />

make a copy of <strong>the</strong> input, dropping date.<br />

EXTRACT.WEEKDAY ( ddd )<br />

return <strong>the</strong> character weekday name.<br />

CHANGE.YEARS ( ddd, nn, nopt, nopt, nopt, nopt, nopt )<br />

change <strong>the</strong> years field in a date.


CHANGE.MONTHS ( ddd, nn, nopt, nopt, nopt, nopt )<br />

change <strong>the</strong> months field in a date.<br />

CHANGE.DAYS ( ddd, nn, nopt, nopt, nopt )<br />

change <strong>the</strong> days field in a date.<br />

CHANGE.HOURS ( ddd, nn, nopt, nopt )<br />

change <strong>the</strong> hours field in a date.<br />

CHANGE.MINUTES ( ddd, nn, nopt )<br />

change <strong>the</strong> minutes field in a date.<br />

CHANGE.SECONDS ( ddd, nn )<br />

change <strong>the</strong> seconds field in a date.<br />

DIF.YEARS ( ddd, ddd, nopt )<br />

difference between 2 dates in years. The optional numeric argument can be used to limit the elements of<br />

the date that are to be looked at; thus a 2 means use just years and months.<br />

DIF.MONTHS ( ddd, ddd, nopt )<br />

difference between 2 dates in months. The optional numeric argument can be used to limit the elements<br />

of the date that are to be looked at.<br />

DIF.DAYS ( ddd, ddd, nopt )<br />

difference between 2 dates in days. The optional numeric argument can be used to limit the elements of<br />

the date that are to be looked at.<br />

DIF.HOURS ( ddd, ddd, nopt )<br />

difference between 2 dates in hours. The optional numeric argument can be used to limit the elements<br />

of the date that are to be looked at.<br />

DIF.MINUTES ( ddd, ddd, nopt )<br />

difference between 2 dates in minutes. The optional numeric argument can be used to limit the elements<br />

of the date that are to be looked at.<br />

DIF.SECONDS ( ddd, ddd )<br />

difference between 2 dates in seconds.<br />
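The arithmetic behind several of the functions listed above (DAYS, UNDO.DAYS, SECONDS.MIDNIGHT and DIF.DAYS) can be illustrated outside P-STAT. The following Python sketch is not P-STAT code. It assumes that 1/1/1753 itself counts as day 0 and that a difference is taken as first date minus second date; neither convention is stated in the descriptions above, so both are labeled as assumptions in the comments.

```python
from datetime import date, datetime, timedelta

EPOCH = date(1753, 1, 1)  # base date named in the DAYS and SECONDS descriptions


def days(d: date) -> int:
    """Days elapsed since the epoch (assumption: the epoch itself is day 0)."""
    return (d - EPOCH).days


def undo_days(n: int) -> date:
    """Inverse of days(): turn a day count back into a date."""
    return EPOCH + timedelta(days=n)


def seconds_midnight(dt: datetime) -> int:
    """Seconds elapsed since midnight on the same day."""
    return dt.hour * 3600 + dt.minute * 60 + dt.second


def dif_days(a: date, b: date) -> int:
    """Difference in whole days (assumption: first argument minus second)."""
    return (a - b).days


d = date(2012, 4, 13)
print(undo_days(days(d)) == d)
print(seconds_midnight(datetime(1995, 1, 2, 5, 6, 7)))
```

Reducing a date to a single count and back is what makes the ADD, SUBTRACT and DIF families straightforward: they become integer arithmetic on the count.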

DATE FORMATTING COMMANDS<br />

FORMAT.DATE ( ddd, date.format ) $<br />

The first argument is a P-STAT date variable. The second argument is a character variable that contains<br />

the desired format. Almost any arrangement of numeric or character day/month values, dates and years<br />

can be specified. The following are 3 simple examples.<br />

‘Month-dd-yyyy’ ‘mon dd yy’ ‘dd/n.month/yy’<br />

DATE.LANGUAGE<br />

DATE.LANGUAGE GERMAN $<br />

DATE.LANGUAGE ENGLISH $<br />

Select <strong>the</strong> language for <strong>the</strong> dates. GERMAN and ENGLISH are supported.<br />

DATE.ORDER<br />

DATE.ORDER 'Jan 1, 2002 12:34:56' $<br />

DATE.ORDER changes <strong>the</strong> default format ordering. The supplied date must be a legal date. The default<br />

order is:<br />

'Tues Jan 1, 2002 12:34:56'<br />

The default style is: English, abbreviated, capitalized.<br />

MONTH.CASE<br />

MONTH.CASE UPPER $<br />

Changes <strong>the</strong> case of month names. UPPER, LOWER and CAPITALIZED are supported.<br />

WEEKDAY.CASE<br />

WEEKDAY.CASE LOWER $<br />

Changes <strong>the</strong> case of weekday names. UPPER, LOWER and CAPITALIZED are supported.<br />

MONTH.LENGTH<br />

MONTH.LENGTH FULL $<br />

Changes <strong>the</strong> length of month names. FULL and ABBREVIATED are supported.<br />

WEEKDAY.LENGTH<br />

WEEKDAY.LENGTH ABBREVIATED $<br />

Changes <strong>the</strong> length of weekday names. FULL and ABBREVIATED are supported.<br />

MONTH.NAMES<br />

MONTH.NAMES jan feb mar apr may june july aug sept oct nov dec $<br />

Changes the default month names. These 12 names must be abbreviations of the current full month<br />

names.<br />

WEEKDAY.NAMES<br />

WEEKDAY.NAMES mo tu we th fr sa su $<br />

Changes the default weekday names. These 7 names must be abbreviations of the current full weekday<br />

names.<br />

DATE LOGICAL OPERATORS<br />

Each date logical operator examines two date values, which can be expressions. A date value MUST have a date<br />

field (year-month-day) and MAY have a time field (hour-minute-second). These date and time fields are treated<br />

in date comparisons as if they were two separate BY variables in a sort. The 6 operators are:<br />

1. DATE.EQ<br />

2. DATE.NE<br />

3. DATE.LE<br />

4. DATE.LT (BEFORE can also be used)<br />

5. DATE.GT (AFTER can also be used)<br />

6. DATE.GE<br />
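Treating the date field and the time field "as if they were two separate BY variables in a sort" amounts to comparing a two-part key: the date decides the result, and the time is consulted only when the dates are equal. The Python sketch below illustrates that idea; it is not P-STAT code. The `DateValue` type and the function names are hypothetical, and treating a missing time field as midnight is an assumption, since the text does not say how an absent time compares.

```python
from datetime import date, time
from typing import NamedTuple, Optional


class DateValue(NamedTuple):
    day: date                      # required date field (year-month-day)
    clock: Optional[time] = None   # optional time field (hour-minute-second)


def sort_key(value: DateValue):
    # Two-part key: date first, then time.
    # Assumption: a missing time field is treated as midnight.
    clock = value.clock if value.clock is not None else time(0, 0, 0)
    return (value.day, clock)


def date_eq(a: DateValue, b: DateValue) -> bool:
    return sort_key(a) == sort_key(b)


def date_lt(a: DateValue, b: DateValue) -> bool:
    return sort_key(a) < sort_key(b)


def date_ge(a: DateValue, b: DateValue) -> bool:
    return sort_key(a) >= sort_key(b)


first = DateValue(date(1995, 3, 1), time(10, 13, 15))
second = DateValue(date(2004, 2, 9), time(21, 22, 23))
print(date_lt(first, second))
```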


11<br />

TEXTWRITER:<br />

Report Writing<br />

The TEXTWRITER command produces text or reports that summarize <strong>the</strong> data in a P-<strong>STAT</strong> system file. The<br />

text is formatted much <strong>the</strong> same way as text produced by a word processing software package, with justification,<br />

paragraphs and pagination. In addition, <strong>the</strong> reports can include character strings, values from <strong>the</strong> file, and evaluations<br />

of complex expressions containing functions and opera<strong>to</strong>rs.<br />

TEXTWRITER uses the P-STAT Programming Language (PPL) instructions PUT and PUTL to specify the<br />

strings, values and expressions. Additional PPL may be included to test values and output appropriate strings.<br />

Thus, if Sex equals “M”, the string “Mr.” is written, but if Sex equals “F”, “Ms.” is output. Control words format<br />

the text and position it in specific columns and lines. (The previous eight chapters explain all aspects of PPL. The<br />

first two PPL chapters cover the basics which, together with this chapter, provide sufficient information for using<br />

TEXTWRITER.)<br />

11.1 OVERVIEW<br />

TEXTWRITER requires an input file and instructions specifying <strong>the</strong> contents of a report. The input file, whose<br />

data values are typically included or summarized in <strong>the</strong> report, is named directly after <strong>the</strong> command:<br />

TEXTWRITER Accounts<br />

Here the input file is named “Accounts”. (No comma follows the filename.) The report instructions are PPL<br />

clauses enclosed in brackets:<br />

TEXTWRITER Accounts<br />

[ IF FIRST ( .FILE. ), PUT <<A REPORT:>> @SKIP ;<br />

PUT @JUST <<At >> Company <<, telephone: >> Tel.No <<, the person to contact is >><br />

First.Name Last.Name <<.>> ],<br />

WIDTH 56 $<br />

The bulk of a TEXTWRITER command is <strong>the</strong> set of <strong>PPL</strong> instructions that follow <strong>the</strong> input filename. Many<br />

of <strong>the</strong>se are PUT statements. Identifiers specific <strong>to</strong> TEXTWRITER may follow <strong>the</strong> <strong>PPL</strong> in <strong>the</strong> usual manner.<br />

The format of a PUT is:<br />

1. PUT (or PUTL)<br />

2. one or more values, character strings and control words (like @20 or @NEXT)<br />

3. <strong>the</strong> PUT phrase end character which, depending on <strong>the</strong> context, is a comma, a semicolon or a right<br />

bracket.<br />

In the PUT instructions, character strings are enclosed in quotes or between the directional signs “<< >>”. Variable names are not in quotes or directional signs. Expressions are enclosed in parentheses. Control<br />

words (beginning with “@”) specify placement and format options.<br />

The report produced by this TEXTWRITER command might look like:


A REPORT:<br />

At Smith and Bro<strong>the</strong>rs, <strong>Inc</strong>., telephone: (312) 457-8700,<br />

<strong>the</strong> person <strong>to</strong> contact is Jim Glidden.<br />

A similar sentence is <strong>the</strong>n output for each case in <strong>the</strong> input file.<br />

11.2 Justification<br />

When justification is specified, <strong>the</strong> text in <strong>the</strong> report is aligned at <strong>the</strong> right edge as well as <strong>the</strong> left edge. Extra<br />

blanks are inserted after certain punctuation and between words <strong>to</strong> achieve justification. Up <strong>to</strong> a maximum of five<br />

blanks may come between words, although a smaller number may be specified. Typically, only two blanks appear<br />

between some of <strong>the</strong> words. The concluding line of a paragraph, as well as any single line, is not justified.<br />
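The justification rule described above can be sketched in a few lines of Python. This is a simplified illustration, not P-STAT's algorithm: it spreads the extra blanks between words from left to right and does not give punctuation any priority, which the text says P-STAT does. The function names and the `max_blanks` parameter are hypothetical; the default of five mirrors the stated maximum, and the last line of a paragraph is left unjustified as described.

```python
from typing import List


def justify_line(words: List[str], width: int, max_blanks: int = 5) -> str:
    """Pad the gaps between words so the line is exactly `width` wide."""
    if len(words) < 2:
        return " ".join(words)
    gaps = len(words) - 1
    total_blanks = width - sum(len(w) for w in words)
    base, extra = divmod(total_blanks, gaps)
    widest_gap = base + (1 if extra else 0)
    if widest_gap > max_blanks:
        return " ".join(words)  # cannot justify within the blank limit
    pieces = []
    for i, word in enumerate(words[:-1]):
        # The first `extra` gaps receive one additional blank.
        pieces.append(word + " " * (base + (1 if i < extra else 0)))
    pieces.append(words[-1])
    return "".join(pieces)


def justify(text: str, width: int, max_blanks: int = 5) -> List[str]:
    """Wrap text to `width` and justify every line except the last."""
    lines, current = [], []
    for word in text.split():
        if current and len(" ".join(current + [word])) > width:
            lines.append(current)
            current = []
        current.append(word)
    result = [justify_line(line, width, max_blanks) for line in lines]
    if current:
        result.append(" ".join(current))  # concluding line is not justified
    return result


for line in justify("Extra blanks are inserted between words to align both edges.", 24):
    print(repr(line))
```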

To avoid excess blank spaces in <strong>the</strong> report, trailing blanks are trimmed off character values and character expressions.<br />

Thus, <strong>the</strong> variable University occupies only three columns when its value is “MIT”, but nine columns<br />

when its value is “Prince<strong>to</strong>n”. Similarly, <strong>the</strong> values of numeric variables occupy only as many columns as necessary<br />

for a specific value, not <strong>the</strong> number of columns needed for <strong>the</strong> largest value. A blank space is au<strong>to</strong>matically<br />

inserted between successive values of variables and expressions.<br />

A large print buffer accumulates text. This permits the formatting and justification of large blocks of text.<br />

Strings that belong together, such as a word and its apostrophe, are kept together even though they may be specified<br />

in separate instructions. Each text string follows the previous one until a control word such as @PARA (new<br />

paragraph) or @NEXT (next line) starts a new line. Then the text in the buffer is flushed (emptied<br />

out) and printed, and accumulation of text for subsequent lines begins.<br />

11.3 The “No-Break” Character<br />

The “not” sign, which is generally a caret in <strong>the</strong> ASCII character set and a bar-like character in <strong>the</strong> EBCDIC set,<br />

is <strong>the</strong> no-break character. It keeps two character strings <strong>to</strong>ge<strong>the</strong>r on <strong>the</strong> same line and translates <strong>to</strong> a single blank<br />

space when printing takes place. Thus, “Mr.^Lee” prints as “Mr. Lee” and does not break or widen between <strong>the</strong><br />

two words.<br />
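The effect of the no-break character can be reproduced with Python's standard `textwrap` module. This is an illustrative sketch, not P-STAT code: the text is wrapped while the caret still glues the two strings together, and only afterwards is each caret translated to a blank. The function name `wrap_with_no_break` is hypothetical.

```python
import textwrap
from typing import List

NO_BREAK = "^"  # the ASCII form of the "not" sign described above


def wrap_with_no_break(text: str, width: int) -> List[str]:
    """Wrap text, keeping strings joined by the no-break character together."""
    lines = textwrap.wrap(
        text,
        width=width,
        break_long_words=False,
        break_on_hyphens=False,
    )
    # Translate the no-break character to a single blank only after wrapping.
    return [line.replace(NO_BREAK, " ") for line in lines]


print(wrap_with_no_break("Please call Mr.^Lee today", 15))
print(textwrap.wrap("Please call Mr. Lee today", 15))
```

With an ordinary blank, the line break falls between "Mr." and "Lee"; with the no-break character, the pair moves to the next line as a unit.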

11.4 <strong>PPL</strong> INSTRUCTIONS PUT AND PUTL<br />

PUT and PUTL, two <strong>PPL</strong> instructions, specify character strings, values of variables and scratch variables, and expressions<br />

<strong>to</strong> position in <strong>the</strong> text. These instructions produce <strong>the</strong> actual report. PUT places only <strong>the</strong> values of<br />

variables in <strong>the</strong> text, whereas PUTL places <strong>the</strong> names of <strong>the</strong> variables as well as <strong>the</strong> values in <strong>the</strong> text.<br />

PUT can be used in <strong>the</strong> same way that, for example, SET is used, ei<strong>the</strong>r starting a new <strong>PPL</strong> instruction or as<br />

a consequent of an IF. The PUT is followed by character strings, variable names and expressions. Many PUT<br />

items (strings and variable names) may follow one PUT. Control words, such as @NEXT, may be used as needed:<br />

[ PUT <<Hospital Supply Sales Report:>> .DATE. @SKIP;<br />

Here a character string, a system variable, and a control word follow one PUT instruction. The @SKIP, described<br />

later, causes <strong>the</strong> current line and <strong>the</strong>n a blank line <strong>to</strong> be written.<br />

In addition, any o<strong>the</strong>r <strong>PPL</strong> instructions, functions and opera<strong>to</strong>rs may be included in <strong>the</strong> <strong>PPL</strong> clauses. This is<br />

typically <strong>the</strong> case when <strong>the</strong> choice of which character string <strong>to</strong> place in <strong>the</strong> report depends on testing and evaluating<br />

values. For example, this instruction tests <strong>the</strong> value of <strong>the</strong> scratch variable “#Recent”:<br />

IF #Recent EQ <<no>>, INCREASE #Count, PUT Hospital<br />

<< has not received a sales call since >> Date.Last.Call ;<br />

The scratch variable “#Count” is increased and an appropriate character string is put in <strong>the</strong> output line of text when<br />

<strong>the</strong> result of <strong>the</strong> IF test is true.


11.5 Character Strings<br />

Any set of arbitrary characters, enclosed in single or double quotes or between the directional signs “<< >>”, is a character string. The string may contain letters, numbers, punctuation and blanks, and it may be from<br />

1 to 50,000 characters long. The string should not contain the names of variables, scratch variables or expressions,<br />

because these items will be printed literally; their values will not be substituted.<br />

The instruction:<br />

PUT <<The client is: >> First Last <<.>> ;<br />

yields a line such as this in the report:<br />

The client is: Mary Roberts.<br />

This instruction:<br />

PUT <<The client is: First Last.>> ;<br />

yields:<br />

The client is: First Last.<br />

Character strings should contain only the exact text desired in the report. Some TEXTWRITER applications have<br />

hundreds of lines of PPL. Using <<string>> instead of 'string' or “string” helps you see the strings more easily.<br />

Also, omitting the closing >> is more quickly<br />

flagged than omitting a string-terminating ’ or ”.<br />

11.6 Values of Variables<br />

The current values of variables, scratch variables, system variables and positions in <strong>the</strong> V vec<strong>to</strong>r (case vec<strong>to</strong>r) or<br />

P vec<strong>to</strong>r (permanent vec<strong>to</strong>r) may be placed in reports. None of <strong>the</strong>se values is enclosed in quotes or between directional<br />

signs. Complex expressions must be enclosed in paren<strong>the</strong>ses. This instruction includes a variable, a<br />

scratch variable and a system variable, as well as four quoted strings:<br />

[ PUT 'The balance for account number ' Acct.Number<br />

' is $' #Balance “ on ” .DATE. '.' ]<br />

This is <strong>the</strong> same instruction using directional signs instead of quotes:<br />

[ PUT <<The balance for account number >> Acct.Number<br />

<< is $>> #Balance << on >> .DATE. <<.>> ]<br />

Quotes or directional signs enclose only <strong>the</strong> character strings. Given a width of 50 and this data, <strong>the</strong> previous instruction<br />

yields:<br />

The balance for account number 1268004 is $752.35<br />

on Apr 22, 1986.<br />

Note that a sentence such as this is produced for each case in <strong>the</strong> input file. The value of <strong>the</strong> variable<br />

Acct.Number is likely <strong>to</strong> change as each case is processed. The value of <strong>the</strong> scratch variable #Balance will not<br />

change unless it is reset for each case. This instruction would reset #Balance:<br />

[ SET #Balance = Balance + Interest ]<br />

The SET should precede <strong>the</strong> PUT instruction that places <strong>the</strong> value of #Balance in <strong>the</strong> report. The value of <strong>the</strong><br />

system variable .DATE. will not change unless <strong>the</strong> TEXTWRITER command is run again on ano<strong>the</strong>r day.<br />

A blank is au<strong>to</strong>matically inserted after a variable or expression value if <strong>the</strong> next value is ano<strong>the</strong>r variable or<br />

expression, and <strong>the</strong> final character of <strong>the</strong> current value is not a blank or a “not” sign.
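The difference between literal strings and substituted values, together with the automatic-blank rule just described, can be modeled in a short Python sketch. This is an illustration under stated assumptions, not P-STAT code: the `put` function and its item encoding (`"str"` for a literal string, `"var"` for a variable name) are hypothetical, and expressions are treated the same way as variables.

```python
from typing import Dict, List, Tuple

Item = Tuple[str, str]  # ("str", literal text) or ("var", variable name)


def put(items: List[Item], case: Dict[str, object]) -> str:
    """Assemble one piece of report text from strings and variable values."""
    out = ""
    for i, (kind, payload) in enumerate(items):
        if kind == "str":
            out += payload  # literal text is copied exactly as written
            continue
        value = str(case[payload]).rstrip()  # trailing blanks are trimmed
        out += value
        next_is_value = i + 1 < len(items) and items[i + 1][0] == "var"
        # A blank is added only between two successive values, and only if
        # the current value does not already end in a blank or a "not" sign.
        # (After trimming, only the "not" sign can still occur here.)
        if next_is_value and not value.endswith((" ", "^")):
            out += " "
    return out


case = {"First": "Mary   ", "Last": "Roberts"}
print(put([("str", "The client is: "), ("var", "First"),
           ("var", "Last"), ("str", ".")], case))
print(put([("str", "The client is: First Last.")], case))
```

The second call shows why variable names should stay outside the string delimiters: inside a string they are ordinary text.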


11.7 Expressions and Functions<br />

Complex expressions containing functions and opera<strong>to</strong>rs, as well as variables and values, may be included in PUT<br />

instructions. Expressions must be enclosed in paren<strong>the</strong>ses and many nested levels of paren<strong>the</strong>ses may be used.<br />

The expressions are evaluated and <strong>the</strong> result is placed in <strong>the</strong> output line.<br />

The ability <strong>to</strong> use expressions makes it possible <strong>to</strong> use <strong>the</strong> full power of <strong>PPL</strong> in report writing. Complex numeric<br />

items and trigonometry functions may be computed, character strings may be padded and concatenated, and<br />

values in cases may be tested and recoded, all within <strong>the</strong> <strong>PPL</strong> instructions that comprise <strong>the</strong> bulk of<br />

TEXTWRITER.<br />

A sampling of expressions that may be included in PUT instructions is:<br />

(CAPS (Name) ) (12 ** 3)<br />

(Salary + Commission) (LOG10 (Value1 / Value2) )<br />

(MEAN (Test.?) ) (CHAREX (Date, 'XX00') )<br />

(V(4) - P(Area + 2) ) (SUBSTRING (LEFT (Name), 1, 1 ) )<br />

__________________________________________________________________________<br />

Figure 11.1 Producing a Report: The Input Files<br />

File Hospital.lab<br />

Hospital (1) Mercy Hospital (2) Children's (3) Eye and Ear<br />

(4) Crans<strong>to</strong>n Memorial (5) Willis (6) St Agnes /<br />

File Sales<br />

         Date      Date      Amt<br />

         Last      Last      Last    Sales<br />

Hospital Call      Order     Order   No     Salesman<br />

4        86-03-17  86-04-15  318.00  2      Will Moore<br />

2        85-09-20  85-06-25  112.60  4      Ted Ryan<br />

3        86-01-12  -         -       4      Ted Ryan<br />

6        86-05-15  85-06-11  430.99  4      Ted Ryan<br />

5        86-02-07  86-02-10  775.25  6      Liz Brown<br />

1        86-04-12  86-06-01  450.67  6      Liz Brown<br />

__________________________________________________________________________<br />

11.8 A Sample Report<br />

Figures 11.1, 11.2 and 11.3 illustrate producing a report using PUT instructions, quoted strings, values and expressions.<br />

Figure 11.1 shows the input files. The file Hospital.lab contains value labels for the Hospital variable<br />

in file Sales. A preliminary SORT by Sales.No and Date.Last.Call has grouped together each salesperson’s customers<br />

and ordered them by the date of their last sales call. (The report has a paragraph for each salesperson, with<br />

sentences describing the status of each account.)<br />

Figure 11.2 shows <strong>the</strong> TEXTWRITER command and <strong>the</strong> PUT instructions. (The numbers at <strong>the</strong> left are not<br />

part of <strong>the</strong> commands, but merely correspond <strong>to</strong> <strong>the</strong> subsequent explanation.) Notice <strong>the</strong> general format of <strong>the</strong><br />

entire command — TEXTWRITER is followed by <strong>the</strong> input filename, and <strong>the</strong> filename is followed directly by<br />

<strong>PPL</strong> clauses of PUT (and o<strong>the</strong>r) instructions. Quoted strings, variables and expressions are included in <strong>the</strong> PUTs.<br />

Control words (beginning with "@") specify text placement. Notice also that <strong>the</strong> command identifiers (LABELS,<br />

STREAM, JUSTIFY and WIDTH) follow <strong>the</strong> <strong>PPL</strong> and are <strong>the</strong>mselves preceded by commas. (STREAM mode<br />

groups information from several cus<strong>to</strong>mers in<strong>to</strong> a single paragraph.)


__________________________________________________________________________<br />

Figure 11.2 Producing a Report: The TEXTWRITER Command<br />

TEXTWRITER Sales<br />

1. [ IF FIRST (.FILE.), PUT @PAGE <<Hospital Supply Sales Report:>><br />

.DATE. @SKIP ]<br />

2. [ IF FIRST (Sales.No), GEN #Count = 0, PUT @PARA ]<br />

3. [ GEN #Recent:C = 'no';<br />

IF Date.Last.Call GT '86-05-00', SET #Recent = 'yes' ]<br />

4. [ IF #Recent EQ <<no>> THEN;<br />

INCREASE #Count, PUT Hospital<br />

<< has not received a sales call since >> Date.Last.Call ;<br />

5. IF Date.Last.Order GOOD, PUT<br />

<< and has not placed an order since >> Date.Last.Order<br />

<<. That order totalled $>> @PLACES=0 Amt.Last.Order <<.>> ;<br />

6. IF Date.Last.Order MISSING,<br />

PUT << and has not placed a subsequent order.>> ;<br />

ENDIF ]<br />

7. [ IF #Count GT 0 AND LAST (Sales.No),<br />

PUT Salesman << is their salesperson.>> ] ,<br />

LABELS 'Hospital.lab', STREAM, JUSTIFY, WIDTH 61 $<br />

__________________________________________________________________________<br />

The general procedure of <strong>the</strong> instructions in Figure 11.2 is:<br />

1. Supply a heading for <strong>the</strong> report. This is done only once, when <strong>the</strong> first case or cus<strong>to</strong>mer in <strong>the</strong> file<br />

is processed.<br />

2. Generate a scratch variable #Count <strong>to</strong> keep track of <strong>the</strong> number of cus<strong>to</strong>mers a salesperson has. A<br />

salesperson is in <strong>the</strong> report only if he has some cus<strong>to</strong>mers without recent sales calls.<br />

3. Generate a scratch variable #Recent and reset it if a cus<strong>to</strong>mer has had a recent sales call. Only cus<strong>to</strong>mers<br />

without recent calls are <strong>to</strong> be in <strong>the</strong> report. Note: The two digit year will be a problem after<br />

1999.<br />

4. Specify <strong>the</strong> text strings <strong>to</strong> go in <strong>the</strong> report for cus<strong>to</strong>mers without recent calls. Also, put <strong>the</strong> date of<br />

<strong>the</strong>ir call in <strong>the</strong> report.<br />

5. If <strong>the</strong> cus<strong>to</strong>mer has placed an order as a result of a prior sales call, put an appropriate text string and<br />

<strong>the</strong> date of that last order in <strong>the</strong> report. Also, put <strong>the</strong> amount of that order in <strong>the</strong> report.<br />

6. If <strong>the</strong>re has not been an order, put a text string saying so in <strong>the</strong> report.<br />

7. When <strong>the</strong> last of a salesperson’s cus<strong>to</strong>mers is processed and at least one has not had a recent sales<br />

call, place <strong>the</strong> salesperson’s name in <strong>the</strong> report.<br />
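The seven steps above can be traced in an ordinary programming language. The Python sketch below is not PPL and does not claim to reproduce TEXTWRITER's behavior; it applies the same selection and wording rules to the data of Figure 11.1 so the control flow is easy to follow. The names `HOSPITALS`, `SALES` and `build_report` are hypothetical, the heading (step 1) and line justification are omitted, and dates are compared as two-digit-year strings exactly as in the figure, with the same limitation after 1999.

```python
from itertools import groupby

# Value labels from file Hospital.lab (Figure 11.1).
HOSPITALS = {1: "Mercy Hospital", 2: "Children's", 3: "Eye and Ear",
             4: "Cranston Memorial", 5: "Willis", 6: "St Agnes"}

# File Sales, already sorted by sales number and date of last call.
# Fields: hospital, last call, last order, order amount, sales no, salesman.
SALES = [
    (4, "86-03-17", "86-04-15", 318.00, 2, "Will Moore"),
    (2, "85-09-20", "85-06-25", 112.60, 4, "Ted Ryan"),
    (3, "86-01-12", None, None, 4, "Ted Ryan"),
    (6, "86-05-15", "85-06-11", 430.99, 4, "Ted Ryan"),
    (5, "86-02-07", "86-02-10", 775.25, 6, "Liz Brown"),
    (1, "86-04-12", "86-06-01", 450.67, 6, "Liz Brown"),
]


def build_report(rows, cutoff="86-05-00"):
    """Return one paragraph of text per salesperson with overdue customers."""
    paragraphs = []
    for _sales_no, group in groupby(rows, key=lambda row: row[4]):  # step 2
        sentences, salesman = [], None
        for hospital, call, order, amount, _no, salesman in group:
            if call > cutoff:                                       # step 3
                continue  # recent sales call: customer is left out
            text = (f"{HOSPITALS[hospital]} has not received "
                    f"a sales call since {call}")                   # step 4
            if order is not None:                                   # step 5
                text += (f" and has not placed an order since {order}."
                         f" That order totalled ${amount:.0f}.")
            else:                                                   # step 6
                text += " and has not placed a subsequent order."
            sentences.append(text)
        if sentences:                                               # step 7
            sentences.append(f"{salesman} is their salesperson.")
            paragraphs.append(" ".join(sentences))
    return paragraphs


for paragraph in build_report(SALES):
    print(paragraph)
    print()
```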

The report produced is shown in Figure 11.3. Wherever variable Hospital is referenced, the text in the labels<br />

file is used instead of the numeric value. A report of this type often conveys information more easily than a table<br />

or listing of numbers. It is also obvious how to read it. On the other hand, a report summarizing many cases could<br />

be lengthy and repetitive.<br />

__________________________________________________________________________<br />

Figure 11.3 Producing a Report: The Report<br />

Hospital Supply Sales Report: Apr 16, 1986<br />

Crans<strong>to</strong>n Memorial has not received a sales call since<br />

86-03-17 and has not placed an order since 86-04-15. That<br />

order <strong>to</strong>talled $318. Will Moore is <strong>the</strong>ir salesperson.<br />

Children’s has not received a sales call since 85-09-20<br />

and has not placed an order since 85-06-25. That order<br />

<strong>to</strong>talled $113. Eye and Ear has not received a sales call<br />

since 86-01-12 and has not placed a subsequent order. Ted<br />

Ryan is <strong>the</strong>ir salesperson.<br />

Willis has not received a sales call since 86-02-07 and<br />

has not placed an order since 86-02-10. That order <strong>to</strong>talled<br />

$775. Mercy Hospital has not received a sales call since<br />

86-04-12 and has not placed an order since 86-06-01. That<br />

order <strong>to</strong>talled $451. Liz Brown is <strong>the</strong>ir salesperson.<br />

__________________________________________________________________________<br />

Report writing shines when <strong>the</strong> output report is actually many reports, each summarizing a single case (or<br />

related group of cases) and possibly going <strong>to</strong> different recipients. Figures 11.4, 11.5 and 11.6 illustrate a more<br />

complex report. Test results for each case are summarized and a separate report is produced about each individual.<br />

To make <strong>the</strong> report more readable, <strong>the</strong> text is changed slightly for sentences after <strong>the</strong> first one.<br />

11.9 Comments in <strong>PPL</strong> Clauses<br />

There may be many <strong>PPL</strong> clauses in a single TEXTWRITER command. Comments interspersed among <strong>the</strong> clauses<br />

document what is being done. They begin with /* and end with */:<br />

[ /* Comment: Generate a scratch variable counter.*/;<br />

GEN #Counter = 0 ]<br />

Any text may come between <strong>the</strong> beginning and end of <strong>the</strong> comment, and <strong>the</strong> comment may extend across records<br />

(lines). Comments may be part of <strong>the</strong> <strong>PPL</strong> clauses in any command, as well as in TEXTWRITER. The lengthy<br />

TEXTWRITER command in Figure 11.10 includes numerous comments <strong>to</strong> document <strong>the</strong> <strong>PPL</strong> instructions.<br />

11.10 OPTIONAL IDENTIFIERS<br />

TEXTWRITER has optional identifiers that control its operation and some format features. The CASE and<br />

STREAM identifiers specify <strong>the</strong> mode of operation of <strong>the</strong> TEXTWRITER command. The JUSTIFY, BLANKS,<br />

PUTL.CHAR and SPREAD identifiers control <strong>the</strong> way text is output in <strong>the</strong> report. MARGIN, LEADBLANK and<br />

WIDTH alter <strong>the</strong> format of <strong>the</strong> report. LABELS and OUT refer <strong>to</strong> optional files. The LABELS file is an input<br />

file of value labels. The OUT file is a P-STAT system file. PostScript identifiers are discussed later.<br />

11.11 CASE and STREAM: The Modes of Operation<br />

The CASE mode is assumed by <strong>the</strong> TEXTWRITER command, and thus <strong>the</strong> CASE identifier does not need <strong>to</strong> be<br />

explicitly included in <strong>the</strong> command. In CASE mode, <strong>the</strong> text starts on a new line at <strong>the</strong> start of each case. All



accumulated text is flushed and printed, and <strong>the</strong>n accumulation of text for <strong>the</strong> next case begins. STREAM mode<br />

is specified by using <strong>the</strong> STREAM identifier. In STREAM mode, text prints continuously.<br />

Often it is unnecessary <strong>to</strong> specify a mode when a control word such as @PAGE causes a page change as each<br />

new case is processed. This is <strong>the</strong> situation in both Figure 11.5 and Figure 11.10, when @PAGE is <strong>the</strong> initial control<br />

word after <strong>the</strong> first PUT instruction. @PAGE flushes and prints all accumulated text before moving <strong>to</strong> a new<br />

page. However, CASE mode also resets all control words <strong>to</strong> <strong>the</strong>ir initial default values as processing of each new<br />

case starts. STREAM does not reset <strong>the</strong> indent or <strong>the</strong> line width. This is discussed fur<strong>the</strong>r in <strong>the</strong> summary portion<br />

of <strong>the</strong> section “CONTROL WORDS.”<br />

11.12 JUSTIFY, BLANKS, PUTL.CHAR and SPREAD<br />

The JUSTIFY identifier specifies that <strong>the</strong> text in <strong>the</strong> report is <strong>to</strong> have <strong>the</strong> right as well as <strong>the</strong> left edge aligned (justified).<br />

Justification is achieved by <strong>the</strong> addition of extra blank spaces after certain punctuation and between words.<br />

Two blank spaces are inserted after periods, exclamation points and question marks, when <strong>the</strong>y end sentences.<br />

Blanks are not inserted after ellipses (...) and o<strong>the</strong>r punctuation, or if <strong>the</strong> user has already included two blanks after<br />

periods in the text strings. If necessary, an additional blank is inserted between one or more pairs of adjacent words. Once<br />

every space between words has received an extra blank, further blanks are inserted if justification has still not been achieved.<br />

When JUSTIFY is not specified, only <strong>the</strong> left edge of <strong>the</strong> text is aligned. A line is filled with text until no<br />

room remains for <strong>the</strong> next word, and that word is placed in <strong>the</strong> next line. The right edge of <strong>the</strong> text has a slightly<br />

ragged appearance due <strong>to</strong> <strong>the</strong> differing amounts of blank spaces remaining at <strong>the</strong> end of each line.<br />

The BLANKS identifier may be used when justification is in effect <strong>to</strong> reset <strong>the</strong> maximum number of blanks<br />

that may be added to the space between words. The argument for BLANKS is an integer of at least 1.<br />

When BLANKS is not used, a maximum of four blanks is assumed. Thus, to achieve justification,<br />

TEXTWRITER may add up <strong>to</strong> four additional blanks <strong>to</strong> <strong>the</strong> existing single blank space between words. Typically,<br />

it is necessary <strong>to</strong> add only one extra blank <strong>to</strong> <strong>the</strong> spaces between some of <strong>the</strong> words <strong>to</strong> justify a line. However, if<br />

a line contains many long words or if <strong>the</strong> next word is very long, up <strong>to</strong> four additional blanks may need <strong>to</strong> be inserted<br />

between words.<br />
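The blank-insertion rule can be modelled outside P-STAT. The following Python sketch only illustrates the behaviour described above; it is not TEXTWRITER's implementation. The function name `justify_line`, the left-to-right order in which gaps are widened, and the handling of a line that cannot be filled within the cap are all assumptions, and the two-blank rule after sentence-ending punctuation is not modelled.

```python
def justify_line(words, width, max_extra=4):
    """Pad the gaps between words so the line is exactly `width` columns.

    Assumptions (not taken from the P-STAT manual): gaps are widened
    left to right, one blank per pass, and no gap receives more than
    `max_extra` blanks beyond its original single blank (the BLANKS cap).
    """
    if len(words) < 2:
        return " ".join(words)
    gaps = [1] * (len(words) - 1)          # one blank between each pair of words
    shortfall = width - (sum(len(w) for w in words) + sum(gaps))
    while shortfall > 0:
        widened = False
        for i in range(len(gaps)):
            if shortfall == 0:
                break
            if gaps[i] < 1 + max_extra:
                gaps[i] += 1
                shortfall -= 1
                widened = True
        if not widened:                     # every gap is at the cap; stop
            break
    pieces = [word + " " * gap for word, gap in zip(words, gaps)]
    pieces.append(words[-1])
    return "".join(pieces)
```

With the default cap, `justify_line(["Car", "parking", "is", "marginal."], 30)` fills all 30 columns. With `max_extra=1` the same call stops at 27 columns, because each gap may grow by only one blank.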

When PUTL is used to format a variable name and its value, the name and value are separated by the three<br />

characters “ = ”. This can be changed with the PUTL.CHARS identifier, which supplies an alternate set of 1-3 characters. Given the variable name<br />

Age and the value 15, the instruction:<br />

PUTL Age prints Age = 15<br />

The following examples of PUTL.CHARS produce:<br />

PUTL.CHARS ' ' Age 15<br />

PUTL.CHARS '---' Age---15<br />

PUTL.CHARS ' / ' Age / 15<br />

SPREAD and NO SPREAD are additional controls over <strong>the</strong> insertion of blanks. SPREAD is assumed. If NO<br />

SPREAD is used <strong>the</strong> values of adjacent variables will be concatenated without <strong>the</strong> usual intervening blank.<br />

11.13 MARGIN, LEADBLANK and WIDTH<br />

The MARGIN identifier specifies <strong>the</strong> number of columns that text is <strong>to</strong> be indented from <strong>the</strong> left. MARGIN 0 is<br />

assumed when MARGIN is not used and <strong>the</strong> text is not indented. The control word “@INDENT”, discussed in<br />

<strong>the</strong> subsequent section, specifies an additional indent that is measured from <strong>the</strong> current margin setting. It is used<br />

within a PUT instruction.<br />

Usually TEXTWRITER output is printed with a blank at <strong>the</strong> beginning of each line. This is useful when <strong>the</strong><br />

output is sent <strong>to</strong> a printer which uses <strong>the</strong> first character in <strong>the</strong> line for carriage control instructions. It is not needed<br />

and probably not wanted when <strong>the</strong> output is saved in a diskfile for use in ano<strong>the</strong>r program or document. The identifier<br />

NO LEADBLANK can be used to remove that blank. LEADBLANK, the assumed setting, can also be specified explicitly.<br />



The WIDTH identifier specifies <strong>the</strong> number of columns <strong>to</strong> be used for a report; that is, <strong>the</strong> width of an output<br />

line in columns. (Note that <strong>the</strong> width is measured from <strong>the</strong> first column, not from <strong>the</strong> end of <strong>the</strong> margin or indent.)<br />

When WIDTH is not used, the current output width defines the width of the report up to a maximum of 400.<br />

WIDTH, in the TEXTWRITER command, overrides the output width setting. It can be from 2 to 400. The<br />

WIDTH identifier may be overridden by the control word @WIDTH used within PUT clauses.<br />

Regardless of how <strong>the</strong> width of a report is set, one column of that width is reserved for carriage control characters<br />

(necessary <strong>to</strong> tell a printer when <strong>to</strong> page or skip lines). Thus, <strong>the</strong> actual report has a width one column less<br />

than <strong>the</strong> specified width. This is generally not of concern, but if it is, WIDTH 71, for example, should be specified<br />

for a report of actual width 70.<br />

11.14 Optional Files: LABELS and OUT<br />

The LABELS identifier is used to provide the names of one or more labels files. If a labels file contains value labels<br />

for a numeric variable, the label text is used in place of the number. The TEXTWRITER command does NOT<br />

make use of the extended labels when variable names are used in a PUTL or VARNAME reference.<br />

The OUT identifier is used <strong>to</strong> produce an output file that contains any modifications that are made <strong>to</strong> <strong>the</strong> input<br />

file.<br />

11.15 CONTROL WORDS<br />

Control words, used in PUT instructions, control <strong>the</strong> formatting and placement of text. They begin with “@” and<br />

may be anywhere in <strong>the</strong> PUT clause, except before <strong>the</strong> PUT. The basic control words are:<br />

@nn @PARA @PAGE @TRIM @COMMAS @MISS<br />

@PLUS @NEXT @INDENT @JUST @PLACES @M @M1<br />

@MINUS @SKIP @WIDTH @BEFORE @EQUAL @M2 @M3<br />

@LABEL @SPREAD<br />

(“nn” represents a positive whole number.)<br />

Some control words require a numeric argument, such as <strong>the</strong> number of lines <strong>to</strong> skip. This number may follow<br />

directly after <strong>the</strong> control word or after an equal-sign. These are equivalent instructions:<br />

@SKIP3 @SKIP=3<br />

Ei<strong>the</strong>r one skips three lines. Although <strong>the</strong> argument directly following <strong>the</strong> control word is typically a number, it<br />

may be any expression that evaluates <strong>to</strong> a numeric value:<br />

@PLUS(#Count-1) @PLUS=(#Count-1)<br />

Ei<strong>the</strong>r of <strong>the</strong>se instructions moves <strong>the</strong> column pointer <strong>to</strong> <strong>the</strong> right <strong>the</strong> number of spaces specified by <strong>the</strong> value of<br />

#Count minus one.<br />

When a control word has a simple argument such as <strong>the</strong> number 3, it may be placed directly (no spaces) after<br />

<strong>the</strong> control word or it may be separated from <strong>the</strong> control word by an equal sign. Again <strong>the</strong>re are no spaces around<br />

<strong>the</strong> equal sign. When <strong>the</strong> argument is an expression it must be enclosed in paren<strong>the</strong>ses. The paren<strong>the</strong>ses must<br />

immediately follow <strong>the</strong> control word or <strong>the</strong> equal sign. However, <strong>the</strong> expression within <strong>the</strong> paren<strong>the</strong>ses may contain<br />

spaces for readability.<br />
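The spelling rules for control-word arguments can be summarised as a small pattern. The Python sketch below is an illustration of those rules only, not P-STAT's own parser. The names `CONTROL_WORD` and `split_control_word` are hypothetical, and the pattern deliberately ignores control words whose names contain digits (such as @M1).

```python
import re

# Matches the forms shown above: @SKIP3, @SKIP=3, @PLUS(#Count-1), @PLUS=(#Count-1).
# No blanks are allowed around the equal sign; blanks are allowed inside parentheses.
CONTROL_WORD = re.compile(r"@([A-Z]+)=?(\d+|\(.*\))$")


def split_control_word(token):
    """Return (name, argument_text) for a control word that carries an argument."""
    match = CONTROL_WORD.match(token)
    if match is None:
        raise ValueError(f"not a control word with an argument: {token!r}")
    name, argument = match.groups()
    if argument.startswith("("):
        argument = argument[1:-1].strip()   # keep only the expression text
    return name, argument
```

Both `split_control_word("@SKIP3")` and `split_control_word("@SKIP=3")` return `("SKIP", "3")`, while `"@SKIP = 3"` is rejected because of the blanks around the equal sign.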

11.16 Control Words <strong>to</strong> Produce a Letter<br />

Figures 11.4, 11.5, and 11.6 illustrate <strong>the</strong> use of TEXTWRITER <strong>to</strong> produce a form letter. The letter is personalized<br />

by including information specific <strong>to</strong> each case in <strong>the</strong> input file. Control words described in <strong>the</strong> sections which<br />

follow position <strong>the</strong> heading, <strong>the</strong> salutation, <strong>the</strong> body and <strong>the</strong> closing portions of <strong>the</strong> letter. Each letter is on a separate<br />

page.



The TEXTWRITER command and <strong>PPL</strong> clauses can select cases from a file, calculate information, write appropriate<br />

text and control <strong>the</strong> placement of that text <strong>to</strong> produce suitably personalized letters and reports. Various<br />

tasks that could be done include:<br />

1. Billing<br />

Calculate amount due, date due and discount for early payment, and write bill with correct name,<br />

address and aligned dollar amounts.<br />

2. Reminding<br />

Select patients with upcoming appointments and write letters reminding patients of appointment<br />

date, time, physician and procedure.<br />

3. Fund Raising<br />

Select past donors and write solicitations for funds, including in <strong>the</strong> letters <strong>the</strong> number of years of<br />

support, <strong>the</strong> maximum previously given, and <strong>the</strong> number of supporters in this individual’s class or<br />

organization.<br />

4. Claims Processing<br />

Select pending insurance claims and write letters giving <strong>the</strong> current status of <strong>the</strong> claim, including<br />

deductible amount, amount covered, amount payable, and remaining coverage.<br />

__________________________________________________________________________<br />

Figure 11.4 A Form Letter: The Input File<br />

File MailList<br />

Last First Sex Company Street<br />

Greene Sharon F Pierce & Co. P.O. Box 365<br />

Smyth William - Devon Industries 126 West 46th St.<br />

City State Zip Copier<br />

New York NY 10003 Kanon Premiere<br />

Brooklyn NY 11234 Shape 100<br />

__________________________________________________________________________<br />

11.17 Positioning Columns<br />

The control word character “@” may be followed directly by a number <strong>to</strong> specify an exact column location. This<br />

positions <strong>the</strong> first letter of <strong>the</strong> variable Name in <strong>the</strong> fifth column:<br />

[ PUT @5 Name ]<br />

The column location is measured from <strong>the</strong> start of <strong>the</strong> line. The column pointer moves <strong>to</strong> <strong>the</strong> specified column<br />

and subsequent text begins in that column.<br />

@PLUS and @MINUS move <strong>the</strong> column pointer right and left, respectively, from its current position.<br />

@PLUS moves <strong>the</strong> pointer right <strong>the</strong> specified number of columns; @MINUS moves it left. Thus, if <strong>the</strong> current<br />

column is number 20, @PLUS7 moves <strong>the</strong> pointer seven columns <strong>to</strong> <strong>the</strong> right <strong>to</strong> column 27.<br />

Note that <strong>the</strong> pointer moves only in <strong>the</strong> current line. Thus, if text is already in <strong>the</strong> specified column, it is overwritten<br />

by <strong>the</strong> new text. Also, when using @PLUS and @MINUS, <strong>the</strong> pointer moves relative <strong>to</strong> its current<br />

location, which may be dependent upon <strong>the</strong> length of <strong>the</strong> last value it printed. This instruction,<br />

[ PUT @10 First Last @25 Phone ]



might produce:<br />

         Susan Wells    205-672-9122<br />

         Thomas Bretchei617-926-0106<br />

12345678901234567890123456789012345678901234567890<br />

The second phone number has overwritten <strong>the</strong> remainder of that name. (A scale numbering <strong>the</strong> columns is included<br />

just for illustration.)<br />
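The overwriting behaviour is easier to see in a small model of a line buffer with a column pointer. This Python sketch is an assumption-laden illustration, not P-STAT code: the class `Line`, the helper `name_and_phone`, and the explicit single blank written between First and Last are hypothetical. The full surname "Bretcheimer" is invented here purely so that the truncation matches the output shown above; the source does not give it.

```python
class Line:
    """One output line with a 1-based column pointer."""

    def __init__(self):
        self.cells = []      # characters written so far
        self.pointer = 1     # next column to be written

    def at(self, column):
        """Move the pointer to an exact column, like @nn."""
        self.pointer = column
        return self

    def put(self, text):
        """Write text at the pointer, overwriting anything already there."""
        for ch in text:
            while len(self.cells) < self.pointer:
                self.cells.append(" ")          # pad with blanks up to the pointer
            self.cells[self.pointer - 1] = ch
            self.pointer += 1
        return self

    def text(self):
        return "".join(self.cells)


def name_and_phone(first, last, phone):
    """Model of: PUT @10 First Last @25 Phone."""
    line = Line().at(10).put(first).put(" ").put(last)
    return line.at(25).put(phone).text()
```

Calling `name_and_phone("Thomas", "Bretcheimer", "617-926-0106")` leaves only "Bretchei" of the surname visible, because the phone number starts in column 25 regardless of where the name ended.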

11.18 Positioning Lines<br />

The control words @PARA, @NEXT, @SKIP, @PAGE, @INDENT and @WIDTH specify where a subsequent<br />

line prints. @PARA starts a new paragraph by moving the pointer to the next line and indenting three<br />

columns. Text starts in the fourth column. Alternate paragraph styles may be obtained using @SKIP (or<br />

@SKIP @5) instead of @PARA.<br />

@NEXT positions subsequent text in <strong>the</strong> next line. @SKIP skips <strong>the</strong> specified number of lines or, if no number<br />

directly follows, skips one line. @PAGE positions text at <strong>the</strong> <strong>to</strong>p of <strong>the</strong> next page.<br />

The control word @INDENT specifies an additional number of columns <strong>to</strong> indent text from <strong>the</strong> current margin<br />

setting. (The value after @INDENT is added <strong>to</strong> <strong>the</strong> current margin setting and text is indented that many<br />

columns from <strong>the</strong> left.) The current margin is that specified by <strong>the</strong> MARGIN identifier in <strong>the</strong> TEXTWRITER<br />

command or <strong>the</strong> default margin of zero indent. @IN is an abbreviation for @INDENT. @NOINDENT resets <strong>the</strong><br />

indentation <strong>to</strong> that specified by <strong>the</strong> MARGIN identifier or <strong>to</strong> 0 if MARGIN was not used.<br />

The @WIDTH control word sets <strong>the</strong> width of <strong>the</strong> report. The width is measured from column one, not from<br />

<strong>the</strong> current margin or indent setting. Thus, when <strong>the</strong> indent is increased, <strong>the</strong> line length is shortened. @WIDTH<br />

overrides any previous output width settings defined by <strong>the</strong> OUTPUT.WIDTH command or <strong>the</strong> identifier WIDTH<br />

in TEXTWRITER. The integer following @WIDTH may range from 2 <strong>to</strong> 400. @NOWIDTH turns off <strong>the</strong> current<br />

setting, and <strong>the</strong> width of <strong>the</strong> report reverts <strong>to</strong> that set by <strong>the</strong> WIDTH identifier or <strong>the</strong> OUTPUT.WIDTH<br />

command.<br />

Each of <strong>the</strong>se control words flushes <strong>the</strong> text buffer and prints accumulated text before moving <strong>to</strong> <strong>the</strong> specified<br />

line. When <strong>the</strong>se control words are not used, text prints continuously until <strong>the</strong> current line is full and <strong>the</strong>n text<br />

continues on <strong>the</strong> next line.<br />

11.19 Positioning Words<br />

@TRIM is assumed by TEXTWRITER. Trailing blanks are au<strong>to</strong>matically trimmed from character strings<br />

before <strong>the</strong>y are positioned in <strong>the</strong> text. This avoids having many blanks following a short name. @NOTRIM may<br />

be specified <strong>to</strong> turn trimming off. Then, a character value occupies as many columns as its defined length, even<br />

though a particular value may be blank or only a few characters long. (Numbers do not have trailing blanks.)<br />

@JUST specifies that text be right as well as left justified. The lines of <strong>the</strong> report are aligned at both <strong>the</strong> left<br />

and right margins. When @JUST is not used and <strong>the</strong> JUSTIFY identifier is not included in <strong>the</strong> TEXTWRITER<br />

command, as many words as fit in a line are printed and <strong>the</strong>n a new line is started. Thus, <strong>the</strong> right margin is jagged<br />

or unaligned. @NOJUST turns justification off, overriding <strong>the</strong> JUSTIFY identifier if it has been used.<br />

@BEFORE places <strong>the</strong> next value or string <strong>to</strong> be written immediately before <strong>the</strong> specified column. It affects<br />

only <strong>the</strong> text or variable following directly after it, and it does not reset <strong>the</strong> right edge of <strong>the</strong> report. This<br />

instruction,<br />

[ PUT @BEFORE20 City @24 Area.Code ]<br />

given this data, produces:<br />

          Princeton    609<br />

         Somerville    201<br />

            Trenton    609<br />

123456789012345678901234567890<br />

Notice that City is right aligned before column 20; column 20 itself is blank. Area code starts in column 24. (The<br />

scale is not part of <strong>the</strong> output.)<br />
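The arithmetic behind @BEFORE is simple: the value ends in the column just before the one named. The Python sketch below is a model of that description only. The function names are hypothetical, and raising an error when the value is too long is an assumption, since the text does not say what TEXTWRITER does in that case.

```python
def before(text, column):
    """Right-align `text` so that its last character is in column - 1 (1-based)."""
    start = column - len(text)            # first column occupied by the text
    if start < 1:
        raise ValueError("text does not fit before the requested column")
    return " " * (start - 1) + text


def city_line(city, area_code):
    """Model of: PUT @BEFORE20 City @24 Area.Code."""
    line = before(city, 20)               # city ends in column 19
    return line.ljust(23) + area_code     # area code starts in column 24
```

For "Princeton" the city occupies columns 11 to 19, column 20 stays blank, and the area code begins in column 24.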

@COMMAS requests that subsequent numeric values print with commas every three digits (counting from<br />

<strong>the</strong> decimal point <strong>to</strong> <strong>the</strong> left). This makes reading large numbers, such as population figures or dollar amounts,<br />

easier. @NOCOMMAS turns this off.<br />

__________________________________________________________________________<br />

Figure 11.5 A Form Letter: The TEXTWRITER Command<br />

TEXTWRITER MailList<br />

[/* RETURN ADDRESS */;<br />

PUT @PAGE @SKIP4 @40 <br />

@NEXT @40 <br />

@NEXT @40 <br />

@SKIP @40 (LTRIM (.DATE.) ) @SKIP2 ;<br />

/* CUSTOMER ADDRESS */;<br />

IF Sex EQ 'M', T.PUT , F.PUT ;<br />

PUT First Last @NEXT Company @NEXT Street<br />

@NEXT City State > Zip ;<br />

/* SALUTATION */;<br />

PUT @SKIP ;<br />

IF Sex EQ 'M', T.PUT , F.PUT , M.GOTO Sir;<br />

PUT Last ; GOTO Continue ;<br />

Sir: PUT ;<br />

/* BODY OF LETTER */;<br />

Continue: ;<br />

PUT @PARA <br />

> Copier ><br />

> City ;<br />

PUT @PARA Copier <br />

;<br />

/* CLOSING */;<br />

PUT @SKIP2 @40 @SKIP4 @40 <br />

@NEXT @40 @SKIP2 @NEXT ],<br />

MARGIN 5, JUSTIFY, WIDTH 71 $<br />

__________________________________________________________________________



__________________________________________________________________________<br />

Figure 11.6 A Form Letter: One Letter<br />

                                       GREAT Copier Supplies<br />

                                       123 First Street<br />

                                       New York, NY 11001<br />

                                       May 7, 1986<br />

     Ms. Sharon Greene<br />

     Pierce & Co.<br />

     P.O. Box 365<br />

     New York, NY 10003<br />

     Dear Ms. Greene:<br />

        Thank you for calling GREAT Copier. We stock all supplies<br />

     for the Kanon Premiere copier at discount prices, and we deliver<br />

     them right to your business in New York.<br />

        Enclosed is a price list for the Kanon Premiere. Should you<br />

     have any questions, please give us a call.<br />

                                       Sincerely yours,<br />

                                       Sam Right<br />

                                       Sales Manager<br />

     SR:ms<br />

     Enc: pl<br />

__________________________________________________________________________<br />

@PLACES specifies the number of decimal places (counting from the decimal point to the right) with which<br />

to print subsequent numeric values. The numbers are rounded if they have more than the specified number of<br />

places, or zeros are added if they have fewer than the specified number of places. This instruction:<br />

[ PUT @PLACES2 Income ]<br />

prints the variable Income with two decimal places. @PL is an abbreviation for @PLACES. @NOPLACES turns off the prior<br />

places specification. Numbers then print with their actual number of decimal places. (@NOPLACES is not the<br />

same as @PLACES0. Zero decimal places print when @PLACES0 is specified. The number of places actually<br />

in a numeric value prints when @NOPLACES, the initial default setting, is in effect.)<br />

The PLACES function is distinct from <strong>the</strong> @PLACES control word. Both may be used in PUT clauses, if<br />

desired:<br />

[ PUT 'Total income, to the nearest dollar, is: '<br />

  @PLACES2 (PLACES (Income, 0)) ]<br />



In this example, the PLACES function rounds the number of decimal places in Income to zero and the @PLACES<br />

control word sets the number of places to two. Thus, Income is shown to the nearest dollar, but with two zeros<br />

included after the decimal point in the common dollar and cents pattern.<br />
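The difference between rounding a value and choosing how many places to print can be checked with a few lines of Python. This is a sketch of the distinction only, not P-STAT code: the function `show` is hypothetical, Python's `round` stands in for the PLACES function, and Python's rounding of exact halves may differ from P-STAT's.

```python
def show(value, places=None, commas=False):
    """Format a number in the spirit of the control words described above.

    places=None models @NOPLACES (print the value's own decimals),
    places=n models @PLACESn, and commas=True models @COMMAS.
    """
    if places is None:
        return f"{value:,}" if commas else f"{value}"
    return f"{value:,.{places}f}" if commas else f"{value:.{places}f}"


income = 52345.67
nearest_dollar = round(income)            # stands in for PLACES (Income, 0)
```

Here `show(nearest_dollar, places=2)` gives `52346.00`, which is the rounded value printed in the dollars-and-cents pattern, while `show(income, places=0)` gives `52346` with no decimal point.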

@SPREAD is the assumed setting. @SPREAD causes a single blank to be placed between variables.<br />

@NOSPREAD can be used to change this so that the blank is omitted. For example:<br />

TEXTWRITER Tests [ PUT @NOSPREAD First Last ] $<br />

produces <strong>the</strong> following output:<br />

JamesWilmot<br />

SheilaHiggin<br />

Variables are usually trimmed of outside blanks before printing. The combination of @NOTRIM and<br />

@NOSPREAD causes TEXTWRITER to leave things alone and print the values exactly as they are stored in the file<br />

with no intervening blanks.<br />
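How trimming and spreading interact can be sketched in Python. This is a model under stated assumptions, not P-STAT code: the function `put` is hypothetical, the stored width of eight characters in the usage note is invented for illustration, and trimming is modelled as removal of trailing blanks only, following the description of @TRIM.

```python
def put(values, trim=True, spread=True):
    """Join character values the way PUT is described as doing.

    trim=False models @NOTRIM and spread=False models @NOSPREAD.
    """
    if trim:
        values = [v.rstrip() for v in values]   # drop trailing blanks only
    separator = " " if spread else ""
    return separator.join(values)
```

With values stored as `"James   "` and `"Wilmot  "`, the defaults give `James Wilmot`, `spread=False` gives `JamesWilmot`, and turning both settings off reproduces the stored characters unchanged.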

11.20 Labeling Values<br />

The PUTL instruction and <strong>the</strong> control word @EQUAL align variable names as well as <strong>the</strong>ir values about <strong>the</strong>ir<br />

equal-signs. This is useful for embedding lists within reports or dumping values in cases with inconsistent data.<br />

PUTL requests that <strong>the</strong> variable name as well as its value be placed in <strong>the</strong> output line:<br />

[ IF FIRST (.FILE.),<br />

PUT 'The following accounts are overdue:' @SKIP1;<br />

IF DAYS (.NDATE., 'YYYYMMDD') - DAYS (Due, 'YYMMDD') GT 30,<br />

PUT Acct.Number Co.Name, PUTL @26 Billed Due ]<br />

Here is a possible report using <strong>the</strong>se PUTL statements:<br />

The following accounts are overdue:<br />

1205 Jones & Sons, <strong>Inc</strong>. Billed = 860112 Due = 860210<br />

1231 Birchwood Lumber Billed = 860210 Due = 860305<br />

The @EQUAL control word aligns labeled variables by specifying <strong>the</strong> column location of <strong>the</strong> equal-sign.<br />

Each line of text has that variable label and value with <strong>the</strong> equal-sign in <strong>the</strong> same column. @EQUAL is used after<br />

PUTL:<br />

[ PUTL @EQUAL15 Systolic ]<br />

For two cases this produces:<br />

     Systolic = 96<br />

     Systolic = 82<br />

123456789012345678901234567890<br />

Multiple locations may be specified. This instruction:<br />

[ PUTL @EQUAL15:35 Systolic Diastolic ]<br />

or the equivalent:<br />

[ PUTL @EQUAL=15:35 Systolic Diastolic ]<br />

aligns <strong>the</strong> variable names and values about <strong>the</strong> equal-signs, which are positioned in columns 15 and 35. Ei<strong>the</strong>r<br />

instruction produces:



     Systolic = 96      Diastolic = 123<br />

     Systolic = 82      Diastolic = 114<br />

1234567890123456789012345678901234567890<br />

If the variable name and value do not fit in the line when the equal-sign is positioned in the specified column,<br />

they are placed in the next line of text. @NOEQUAL turns off alignment about the equal-sign.<br />
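The column arithmetic for @EQUAL can be modelled in Python. This is a sketch, not P-STAT code: the function `putl_equal` is hypothetical, it assumes the name is written immediately before the three-character separator " = ", and it raises an error on overlap instead of wrapping to the next line.

```python
def putl_equal(pairs, columns):
    """Lay out name = value pairs with each equal sign in a given column.

    `pairs` is a list of (name, value) tuples and `columns` holds the
    matching 1-based equal-sign columns, as in @EQUAL15:35.
    """
    line = ""
    for (name, value), column in zip(pairs, columns):
        start = column - 1 - len(name)    # name, one blank, then the equal sign
        if start <= len(line):
            raise ValueError(f"{name} does not fit before column {column}")
        line = line.ljust(start - 1) + f"{name} = {value}"
    return line
```

For `putl_equal([("Systolic", 96), ("Diastolic", 123)], [15, 35])` the equal signs land in columns 15 and 35, so each name starts wherever its length requires.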

A very useful companion to the PUTL instruction is the .ALL. system variable. This causes all the variables<br />

in the file to be printed. Figure 11.7 shows the command and the printout that results.<br />

__________________________________________________________________________<br />

Figure 11.7 TEXTWRITER: Displaying all <strong>the</strong> Variables<br />

TEXTWRITER Tests [ PUTL @EQUAL=15:40:60 .ALL. @SKIP ] $<br />

SS.Number = 243-24-5007 Last = Wilmot First = James<br />

Vocab = 97 Riding = 90 Tea = 98<br />

Hockey = 78 Car = 71 Beer = 85<br />

Juggling = 64 Affairs = 97 Memo = 93<br />

SS.Number = 311-04-8831 Last = Higgin First = Sheila<br />

Vocab = 96 Riding = 89 Tea = 54<br />

Hockey = 86 Car = 70 Beer = 91<br />

Juggling = 82 Affairs = 96 Memo = 86<br />

__________________________________________________________________________<br />

You will almost always want to use @EQUAL with the PUTL .ALL. combination. If you do not specify<br />

where the equal signs should be placed, the variables and values are printed one after another with 4 spaces between<br />

the value and the next variable name. The amount that is printed on each line is determined by the current<br />

output width setting up to a maximum of 400 characters.<br />

PUTL prints a value with its variable name. PUT prints a value. If you are providing a series of variable<br />

names some of which are <strong>to</strong> have labels and some of which should be printed without labels, you may use <strong>the</strong><br />

@LABEL and @NOLABEL control words. @LABEL in a PUT is <strong>the</strong> equivalent of a PUTL. @NOLABEL in<br />

a PUTL is <strong>the</strong> equivalent of a PUT. The command:<br />

TEXTWRITER Tests [ PUTL Last First @NOLABEL SS.Number ] $<br />

produces <strong>the</strong> following output.<br />

Last = Wilmot First = James 243-24-5007<br />

Last = Higgin First = Sheila 311-04-8831<br />

11.21 Specifying Missing Characters<br />

The @MISS control word specifies a character or a character string <strong>to</strong> print in place of <strong>the</strong> dash or dashes that<br />

usually print for missing values. @MISS, followed by a character or string in quotes, requests that character or<br />

string print for any of <strong>the</strong> three types of missing values:<br />

[ PUT @MISS



[ PUT 'The cus<strong>to</strong>mer response was '<br />

@MISS1"don’t know" @MISS2'no answer' @MISS3'refuse <strong>to</strong> answer'<br />

Response4 '.' ]<br />

@M1, @M2 and @M3 are abbreviations for <strong>the</strong> three types of missing control words.<br />

The missing specifications are in effect until <strong>the</strong>y are reset or turned off with @NOMISS. The @NOMISS<br />

control word returns <strong>the</strong> missing character <strong>to</strong> <strong>the</strong> dash, <strong>the</strong> initial default character.<br />

@NOMISS1, @NOMISS2 and @NOMISS3 may be used <strong>to</strong> selectively reset specific missing control words.<br />
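The way the missing-value settings persist and reset can be modelled as a small lookup table. This Python sketch is illustrative only and not P-STAT code: the class `MissingText`, the tuple used to represent a missing value, and the single dash used as the default are all assumptions.

```python
DEFAULT = "-"   # stand-in for the dash or dashes printed by default


class MissingText:
    """Replacement text for the three types of missing value."""

    def __init__(self):
        self.text = {1: DEFAULT, 2: DEFAULT, 3: DEFAULT}

    def miss(self, string, kind=None):
        """Model @MISS (all three types) or @MISS1, @MISS2, @MISS3 (one type)."""
        for k in ([kind] if kind else [1, 2, 3]):
            self.text[k] = string

    def nomiss(self, kind=None):
        """Model @NOMISS (all three types) or @NOMISS1..3 (one type)."""
        self.miss(DEFAULT, kind)

    def render(self, value):
        """`value` is a number, or ("missing", n) for missing type n."""
        if isinstance(value, tuple) and value[0] == "missing":
            return self.text[value[1]]
        return str(value)
```

A setting made with `miss("no answer", 2)` stays in effect for every later value until `nomiss(2)` or `nomiss()` restores the default for that type.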

__________________________________________________________________________<br />

Figure 11.8 A Complex Report: The Input and Labels Files<br />

File Tests<br />

SS Number    Last    First   Vocab  Riding  Tea  Hockey  Car<br />

243-24-5007  Wilmot  James      97      90   98      78   71<br />

311-04-8831  Higgin  Sheila     96      89   54      86   70<br />

Beer  Juggling  Affairs  Memo<br />

  85        64       97    93<br />

  91        82       96    86<br />

File Tests.lab<br />

Score (1)superior (2)excellent<br />

(3)above average (4)average<br />

(5)marginal (6)poor /<br />

__________________________________________________________________________<br />

11.22 A Complex Report<br />

This section discusses the more complex report shown in Figures 11.8, 11.9 and 11.10. More of the potential of<br />

<strong>PPL</strong> is illustrated. Figure 11.8 shows a portion of an input file containing scores for various aptitude tests (some<br />

of which are perhaps a little silly). Figure 11.9 is <strong>the</strong> desired output and Figure 11.10 contains <strong>the</strong> command.<br />

In <strong>the</strong> first case, scores on Tea and Vocab and Affairs, which are all above 95, are recoded as a 1. Therefore,<br />

<strong>the</strong>y will be represented in <strong>the</strong> first sentence that is constructed. Riding and Memo, which are recoded as a 2, are<br />

represented in the second sentence. Beer is recoded as a 3 and is represented in the third sentence. For each case,<br />

the contents of each sentence and even the number of sentences are different.<br />

The general procedure of <strong>the</strong> instructions is:<br />

1. Generate all of <strong>the</strong> variables and scratch variables that will be used in <strong>the</strong> command. Since none of<br />

<strong>the</strong>m are given initial values <strong>the</strong>y are set <strong>to</strong> missing.<br />

2. Select <strong>the</strong> desired variables and do any necessary recoding. Specify a heading for each report (each<br />

case is a separate report on a new page).



__________________________________________________________________________

Figure 11.9 A Complex Report: The Report (Two Pages)

James Wilmot (243-24-5007)

Comments on INTELLECTUAL MEASURES

Vocabulary skills, tea pouring and knowledge of current affairs
are in the superior range by Company standards. Horseback riding and
memo writing are excellent. Beer drinking is above average. Field
hockey play is average. Car parking is marginal. Juggling is poor.

..............

Sheila Higgin (311-04-8831)

Comments on INTELLECTUAL MEASURES

Vocabulary skills and knowledge of current affairs are in the
superior range by Company standards. Horseback riding and beer
drinking are excellent. Field hockey play, juggling and memo writing
are above average. Car parking is marginal. Tea pouring is poor.

__________________________________________________________________________

3. Initialize #Sentence to 0. Start the All.Scores DO loop through the 6 possible scores with the scratch variable #S.

4. Count the number of tests that match the current value of #S. For example, for the first case when #S = 1, #NV.of.Score = 3; James Wilmot has three tests with a score of 1 (the values 97, 98 and 97 are all recoded as 1).

5. Set #Used to zero. Start the All.Tests DO loop through the 9 test scores. When the value of a test equals #S, increase #Used and start writing.

When processing the first case, 97 (the value of variable Vocab) is recoded as a 1. The KEEP reorders the file so that Vocab is the first variable in the file. The first time through the All.Scores loop, #S = 1. Within that loop, the first time through the All.Tests loop, #J = 1, the position of Vocab. Since the value of Vocab is 1, which equals the value of #S, we have a match for our first sentence.

6. Set #Name to the desired name for each test. The first variable, Vocab, is called “vocabulary skills” in the report.

7. If this is the first item in a sentence (#Used = 1), capitalize the first letter of the name.

8. Depending on the number of items that are to go in the sentence (#NV.of.Score) and the number already in (#Used), put the name of the aptitude test and possibly a comma or the word “and”.

9. Put “is” or “are” in the sentence, depending on how many items (#Used) have been used in the sentence.

10. Change the text slightly for sentences after the first one to make the report less repetitious. When the sentence for a given score has been written, the All.Tests loop is complete. When all 6 scores have been evaluated, the All.Scores loop is complete.
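The sentence-building logic of steps 3 through 10 can be sketched outside PPL. The following Python fragment is only an illustration of the procedure, not P-STAT code. The function name `build_report`, the test names and the order of the tests are assumptions made for this sketch, and the scores are taken to be recoded to 1 through 6 already.

```python
# Language-neutral sketch of steps 3-10; this is Python, not PPL.
RATINGS = {1: "superior", 2: "excellent", 3: "above average",
           4: "average", 5: "marginal", 6: "poor"}


def build_report(names, scores):
    """Build one sentence for each score value that occurs.

    names  -- test names in file order (hypothetical labels)
    scores -- recoded scores (1-6), parallel to names
    """
    sentences = []
    for s in range(1, 7):                       # the All.Scores loop (#S)
        matched = [n for n, v in zip(names, scores) if v == s]
        if not matched:                         # no tests had this score
            continue
        # Capitalize only the first letter of the first item.
        matched[0] = matched[0][0].upper() + matched[0][1:]
        if len(matched) == 1:
            subject = matched[0]
        else:                                   # commas, then "and" before the last
            subject = ", ".join(matched[:-1]) + " and " + matched[-1]
        verb = "is" if len(matched) == 1 else "are"
        if not sentences:                       # longer wording for the first sentence only
            tail = f"in the {RATINGS[s]} range by Company standards."
        else:
            tail = f"{RATINGS[s]}."
        sentences.append(f"{subject} {verb} {tail}")
    return " ".join(sentences)
```

Called with the nine names and the scores 1, 1, 1, 2, 3, 4, 5, 6, 2 in a suitable order, the function returns the paragraph shown for James Wilmot in Figure 11.9.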



__________________________________________________________________________

Figure 11.10 A Complex Report: The TEXTWRITER Command

TEXTWRITER Tests

[
/* 1. Recoding, Beginning Each Case */;
GEN Score;
GEN #Sentence, GEN #NV.of.score, GEN #USED, GEN #Name:C32 ;
KEEP Vocab TO Memo .OTHERS.;
DO #J = 1, 9;
SET V(#J) = RECODE ( V(#J),
0 TO 65 = 6, 65 TO 73 = 5, 73 TO 80 = 4,
80 TO 88 = 3, 88 TO 95 = 2, 95 TO 100 = 1 ) ;
ENDDO;
PUT @PAGE First Last > SS.Number @NEXT ;
PUT @15 @PARA ]

[
SET #Sentence = 0 ;
/* #Sentence is the count of the number of sentences
#S controls the loop through the 6 possible scores
#NV.of.Score is number of variables with a given score */;
DO All.Scores #S = 1, 6;
SET #NV.of.Score = 0 ;
DO #J = 1, 9;
IF V(#J) EQ #S, INC #NV.of.Score ;
ENDDO;
/* No tests had this score, get the next one */;
IF #NV.of.Score EQ 0 NEXTDO; ]

[
/* #Used is number of items used in sentence
#J is the position (1-9) of a test variable */;
SET #Used = 0;
DO All.Tests #J = 1, 9;
IF V(#J) NE #S, NEXTDO;
INC #Used;
/* #Name is the name of each test item */;
SET #Name = RECODE ( #J,
1 = , 2 = ,
3 = , 4 = ,
5 = , 6 = ,
7 = ,
8 = ,
9 = ) ;
/* Capitalize the name of the first test item */;
IF #Used EQ 1, SET #Name =
CHANGE ( #Name, 1, 1, UPPER (SUBSTRING (#Name, 1, 1) ) );
/* Use commas and “and” appropriately
then get the next test */;
IF #NV.of.Score EQ 1, PUT #Name, NEXTDO;
IF #Used LE #NV.of.Score - 2, PUT #Name ;
IF #Used EQ #NV.of.Score - 1, PUT #Name ;
IF #Used EQ #NV.of.Score, PUT > #Name ;
All.Tests: ENDDO ]

[
/* Write sentence, set score to #S for labelling */;
INC #Sentence ;
/* Use “is” and “are” appropriately */;
IF #Used EQ 1, PUT > ;
IF #Used GT 1, PUT > ;
/* Different text for the first sentence only */;
SET Score = #S;
IF #Sentence EQ 1, PUT Score
> ;
IF #Sentence NE 1, PUT Score ;
All.Scores: ENDDO ],
JUSTIFY, WIDTH 60, LABELS 'Tests.lab' $

__________________________________________________________________________

11.23 Control Word Summary

Control words all begin with “@” and many of them are followed by a number giving a column location or other value. The number may follow directly after the control word (@SKIP2) or after an equals sign (@SKIP=2). Although the argument directly following the control word is typically a number, it may be any expression that evaluates to a numeric value.

The following control words remain in effect throughout the processing of a case by the TEXTWRITER command, unless they are specifically changed or turned off:

@INDENT    @EQUAL
@WIDTH     @MISS
@JUST      @COMMAS
@TRIM      @PLACES
@LABEL     @FONT1 - @FONT9

Thus, specifying @JUST means that all the text specified in this and subsequent PUT clauses will be justified; specifying @COMMAS means that all numeric values will have commas in them. Prefacing the control word with “NO” (@NOJUST or @NOCOMMAS) turns off a prior setting and resets it to the value initially assumed by the TEXTWRITER command:

[ PUT 'Invoice number ' Inv.No ', for $'
@COMMAS Inv.Amt ', dated ' @NOCOMMAS Inv.Date
' is past due.' ]

Commas are inserted only in values of the numeric variable Inv.Amt and not in Inv.No or Inv.Date, which are character variables.

The following control words do not remain in effect throughout the processing of a case. They apply only to the variable expression or character string that directly follows:

@nn        @PAGE
@PLUS      @NEXT
@MINUS     @PARA
@BEFORE    @SKIP

These control words must be reissued to produce the desired results again. (The “nn” above represents a positive whole number.)

The STREAM and CASE identifiers also affect the action of control words. Remember that in CASE mode, text is flushed and a new line is started when processing of a new case is begun. In STREAM mode, text is flushed only when processing of all cases is complete. CASE is assumed when neither identifier is used in the TEXTWRITER command; STREAM must be specified if it is desired.

When STREAM mode is specified, all control words that typically remain in effect, except @INDENT and @WIDTH (which flush and print text), are reset when processing of a new case begins. In CASE mode, all control words, including @INDENT and @WIDTH, are reset to their initial default values.
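The way persistent settings carry through a case and are reset when the next case begins can be modelled in a few lines. The Python sketch below is an illustration only, not P-STAT code. The class `PutState`, the function `run_cases` and the default display of a number (no commas, digits as stored) are assumptions made for the sketch.

```python
class PutState:
    """Hypothetical model of two persistent control words (not P-STAT code)."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Assumed defaults: no commas, decimal places left as they are.
        self.commas = False
        self.places = None

    def fmt(self, value):
        """Format a numeric value under the current settings."""
        spec = "," if self.commas else ""
        if self.places is not None:
            spec += f".{self.places}f"
        return format(value, spec)


def run_cases(balances):
    """CASE-mode sketch: settings persist within a case, reset between cases."""
    state = PutState()
    results = []
    for balance in balances:
        state.reset()                          # new case: back to the defaults
        before = state.fmt(balance)            # printed before @COMMAS @PLACES2
        state.commas, state.places = True, 2   # the effect of @COMMAS @PLACES2
        after = state.fmt(balance)             # printed later in the same case
        results.append((before, after))
    return results
```

For each case the first value of the pair is formatted with the defaults and the second with commas and two decimal places, which shows that the settings made in one case do not leak into the start of the next.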

For example, a PUT instruction such as this,

[ PUT 'Student ' ID.No ' has an outstanding bill of $'
@COMMAS @PLACES2 Balance '.' ]

which specifies that the variable Balance print with commas and two decimal places, needs no extra initialization for each case. After the first student is processed, the variable ID.No for the second student will not print with commas and two places, because the @COMMAS and @PLACES2 control words are reset to their default values at the start of each case. However, they do remain in effect throughout a case. Thus, if the prior PUT instruction were followed by another PUT instruction, any numeric values specified in that instruction would print with commas and two decimal places. @NOCOMMAS and @NOPLACES would have to precede those numeric variables to reset these control words.

11.24 COMPARING TEXTWRITER AND OTHER COMMANDS

Any of the control words, except @INDENT, @JUST, @SPREAD, @WIDTH, and the PostScript controls such as @FONT1, may be used in PUT clauses following any command. They are not exclusive to the TEXTWRITER command. Thus, brief reports or “dump messages” may be produced as a system file is processed by any command. However, as with all P-STAT commands, the TEXTWRITER identifiers can be used only in the TEXTWRITER command.

There is a basic difference between TEXTWRITER and other commands in the way text is output. When TEXTWRITER is used, text prints continuously: a new line is not started unless the prior line is full or control words specify a new line. Text is flushed and a new line is started only when processing of a new case is begun, unless STREAM mode is specified. Then text is flushed only when processing of all cases is complete.

When another command (not TEXTWRITER) is used, a new line is started and text is written whenever a PUT clause ends, unless the final character in the clause is an “@” by itself. This causes the line to be held for the possible addition of more text. When processing of a case is finished, all text is written, regardless of whether the line was held or not.

The control character “@” is used at the end of a PUT instruction to hold the column pointer in the same line:

PROCESS Class102
[ PUT ID
(ROUND (MEAN.GOOD (Test?) ) ) @ ;
IF (COUNT.GOOD (Test?) ) NE 8, PUT ] $

The “@” after the period keeps the column pointer in the same line. If there are not eight good test scores, an additional text string is put in that line. This report is produced:

1022: Average is 85.
1248: Average is 88. Missing some tests.

This command, without the “@”,

PROCESS Class102
[ PUT ID
(ROUND (MEAN.GOOD (Test?) ) ) ;
IF (COUNT.GOOD (Test?) ) NE 8, PUT ] $

produces:

1022: Average is 85.
1248: Average is 88.
Missing some tests.

The text about the missing tests appears on a new line. (It would not be necessary to use the control character “@” in the TEXTWRITER command, which assumes continuous printing of text.)
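The difference between a held line and an unheld line can be simulated with a small buffer. The Python sketch below is an illustration of the behaviour described above, not P-STAT code. The function name `process`, the case data and the single space placed before the second string are assumptions made for the sketch.

```python
def process(cases, hold):
    """Sketch of PUT line handling in a command other than TEXTWRITER.

    cases -- list of (id, average, n_good) tuples (hypothetical data)
    hold  -- True when the first PUT clause ends with "@"
    """
    lines = []
    for case_id, average, n_good in cases:
        pending = f"{case_id}: Average is {average}."   # first PUT clause
        if not hold:
            lines.append(pending)      # clause ended: the line is written at once
            pending = ""
        if n_good != 8:                # second, conditional PUT clause
            pending = (pending + " " if pending else "") + "Missing some tests."
            lines.append(pending)
            pending = ""
        if pending:                    # end of case: held text is written
            lines.append(pending)
    return lines
```

With `hold=True` the two example cases give two lines, and with `hold=False` they give three, matching the two reports shown above.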

11.25 OPTIONAL IDENTIFIERS: PostScript

The TEXTWRITER command can use any font that is available on your PostScript printer to produce camera-ready printout. Figure 11.11 shows the output that results when PostScript controls are added to the complex report described in Figures 11.8, 11.9, and 11.10. The controls for the page and paragraph are changed to request fonts:

PUT @PAGE @FONT2
First Last > SS.Number @NEXT @FONT1 ;
PUT @15 @PARA @FONT3; ]

The following PostScript identifiers are added before the final “$”. LEFT.EDGE, which uses inches, replaces WIDTH 70, which uses a number of characters, to determine the width of the printout.

POSTSCRIPT, PORTRAIT, FONT1 TIMES ROMAN BOLD 14,
FONT2 TIMES ROMAN BOLD 12, FONT3 ARIAL 12,
FONT4 TIMES ROMAN BOLDITALIC 12, LEFT.EDGE 2., PR 'Test.ps' $

The identifier POSTSCRIPT causes the initial control codes that PostScript requires to be written to the output file. You should always use a PR identifier, because PostScript output written to a non-PostScript device such as the terminal prints the control words rather than implementing them. Usually one of the two identifiers PORTRAIT or LANDSCAPE is used. PORTRAIT is used when the printout is going to paper that is 8.5 inches wide and 11 inches high. LANDSCAPE, the assumed orientation, is used for output that is 11 inches wide by 8.5 inches high. If you have paper that is a different size, the P-STAT POSTSCRIPT.SETUP command can be used to set the paper size. POSTSCRIPT.SETUP can also be used to set fonts and margins.

The area that is to be used on the paper can be controlled by using the TOP.EDGE, BOTTOM.EDGE, LEFT.EDGE and RIGHT.EDGE identifiers. The arguments are given in inches. One-inch margins are assumed for any edge that is not supplied. In Figures 11.1 through 11.4, the identifiers used to make the printed output fit nicely are:

TOP.EDGE .5, BOTTOM.EDGE 5.5, LEFT.EDGE 1.5, RIGHT.EDGE 1.5

11.26 PostScript Page Changes

The assumption is that each time you use the @PAGE control word, the PostScript page will be sent to the printer. This setting is controlled by the SHOWPAGE identifier. However, using PostScript is more like drawing on a slate than writing lines on a page. It is possible to move around the page, overwrite, draw lines or print text. In the P-STAT implementation any command that has PostScript support can be combined on a single page with any other such command.

When NO SHOWPAGE is used, a page is not sent to the printer until a subsequent command uses the SHOWPAGE identifier. SHOWPAGE is assumed unless NO SHOWPAGE is used. An automatic page change occurs when a block of text extends beyond the defined bottom of the page unless NO SHOWPAGE is in effect.

__________________________________________________________________________

Figure 11.11 PostScript Output

James Wilmot (243-24-5007)

Comments on INTELLECTUAL MEASURES

Vocabulary skills, tea pouring and knowledge of current affairs are in
the superior range by Company standards. Horseback riding and memo
writing are excellent. Beer drinking is above average. Field hockey play is
average. Car parking is marginal. Juggling is poor.

Sheila Higgin (311-04-8831)

Comments on INTELLECTUAL MEASURES

Vocabulary skills and knowledge of current affairs are in the superior
range by Company standards. Horseback riding and beer drinking are
excellent. Field hockey play, juggling and memo writing are above
average. Car parking is marginal. Tea pouring is poor.

__________________________________________________________________________

11.27 Setting the Fonts

The identifiers that are used to set the fonts are FONT and FONT1 through FONT9. When FONT is used by itself, it sets all 9 of the available fonts to the supplied setting. If FONT is not supplied, the assumed font is Times-Roman. The assumed pointsize depends on the combination of orientation (LANDSCAPE or PORTRAIT) and the output width. In PORTRAIT orientation with an output width less than 80, the default pointsize is 10. In LANDSCAPE orientation with a width that is greater than 80, the pointsize is set to 8.

Font names must be correctly spelled. Several of the more common font names are available as keywords so that you need not remember the exact form (upper and lower case and hyphenation). The available combinations are:

TIMES             HELVETICA              COURIER
TIMES BOLD        HELVETICA BOLD         COURIER BOLD
TIMES ITALIC      HELVETICA OBLIQUE      COURIER OBLIQUE
TIMES BOLDITALIC  HELVETICA BOLDOBLIQUE  COURIER BOLDOBLIQUE

These must be preceded by one of the FONT identifiers and optionally followed by the desired pointsize.

FONT1 HELVETICA 10, FONT3 TIMES ITALIC, FONT4 COURIER,

Helvetica and Times are proportional fonts. In a proportional font each letter takes up an appropriate amount of space, so that an i is not as wide as a W. Courier is a monospace font and each letter takes up the same amount of room.

You may use any font that is available on your laser printer. However, if it is not in the list of keywords, it must be enclosed in quotes. For example:

FONT9 'ZapfChancery-MediumItalic' 10,

Fonts can only be defined with the TEXTWRITER identifiers or in a previous POSTSCRIPT or POSTSCRIPT.SETUP command. They are selected within the TEXTWRITER output by using TEXTWRITER control words.

11.28 TEXTWRITER Control Words: The Fonts

The font control words are any of @FONT1 through @FONT9. The font change takes effect immediately and remains in effect until the next font control word is specified. If a number of fonts have been specified but no font control word is used, FONT1 is assumed.

The output in Figures 11.1 to 11.4 uses four fonts defined by identifiers in the command:

FONT1 TIMES BOLD 14, FONT2 TIMES BOLD 12,
FONT3 ARIAL 12, FONT4 TIMES BOLDITALIC 12

Within the TEXTWRITER PPL, the control words @FONT1, @FONT2, @FONT3 and @FONT4 are used to specify which of the defined fonts to use for a particular piece of text.

In Figure 11.11 the two paragraphs are in a regular Arial font while the two heading lines are in Times Bold 12 and Times Bold 14. Justification is not requested for Figure 11.11. In Figure 11.12 the text in the paragraphs is right/left justified. This is done by adding “, JUSTIFY” to the end of the TEXTWRITER command.

In Figure 11.13 the paragraphs are not justified but have font changes in the middle of the paragraph. A change was made in the TEXTWRITER command to isolate the variable “Score” so that it could be printed in a bold italic font:

SET Score = #S;
IF #Sentence EQ 1, PUT ;
PUT @FONT4; PUT Score;
IF #Sentence NE 1, PUT ;
PUT @FONT3;
IF #Sentence EQ 1, PUT >;

__________________________________________________________________________

Figure 11.12 Justification in PostScript Text

James Wilmot (243-24-5007)

Comments on INTELLECTUAL MEASURES

Vocabulary skills, tea pouring and knowledge of current affairs are in
the superior range by Company standards. Horseback riding and memo
writing are excellent. Beer drinking is above average. Field hockey play is
average. Car parking is marginal. Juggling is poor.

Sheila Higgin (311-04-8831)

Comments on INTELLECTUAL MEASURES

Vocabulary skills and knowledge of current affairs are in the superior
range by Company standards. Horseback riding and beer drinking are
excellent. Field hockey play, juggling and memo writing are above
average. Car parking is marginal. Tea pouring is poor.

__________________________________________________________________________

__________________________________________________________________________

Figure 11.13 Changing Fonts Text in a PostScript Paragraph

James Wilmot (243-24-5007)

Comments on INTELLECTUAL MEASURES

Vocabulary skills, tea pouring and knowledge of current affairs are in
the superior range by Company standards. Horseback riding and memo
writing are excellent. Beer drinking is above average. Field hockey play
is average. Car parking is marginal. Juggling is poor.

Sheila Higgin (311-04-8831)

Comments on INTELLECTUAL MEASURES

Vocabulary skills and knowledge of current affairs are in the superior
range by Company standards. Horseback riding and beer drinking
are excellent. Field hockey play, juggling and memo writing are above
average. Car parking is marginal. Tea pouring is poor.

__________________________________________________________________________



In Figure 11.14, the paragraphs are both justified and contain font changes in the middle of the text. As you can see, the spacing is not as good as it is in Figure 11.12. This is because any font change, color change or underline causes a flush. The program does the justification by estimating what might come next. This usually results in somewhat more space between the words. Text with many font changes may be somewhat less attractive than text without intermediate font changes. Text with many long words will also tend to be less attractive when justified.

__________________________________________________________________________

Figure 11.14 Font Changes in a Justified PostScript Paragraph

James Wilmot (243-24-5007)

Comments on INTELLECTUAL MEASURES

Vocabulary skills, tea pouring and knowledge of current affairs are in
the superior range by Company standards. Horseback riding and memo
writing are excellent. Beer drinking is above average. Field hockey play
is average. Car parking is marginal. Juggling is poor.

Sheila Higgin (311-04-8831)

Comments on INTELLECTUAL MEASURES

Vocabulary skills and knowledge of current affairs are in the superior
range by Company standards. Horseback riding and beer drinking
are excellent. Field hockey play, juggling and memo writing are above
average. Car parking is marginal. Tea pouring is poor.

__________________________________________________________________________

11.29 Control Words: Positioning the Text

There are two types of text to consider:

1. text that spans multiple lines like the paragraphs in the previous figures and

2. tables or short pieces of text which need to be positioned at particular places.

When PostScript is in effect, the control words such as @ and @BEFORE do not work well with proportional fonts. To account for the effects of proportional fonts and the fact that a PostScript page is not written from top to bottom but “drawn” on the page, there are many TEXTWRITER control words specifically for use with PostScript.

The following control words use inch measurements to specify where the string or number that follows is to be placed and how it is to be placed in relation to that location.

3. @CINCH=nn centers the text that follows at inch location nn. The text ends with the next control word of a type that causes a buffer flush such as @NEXT, @FLUSH, or another @CINCH type. Suppose var1 equals 111.222.

@CINCH=3.1 @PLACES2 var1 @SKIP2;

produces “variable one = 111.22” and puts it so that its middle point is 3.1 inches into the current line.

Note: @PLACES is a textwriter control word that does not cause a flush of the current text buffer. Since @SKIP flushes the text buffer, it ends the CINCH.

4. @CINCH.U=nn centers and also underlines the string at inch nn.

5. @RINCH=nn puts the textwriter text right justified at the specified location. This works well for numbers if the number of decimal places is controlled.

6. @RINCH.U=nn right justifies and underlines the text.

7. @LINCH=nn left justifies text at the specified location.

8. @LINCH.U=nn left justifies and underlines the text.

__________________________________________________________________________

Figure 11.15 TEXTWRITER: Tabular Output with PostScript

LIST numbers $

Var1 Var2 Var3
123.11 168.50 568.12
12.239 45.67 33.20
123.45 211.99 444.44

TEXTWRITER numbers
[ IF FIRST ( .FILE. ) THEN;
PUT @LEADING=2 @Y1 @NEXT
@CINCH.U=2.5
@RINCH.U=4.7
@LINCH.U=5.7 @SKIP=2;
ENDIF;
PUT @NOPLACES;
IF var1 GE 35 THEN;
PUT @PINCH=2.5 var1; ELSE; PUT @PINCH.U=2.5 var1; ENDIF;
PUT @PLACES2;
IF var2 GE 35 THEN;
PUT @RINCH=4.5 var2; ELSE; PUT @RINCH.U=4.5 var2; ENDIF;
IF var3 GE 35 THEN;
PUT @LINCH=5.7 var3; ELSE; PUT @LINCH.U=5.7 var3; ENDIF;
PUT @NEXT;
IF LAST ( .FILE. ) THEN;
PUT @NEXT @Y2 @LINEWIDTH=1.5 @DRAW.BOX @LINEWIDTH;
PUT @X1=3.4 @DRAW.V @X1=5.2 @DRAW.V;
ENDIF; ],
POSTSCRIPT, PORTRAIT,
LEFT.EDGE .5, RIGHT.EDGE .5, PR number.ps $

__________________________________________________________________________

9. @PINCH=nn centers the text around a specified lineup character, which is assumed to be a decimal point. This is good for writing a column of fractional numbers when the number of decimal places differs.

10. @PINCH.U=nn like PINCH but it also underlines.

11. @PINCH.CHAR='c' provides an alternate character such as '=' to be used in the pinch lineups. If no argument is given, it reverts to the default '.'.

12. @FLUSH flushes the current textwriter buffer without also moving to the next line. Flushing turns off temporary options like @CINCH or @UNDERLINE. It does not affect color settings.
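The four placement rules differ only in which point of the text is tied to the inch location. The Python sketch below illustrates that arithmetic and is not P-STAT code. It assumes that every character has the same width (`char_width`, a made-up value), which a real proportional font does not, and it assumes that the left edge of the lineup character sits at the given location. The function name `place` and the mode letters are hypothetical.

```python
def place(text, inch, mode, char_width=0.1, lineup="."):
    """Return the inch position at which text should start.

    Assumes every character is char_width inches wide. This is a
    simplification; proportional fonts need per-character widths.
    """
    width = len(text) * char_width
    if mode == "L":                 # like @LINCH: left edge at inch
        return inch
    if mode == "R":                 # like @RINCH: right edge at inch
        return inch - width
    if mode == "C":                 # like @CINCH: midpoint at inch
        return inch - width / 2
    if mode == "P":                 # like @PINCH: lineup character at inch
        index = text.find(lineup)
        if index < 0:               # assumption: no lineup character, use the right edge
            index = len(text)
        return inch - index * char_width
    raise ValueError(f"unknown mode: {mode}")
```

With mode "P", the strings 123.11 and 12.239 start at different positions but their decimal points fall at the same location, which is the effect wanted for a column of numbers with differing decimal places.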

__________________________________________________________________________

Figure 11.16 PostScript: Tables with Proportional Fonts

  column one      column two    column three

    123.1100        168.50      568.12
     12.23900        45.67      33.20
    123.4500        211.99      444.44

__________________________________________________________________________

If you wish <strong>to</strong> draw lines or boxes it is useful <strong>to</strong> he able <strong>to</strong> specify <strong>the</strong> coordinates which define <strong>the</strong> drawing<br />

area. The following are control words that make it easy <strong>to</strong> move <strong>the</strong> current location on <strong>the</strong> page <strong>to</strong> resume printing<br />

at that location or for line and box drawing:<br />

1. @X1 s<strong>to</strong>res a value that is <strong>the</strong> current left margin. If <strong>the</strong>re is an argument as @X1=3.4, that<br />

inch location is s<strong>to</strong>red. in <strong>the</strong> X1 variable.<br />

2. @X2 s<strong>to</strong>res a value that is <strong>the</strong> current right margin. An argument such as @X2=8.3 s<strong>to</strong>res<br />

that inch value in <strong>the</strong> X2 variable.<br />

3. @Y1 s<strong>to</strong>res a value that is <strong>the</strong> current <strong>to</strong>p margin. If an argument is supplied, that value is<br />

s<strong>to</strong>red in <strong>the</strong> Y1 variable.<br />

4. @Y2 s<strong>to</strong>res a value that is <strong>the</strong> current bot<strong>to</strong>m margin. If no argument is supplied, that argument<br />

is s<strong>to</strong>red in <strong>the</strong> Y2 variable.<br />

5. @MOVETO sets <strong>the</strong> current location <strong>to</strong> <strong>the</strong> X1/Y1 position. The next text string will begin at that<br />

location.<br />

6. @DRAW.H draws a horizontal line from <strong>the</strong> X1/Y1 position <strong>to</strong> <strong>the</strong> X2/Y1 position.<br />

7. @DRAW.V draws a vertical line from <strong>the</strong> X1/Y1 position <strong>to</strong> <strong>the</strong> X1/Y2 position.<br />

8. @DRAW.U=nn underlines the current line from X1 to X2. nn is the amount below the current position<br />

where the underline should be drawn. The argument, nn, is in units of 1/72 of an inch.<br />

If no argument is given, the assumed value is 3.<br />

9. @DRAW.BOX draws a box using X1/Y1 as the upper left coordinate and X2/Y2 as the lower right<br />

coordinate.
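The geometry described in items 6, 7 and 9 can be sketched outside P-STAT. The following Python fragment is only an illustration, not P-STAT code: the function name box_segments is hypothetical, and it assumes that the four stored values X1, Y1, X2 and Y2 are plain inch coordinates.<br />

```python
def box_segments(x1, y1, x2, y2):
    """Return the four edges of a box as ((xa, ya), (xb, yb)) pairs, in inches.

    (x1, y1) is the upper left corner and (x2, y2) the lower right corner,
    matching the description of @DRAW.BOX.
    """
    return [
        ((x1, y1), (x2, y1)),  # top edge: the same path as @DRAW.H
        ((x1, y1), (x1, y2)),  # left edge: the same path as @DRAW.V
        ((x2, y1), (x2, y2)),  # right edge
        ((x1, y2), (x2, y2)),  # bottom edge
    ]


if __name__ == "__main__":
    for start, end in box_segments(1.0, 1.0, 3.0, 2.0):
        print(start, "->", end)
```

The first two edges are exactly the lines that @DRAW.H and @DRAW.V are described as drawing, which is why the same four stored values serve all three control words.<br />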



Another group of control words that determine location includes:<br />

1. @DOWN=nn moves down that many lines. The actual distance depends on the point size of the font<br />

and <strong>the</strong> leading (<strong>the</strong> space between <strong>the</strong> lines). If nn is not specified, 1 is<br />

assumed.<br />

2. @UP=nn moves up that many lines. The actual distance depends on <strong>the</strong> point size and <strong>the</strong> leading.<br />

If nn is not specified, 1 is assumed.<br />

3. @TOP moves <strong>to</strong> <strong>the</strong> first line, just below <strong>the</strong> <strong>to</strong>p margin.<br />

4. @BOTTOM moves <strong>to</strong> <strong>the</strong> last line, just above <strong>the</strong> bot<strong>to</strong>m margin.<br />

5. @LEADING=nn specifies the space between lines. LEADING is usually set to 1/72 of an inch.<br />

LEADING=3 increases the space to 3/72 of an inch. A larger LEADING improves<br />

<strong>the</strong> readability of text when a large point size is used.<br />

6. @LINEWIDTH=nn specifies the width of the lines and boxes that are drawn. LINEWIDTH is usually<br />

set at .5. This measurement is in units of 1/72 of an inch. Increasing the<br />

LINEWIDTH produces bolder lines. @LINEWIDTH with no argument resets<br />

it to the original value of .5. LINEWIDTH=36 would provide a border 1/2<br />

inch wide.<br />
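Several of these arguments are given in units of 1/72 of an inch while the coordinates are given in inches. The Python fragment below is a small sketch of the conversion only; the function names are hypothetical and nothing here is P-STAT code.<br />

```python
POINTS_PER_INCH = 72  # the text gives LEADING, LINEWIDTH and DRAW.U in 1/72 inch


def points_to_inches(points):
    """Convert a measurement in 1/72-inch units to inches."""
    return points / POINTS_PER_INCH


def inches_to_points(inches):
    """Convert a measurement in inches to 1/72-inch units."""
    return inches * POINTS_PER_INCH


if __name__ == "__main__":
    print(points_to_inches(36))   # LINEWIDTH=36 is the half-inch border from the text
    print(inches_to_points(0.5))  # and the reverse conversion
```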

Figure 11.15 contains <strong>the</strong> TEXTWRITER command which creates <strong>the</strong> table in Figure 11.16. Each of <strong>the</strong><br />

three columns is formatted differently <strong>to</strong> show <strong>the</strong> effect of left, right and center justification. The headings are<br />

both properly justified and underlined. The LEADING is increased so that <strong>the</strong>re is more room between lines for<br />

<strong>the</strong> underlining.<br />

PUT @LEADING=2 @Y1 @NEXT<br />

@CINCH.U=2.5 <br />

@RINCH.U=4.7 <br />

@LINCH.U=5.7 <br />

The use of @Y1 s<strong>to</strong>res <strong>the</strong> location of <strong>the</strong> line before <strong>the</strong> headings. This value is needed later when we draw<br />

<strong>the</strong> box around <strong>the</strong> table. The first column is of particular interest because <strong>the</strong> numbers are placed so that <strong>the</strong> decimal<br />

point is located 2.5 inches from <strong>the</strong> left edge of <strong>the</strong> paper. This location may or may not be <strong>the</strong> actual center<br />

of <strong>the</strong> number.<br />

IF var1 GE 35 THEN;<br />

PUT @PINCH=2.5 var1; ELSE; PUT @PINCH.U=2.5 var1; ENDIF;<br />

This IF statement tests <strong>the</strong> value of Var1. If <strong>the</strong> value is relatively large, <strong>the</strong> @PINCH control word is used<br />

<strong>to</strong> locate <strong>the</strong> value so that <strong>the</strong> decimal point falls 2.5 inches from <strong>the</strong> left edge of <strong>the</strong> paper. If <strong>the</strong> value is small,<br />

PINCH.U places <strong>the</strong> value and underlines it. PINCH and PINCH.U are very useful when a column of numbers<br />

contains values which have different numbers of decimal places. PINCH and PINCH.U can be used <strong>to</strong> line strings<br />

up around a character other than the decimal point. For example:<br />

PUT @PINCH.CHAR="="<br />

causes <strong>the</strong> equal sign <strong>to</strong> be used in determining <strong>the</strong> location of <strong>the</strong> text that follows.<br />
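The idea behind PINCH can be illustrated with a monospaced analogue. The Python sketch below is not how TEXTWRITER works internally: TEXTWRITER positions text in inches using estimated font widths, whereas this hypothetical pinch function counts character columns. It only shows what aligning around a chosen character means.<br />

```python
def pinch(text, column, char="."):
    """Left-pad text so the first occurrence of char lands in a 1-based column.

    If char does not occur, the text is aligned as though char followed it.
    """
    idx = text.find(char)
    if idx < 0:
        idx = len(text)
    pad = max(column - 1 - idx, 0)  # never pad by a negative amount
    return " " * pad + text


if __name__ == "__main__":
    # Values with different numbers of decimal places line up on the point.
    for value in ["123.1100", "12.23900", "123.4500"]:
        print(pinch(value, 10))
    # Aligning around a different character, as with PINCH.CHAR.
    print(pinch("a=b", 10, "="))
```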

The second column in Figure 11.16 is right justified 4.5 inches from <strong>the</strong> left edge of <strong>the</strong> page. This works<br />

well when all <strong>the</strong> numbers have <strong>the</strong> same number of decimal places. The third column illustrates that left justification<br />

of a column of numbers is seldom satisfac<strong>to</strong>ry.<br />

11.30 Indenting Text<br />

The identifiers are used <strong>to</strong> set <strong>the</strong> initial margins for <strong>the</strong> page. The right and left margins can be adjusted by using<br />

@L.MARGIN and @R.MARGIN <strong>to</strong> supply an indent value. This value is an offset <strong>to</strong> <strong>the</strong> existing margins.<br />

@L.MARGIN=.5 @R.MARGIN=.5



This provides a half inch indent on each side of <strong>the</strong> page. Figure 11.17 illustrates <strong>the</strong> command and <strong>the</strong> resulting<br />

output. @L.MARGIN is used <strong>to</strong> create a hanging indent with text that explains both @L.MARGIN and<br />

@R.MARGIN. Note: a positive number as the argument moves the margin towards the center of the page. A<br />

negative number moves <strong>the</strong> margin <strong>to</strong>wards <strong>the</strong> edge of <strong>the</strong> page.<br />
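The effect of the offsets on the usable line width can be worked through numerically. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, the page is assumed to be 8 1/2 inches wide (PORTRAIT), and the edge values of 1.25 inches are those used in Figure 11.17.<br />

```python
def text_width(page_width, left_edge, right_edge, l_margin=0.0, r_margin=0.0):
    """Return the usable text width in inches.

    left_edge and right_edge are the initial margins; l_margin and r_margin
    are offsets to them. Positive offsets move a margin towards the center
    of the page, negative offsets move it towards the edge.
    """
    left = left_edge + l_margin
    right = page_width - right_edge - r_margin
    return right - left


if __name__ == "__main__":
    print(text_width(8.5, 1.25, 1.25))                # no offsets
    print(text_width(8.5, 1.25, 1.25, 0.5, 0.5))      # half-inch indent each side
    print(text_width(8.5, 1.25, 1.25, -0.25, -0.25))  # margins moved outwards
```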

__________________________________________________________________________<br />

Figure 11.17 Indenting <strong>the</strong> Text<br />

TEXTWRITER Work [ Case 1;<br />

PUT @L.MARGIN=1.5;<br />

PUT ;<br />

PUT @SKIP @L.MARGIN @L.MARGIN=1.5;<br />

PUT ; ],<br />

POSTSCRIPT, PORTRAIT, PR marg.ps,<br />

FONT1 TIMES 12, LEFT.EDGE 1.25, RIGHT.EDGE 1.25$<br />

__________________________________________________________________________<br />

11.31 Colors in PostScript Output<br />

The assumption is that postscript output will be black on white. The black can be changed <strong>to</strong> any of red, orange,<br />

yellow, green, blue or violet. The control words are @RED, @ORANGE, @YELLOW, @GREEN, @BLUE,<br />

@VIOLET, @BLACK and @NOCOLOR. @NOCOLOR reverts <strong>to</strong> black. The change in color flushes any text<br />

that has preceded it but not yet been placed on <strong>the</strong> page.<br />

If you wish more control over <strong>the</strong> colors, you can use <strong>the</strong> POSTSCRIPT.SETUP command <strong>to</strong> assign colors<br />

<strong>to</strong> specific fonts using 3 numeric values for <strong>the</strong> amount of red, green and blue <strong>to</strong> be used. For example:<br />

POSTSCRIPT.SETUP, FONT4 HELVETICA, COLOR FONT4 .2 .6 .1<br />

Flushing makes a difference when justification is being done and <strong>the</strong> section of text is not yet complete. Justification<br />

is done by adding a tiny bit <strong>to</strong> <strong>the</strong> spaces between each word. To figure out how much <strong>to</strong> add, it is<br />

necessary <strong>to</strong> know <strong>the</strong> length of <strong>the</strong> text in <strong>the</strong> current font. The difference between <strong>the</strong> length of <strong>the</strong> text and <strong>the</strong><br />

available line width is divided by <strong>the</strong> number of words in <strong>the</strong> line. This is <strong>the</strong> amount used <strong>to</strong> pad <strong>the</strong> spaces.<br />

When a flush occurs in <strong>the</strong> middle, <strong>the</strong> <strong>to</strong>tal length of <strong>the</strong> text is not known so <strong>the</strong> amount must be estimated.<br />

It is this estimation which causes some of the justified lines to have more space between words than you would expect<br />

from looking at the text. When the lines are not justified, the flushing does not affect the spacing within the paragraphs<br />

even when <strong>the</strong>re are font and color changes.<br />
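The padding rule described above can be written as a one-line calculation. The Python sketch below simply restates that rule; the function name and the sample measurements are hypothetical, and the text lengths that P-STAT uses are themselves estimates.<br />

```python
def space_padding(text_width, line_width, word_count):
    """Return the amount added to each inter-word space when justifying a line.

    Follows the rule in the text: the difference between the available line
    width and the length of the text, divided by the number of words.
    """
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return (line_width - text_width) / word_count


if __name__ == "__main__":
    # A line whose text measures 5.5 inches on a 6 inch line, with 10 words.
    print(space_padding(5.5, 6.0, 10))
```

When a flush occurs in the middle of a line, the text length passed to such a calculation is not yet known and must be estimated, which is the source of the uneven spacing described above.<br />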

NOTE: Because <strong>the</strong> postscript commands do not have <strong>the</strong> actual font tables available, <strong>the</strong> spacing is based on<br />

estimates. The use of capitalized words in a justified line may cause overprinting. The flushing of <strong>the</strong> line that<br />

occurs when font changes are made can also contribute <strong>to</strong> imperfections in <strong>the</strong> spacing. The leading between lines



is also based on the point size of the fonts. Changing font sizes in the middle of paragraphs will also cause the<br />

leading <strong>to</strong> change.<br />

11.32 Underlining Text<br />

It is easy <strong>to</strong> underline items in tables when using <strong>the</strong> @CINCH, @LINCH, @RINCH and @PINCH control<br />

words. Each of <strong>the</strong>se has an underline format which is <strong>the</strong> control word followed by “.U” <strong>to</strong> indicate underlining.<br />

To underline text in <strong>the</strong> middle of a sentence it is necessary <strong>to</strong> indicate where <strong>the</strong> underlining starts and where it<br />

ends. This is done with @UNDERLINE and @NOUNDERLINE.<br />

Figure 11.18 illustrates <strong>the</strong> output if @UNDERLINE and @NOUNDERLINE are added <strong>to</strong> <strong>the</strong> font changes<br />

for variable Score.<br />

PUT @FONT4 @UNDERLINE;<br />

PUT Score; IF Sentence NE 1, PUT ;<br />

PUT @FONT2 @NOUNDERLINE ;<br />

The code which figures out where <strong>to</strong> break up a line looks for blanks or <strong>the</strong> end of a chunk of text. In <strong>the</strong> example<br />

above, if the period that ends the sentence is separated from the word that precedes it by a font, color or<br />

underline control word, it may well end up by itself on <strong>the</strong> next line.<br />

There are three ways <strong>to</strong> emphasize important text.<br />

1. Change the font to an italic or bold typeface.<br />

2. Change <strong>the</strong> color of <strong>the</strong> text.<br />

3. Underline <strong>the</strong> text.<br />

These methods are not mutually exclusive. You may use all three together to produce, for example, text that is red,<br />

italic and underlined.<br />

PUT @FONT6 @RED @UNDERLINE <br />

@FONT1 @BLACK @NOUNDERLINE ;<br />

@UNDERLINE and <strong>the</strong> “.U” control words all underline from <strong>the</strong> start of <strong>the</strong> designated string <strong>to</strong> <strong>the</strong> end of<br />

that string. With @UNDERLINE that can be many lines down a page. Underline ends when @NOUNDERLINE<br />

or an @NEXT, @SKIP or @PARA control word is encountered. It is not affected by a color or font change.<br />

The @DRAW.U control word underlines from one specific location on a line <strong>to</strong> ano<strong>the</strong>r specific location on<br />

that line. The underline is drawn 3/72 of an inch below the current line unless an argument<br />

specifies a different distance. The start and end of the underline are determined by the values of @X1<br />

and @X2 which are initially set <strong>to</strong> <strong>the</strong> left and right edge values.<br />

@X1=3.5 @X2=6 @DRAW.U=5<br />

draws a line 2.5 inches long beginning 3.5 inches from <strong>the</strong> left edge of <strong>the</strong> paper and 5/72 of an inch below <strong>the</strong><br />

current line.
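The arithmetic in this example can be checked with a short Python sketch. It is an illustration only: the function and key names are hypothetical, and the default drop of 3 is the assumed value stated for @DRAW.U.<br />

```python
def underline_geometry(x1, x2, drop_points=3):
    """Describe the line drawn by @DRAW.U, with every distance in inches.

    x1 and x2 are inch positions measured from the left edge of the paper;
    drop_points is the distance below the current line in 1/72-inch units.
    """
    return {
        "start": x1,
        "length": x2 - x1,
        "drop": drop_points / 72,
    }


if __name__ == "__main__":
    # The example from the text: @X1=3.5 @X2=6 @DRAW.U=5
    print(underline_geometry(3.5, 6, 5))
```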



__________________________________________________________________________<br />

Figure 11.18 Underlining <strong>the</strong> Text<br />

James Wilmot (243-24-5007)<br />

Comments on INTELLECTUAL MEASURES<br />

Vocabulary skills, tea pouring and knowledge of current affairs are in<br />

<strong>the</strong> superior range by Company standards. Horseback riding and memo<br />

writing are excellent. Beer drinking is above average. Field hockey play<br />

is average. Car parking is marginal. Juggling is poor.<br />

Sheila Higgin (311-04-8831)<br />

Comments on INTELLECTUAL MEASURES<br />

Vocabulary skills and knowledge of current affairs are in <strong>the</strong> superior<br />

range by Company standards. Horseback riding and beer drinking<br />

are excellent. Field hockey play, juggling and memo writing are above<br />

average. Car parking is marginal. Tea pouring is poor.



TEXTWRITER<br />

Required:<br />

TEXTWRITER Invoices<br />

SUMMARY<br />

[ IF FIRST (.FILE.),<br />

PUT @PAGE .DATE. @SKIP1 ;<br />

IF Date.Paid MISSING AND<br />

DAYS (.NDATE., 'YYYYMMDD') -<br />

DAYS (Date.Invoice, 'YYYYMMDD') GT 30,<br />

PUT @NEXT Inv.Number ><br />

Amount.Due Date.Invoice<br />

Company<br />

> Phone ],<br />

JUSTIFY, WIDTH 60 $<br />

The TEXTWRITER command produces textual reports about the data in a P-STAT system file.<br />

TEXTWRITER uses the PUT instruction to place text strings, the values of variables, scratch variables and<br />

system variables, and <strong>the</strong> evaluations of expressions in <strong>the</strong> report. Character strings are enclosed in<br />

quotes or between the directional signs “&lt;&lt; &gt;&gt;”:<br />

[ IF FIRST (.FILE.),<br />

PUT @PAGE .DATE. @SKIP1 ;<br />

The PUTL instruction may also be used <strong>to</strong> put variable labels in <strong>the</strong> text. O<strong>the</strong>r <strong>PPL</strong> instructions, functions<br />

and opera<strong>to</strong>rs may be used for logical testing, recoding, calculations and o<strong>the</strong>r tasks. The placement<br />

of text is controlled using <strong>the</strong> "@" symbol and control words.<br />

Text is not right justified unless <strong>the</strong> identifier JUSTIFY or <strong>the</strong> control word @JUST specifies right justification.<br />

The WIDTH identifier or <strong>the</strong> control word @WIDTH specifies an output width o<strong>the</strong>r than <strong>the</strong><br />

current one.<br />

The previous TEXTWRITER command produces output in <strong>the</strong> following form:<br />

Outstanding Invoices as of Apr 16, 1995<br />

Invoice Number 1260 for $212.55, dated 950211, is past due.<br />

Please call Smith, Jakes & Row at 215-356-7000.<br />

TEXTWRITER fn<br />

specifies <strong>the</strong> name of <strong>the</strong> required input file. The filename is followed directly by P-<strong>STAT</strong> <strong>Programming</strong><br />

<strong>Language</strong> (<strong>PPL</strong>) clauses. (No comma follows <strong>the</strong> filename.)<br />

fn=file name nn=number cs=character string arg=keyword argument



Optional Identifiers:<br />

BLANKS nn<br />

gives the maximum number of blanks that may come between any two words after line justification.<br />

TEXTWRITER inserts additional blanks after certain punctuation characters and between words, if necessary,<br />

to justify the line of text. The default setting of BLANKS is four. A smaller number may be<br />

specified, but justification may be affected.<br />

CASE<br />

specifies that text be flushed and printed at the conclusion of processing of each case, and that a new line<br />

be started at the start of processing of the next case. All control words are reset at the start of each case.<br />

This is the assumed mode.<br />

JUSTIFY<br />

requests that the text be right-justified as well as left-justified; that is, the lines of text align on the right<br />

edge as well as the left. When JUSTIFY is not specified, only the left edge of the report is aligned. The<br />

control words @JUST and @NOJUST may be used in the PPL clauses to override the current justification<br />

setting.<br />

LABELS fn<br />

provides <strong>the</strong> name of a labels file. If a value in <strong>the</strong> prin<strong>to</strong>ut belongs <strong>to</strong> a numeric variable which is represented<br />

in <strong>the</strong> labels file, <strong>the</strong> text for that value is used instead of <strong>the</strong> number. Extended variable labels<br />

in <strong>the</strong> labels file are ignored.<br />

LEADBLANK and NO LEADBLANK<br />

each line of text is usually started with an initial blank. This is used as a carriage control character. If<br />

<strong>the</strong> output is not going <strong>to</strong> a printer you may use <strong>the</strong> NO LEADBLANK identifier <strong>to</strong> remove this extra<br />

blank. LEADBLANK is <strong>the</strong> assumed setting.<br />

MARGIN nn<br />

specifies <strong>the</strong> number of columns <strong>to</strong> indent text from <strong>the</strong> left margin. MARGIN 0 is assumed when <strong>the</strong><br />

MARGIN identifier is not used. Within <strong>PPL</strong> clauses, <strong>the</strong> control word @INDENT can be used for additional<br />

indentation beyond <strong>the</strong> MARGIN setting.<br />

OUT fn<br />

provides <strong>the</strong> name for a new P-<strong>STAT</strong> system file which contains <strong>the</strong> contents in <strong>the</strong> input file as <strong>the</strong>y are<br />

modified by <strong>the</strong> TEXTWRITER <strong>PPL</strong>.<br />

PUTL.CHARS 'cs'<br />

provides 1 to 3 characters which will replace the “ = ” (blank, equal-sign, blank) which usually separates<br />

the variable name from the value in a PUTL situation.<br />

STREAM<br />

specifies the continuous output of text; text is not flushed and printed upon the completion of each<br />

case. All control words except @INDENT and @WIDTH, which cause the flushing of text, are reset at<br />

the start of processing of each case. CASE is assumed when STREAM is not specified.<br />

SPREAD and NO SPREAD<br />

SPREAD is assumed. It causes a single blank <strong>to</strong> be placed between adjacent variables. NO SPREAD<br />

causes <strong>the</strong> variables <strong>to</strong> be written out directly one after <strong>the</strong> o<strong>the</strong>r with no intervening space.<br />


WIDTH nn<br />

gives <strong>the</strong> number of columns <strong>to</strong> be used for <strong>the</strong> report. When WIDTH is not used, <strong>the</strong> current output<br />

width defines <strong>the</strong> number of columns up <strong>to</strong> a maximum of 400. A specified WIDTH can be from 2 <strong>to</strong><br />

400. The control word @WIDTH does <strong>the</strong> same thing within PUT instructions.<br />

WIDTH is measured from <strong>the</strong> first column. Thus, if MARGIN 20 and WIDTH 80 are specified, <strong>the</strong>re<br />

are 60 columns available for text. These columns can be referred <strong>to</strong> using @1 through @60.<br />
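The relationship between WIDTH and MARGIN is a simple subtraction. The Python sketch below restates it; the function name is hypothetical, and the range check mirrors the stated limits of 2 to 400 for WIDTH.<br />

```python
def text_columns(width, margin=0):
    """Return the number of columns available for text.

    WIDTH is measured from the first column, so the MARGIN columns are
    taken out of it. WIDTH must lie between 2 and 400.
    """
    if not 2 <= width <= 400:
        raise ValueError("WIDTH must be from 2 to 400")
    return width - margin


if __name__ == "__main__":
    # MARGIN 20 and WIDTH 80 leave the columns addressed as @1 through @60.
    print(text_columns(80, 20))
```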

Optional Control Words:<br />

Control words are used in TEXTWRITER to control positioning of text. Any of them, except @INDENT,<br />

@JUST and @WIDTH, may also be used in the PPL that may follow any command for data<br />

cleaning or producing brief reports. The control words all begin with “@” and many of them are followed<br />

by a number giving a column location or other value. The number may follow directly after the<br />

control word (@SKIP2) or may follow after an equal-sign (@SKIP=2). Although the argument directly<br />

following the control word is typically a number, it may be any expression (within parentheses) that evaluates<br />

to a numeric value.<br />

The following control words remain in effect throughout the processing of a case unless they are specifically<br />

changed or turned off:<br />

@INDENT @EQUAL<br />

@WIDTH @MISS<br />

@JUST @COMMAS<br />

@TRIM @PLACES<br />

Prefacing a control word with “NO” (@NOCOMMAS) turns off a prior setting.<br />

The following control words apply only to the variable expression or character string that directly<br />

follows:<br />

@nn @PAGE<br />

@PLUS @NEXT<br />

@MINUS @PARA<br />

@BEFORE @SKIP<br />

(“nn” represents a positive whole number.) These control words must be reissued to produce the desired<br />

results again.<br />

@nn<br />

specifies a column location, measured from the start of the line. The column pointer moves to this location<br />

and the next character is written in this column.<br />

[PUT @10 'The initial T in this line is in column 10.'] produces:<br />

The initial T in this line is in column 10.<br />

123456789012345678901234567890<br />

(The additional line is a scale and not part of <strong>the</strong> output.)<br />

@BEFORE nn<br />

specifies a column location against which <strong>the</strong> next output element is right aligned. The text is written<br />

before <strong>the</strong> specified column. When no location is given, text is written before <strong>the</strong> current location of <strong>the</strong><br />

column pointer.<br />

(PUT @BEFORE30 'The period is in column 29.') produces:<br />


The period is in column 29.<br />

123456789012345678901234567890<br />

This instruction:<br />

[ PUT 'The amount due is: '<br />

@BEFORE (' $' // CHARACTER (Amount.Due) ) ;<br />

produces:<br />

The amount due is: $12.56<br />

@COMMAS<br />

requests that all numeric values be printed with commas inserted every three digits (counting from the<br />

decimal point to the left). This makes large numbers easier to read. @NOCOMMAS turns off<br />

@COMMAS.<br />
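The grouping that @COMMAS describes matches the thousands separator in Python's format specification, so the effect can be previewed there. This is only an analogue, not P-STAT code, and the function name is hypothetical.<br />

```python
def with_commas(value, places=None):
    """Format a number with a comma every three digits left of the decimal point.

    When places is given, the value is also shown with that many decimals.
    """
    if places is None:
        return f"{value:,}"
    return f"{value:,.{places}f}"


if __name__ == "__main__":
    print(with_commas(1234567))
    print(with_commas(1234567.5, 2))
```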

@EQUAL nn<br />

gives <strong>the</strong> column location of <strong>the</strong> equal-sign separating a variable name from its value. @EQUAL is used<br />

with <strong>the</strong> <strong>PPL</strong> instruction PUTL which puts a variable name (label) as well as its value in <strong>the</strong> line of text.<br />

Multiple locations may be specified:<br />

(PUTL @EQUAL10:30:50 Last First Age)<br />

The output line contains <strong>the</strong> three variables and <strong>the</strong>ir values, with <strong>the</strong> equal-signs in <strong>the</strong> specified<br />

columns:<br />

Last = Wilson First = Margaret Age = 23<br />

If <strong>the</strong> spacing is not adequate for <strong>the</strong> actual length of some of <strong>the</strong> character variables or <strong>the</strong> actual width<br />

of numeric variables, those long values print on <strong>the</strong> next line. @NOEQUAL turns off <strong>the</strong> @EQUAL<br />

specifications. An easy way <strong>to</strong> print all <strong>the</strong> variables in your file is <strong>to</strong> use <strong>the</strong> system variable .ALL. instead<br />

of <strong>the</strong> list of variable names.<br />

@INDENT nn<br />

specifies an additional number of columns to indent text from the current margin. The value after @INDENT<br />

is added to the current margin setting and text is indented that many columns from the left. This<br />

defines the new left margin of the report. (The current margin is that set by the identifier MARGIN in<br />

the TEXTWRITER command or, if MARGIN is not used, the default value 0.) @IN is an abbreviation<br />

for @INDENT. @NOINDENT resets the indentation to that specified by the MARGIN identifier or to<br />

0 if MARGIN was not used.<br />

@JUST<br />

requests that the text be right justified as well as left justified; that is, the lines of text are aligned on<br />

the right edge as well as the left one. @NOJUST turns off right justification, overriding the JUSTIFY<br />

identifier in the TEXTWRITER command.<br />

@MISS 'cs'<br />

defines a character <strong>to</strong> print <strong>to</strong> indicate any of <strong>the</strong> three types of missing values. It is used when characters<br />

o<strong>the</strong>r than dashes are desired <strong>to</strong> indicate missing values. @MISS1, @MISS2 and @MISS3 specify different<br />

characters for <strong>the</strong> three individual types of missing values:<br />

[ PUT Student.ID Last.Name Course.No<br />

@MISS1 @MISS2<br />


@NEXT First.Sec<br />

@NEXT Second.Sec ]<br />

@M, @M1, @M2 and @M3 are abbreviations. @NOMISS (or @NOMISS2, etc.) resets <strong>the</strong> specified<br />

missing character back <strong>to</strong> dashes.<br />

@MINUS nn<br />

requests that the column pointer move left the specified number of columns. The current column location<br />

minus the numeric argument yields the column in which text will print. @PLUS moves the pointer to<br />

the right.<br />

@NEXT<br />

moves the column pointer to the beginning of the next line. Subsequent text is written on this new line.<br />

When @NEXT is not used, text is written on the current line until it is full and then text continues on the<br />

next line.<br />

(Note that this is opposite to what occurs when PUT is used in PPL following commands other than<br />

TEXTWRITER. Then, a new line is started for each PUT clause unless an “@” by itself is used to hold the<br />

column pointer in the current line.)<br />

@PARA<br />

requests that a new paragraph start. Subsequent text prints on the next line, beginning in the fourth<br />

column.<br />

@PAGE<br />

requests that subsequent text print on a new page.<br />

@PLACES nn<br />

gives <strong>the</strong> number of decimal places <strong>to</strong> use in printing numeric values. Numbers are rounded if <strong>the</strong>y have<br />

more than <strong>the</strong> specified number of decimal places, or zeros are added <strong>to</strong> pad <strong>the</strong> numbers if <strong>the</strong>y have<br />

fewer than <strong>the</strong> specified number of places. @PL is an abbreviation. @NOPLACES turns off <strong>the</strong> prior<br />

places specification. Numbers <strong>the</strong>n print with <strong>the</strong>ir actual number of decimal places. (@PLACES0 or<br />

@PLACES=0 should be used <strong>to</strong> specify no decimal places or decimal point in <strong>the</strong> output.)<br />
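Fixed-point formatting in Python gives a rough preview of the rounding and zero padding that @PLACES describes. The sketch below is an analogue only: the function name is hypothetical, and the text does not say how P-STAT rounds exact ties, so tie cases may differ.<br />

```python
def places(value, n):
    """Format value with exactly n decimal places.

    Extra decimals are rounded away and missing ones are padded with zeros.
    With n equal to 0, no decimal point is printed.
    """
    return f"{value:.{n}f}"


if __name__ == "__main__":
    print(places(3.14159, 2))  # rounded to two places
    print(places(12.5, 2))     # padded with a trailing zero
    print(places(7.8, 0))      # no decimal places and no decimal point
```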

@PLUS nn<br />

requests that <strong>the</strong> column pointer move right <strong>the</strong> specified number of columns. @MINUS moves <strong>the</strong><br />

pointer <strong>to</strong> <strong>the</strong> left.<br />

@SKIP nn<br />

specifies the number of lines to skip before printing text. When no number follows @SKIP, one line is<br />

skipped. @SK is an abbreviation.<br />

@SPREAD<br />

is assumed and causes a single blank to be inserted between variables. @NOSPREAD causes that blank<br />

to be omitted.<br />

@TRIM<br />

requests that leading and trailing blanks be trimmed from character values before they are positioned in the<br />

text. This is assumed by TEXTWRITER and need not be specified explicitly. @NOTRIM turns trimming<br />

off. Untrimmed, a character string will occupy as many columns as its defined length, even though<br />

it may be only one character long or entirely blank.<br />


@WIDTH nn<br />

defines <strong>the</strong> output line width of <strong>the</strong> report. It overrides any previous output width settings defined by <strong>the</strong><br />

command OUTPUT.WIDTH or <strong>the</strong> identifier WIDTH in TEXTWRITER. The argument for @WIDTH<br />

may range from 2 <strong>to</strong> 400.<br />

@NOWIDTH turns off <strong>the</strong> line width setting, which <strong>the</strong>n reverts <strong>to</strong> that defined by <strong>the</strong> WIDTH identifier<br />

or <strong>the</strong> OUTPUT.WIDTH command. @NOWIDTH resets <strong>the</strong> line width <strong>to</strong> <strong>the</strong> original output width.<br />

TEXTWRITER and POSTSCRIPT<br />

Requires additional identifiers following the TEXTWRITER text.<br />

JUSTIFY, POSTSCRIPT, PORTRAIT,<br />

LEFT.EDGE 2., RIGHT.EDGE 2.,<br />

FONT1 TIMES 12, FONT2 TIMES BOLD 12 $<br />

Also Required:<br />

POSTSCRIPT<br />

is required unless the command is included within a PostScript block. A PostScript block begins with a<br />

POSTSCRIPT command and ends with a POSTSCRIPT.CLOSE command.<br />

Optional Identifiers for PostScript Output:<br />

BOTTOM.EDGE nn<br />

sets <strong>the</strong> bot<strong>to</strong>m edge <strong>the</strong> specified number of inches from <strong>the</strong> bot<strong>to</strong>m of <strong>the</strong> page. If <strong>the</strong>re is no argument,<br />

<strong>the</strong> bot<strong>to</strong>m edge is reset <strong>to</strong> its beginning value. This is usually 1 inch for all edges unless changed in a<br />

POSTSCRIPT.SETUP command. The measurements can be fractional.<br />

FONT arg arg nn<br />

provides <strong>the</strong> name, type, and point size for <strong>the</strong> fonts <strong>to</strong> be used. A character string in quotes can replace<br />

<strong>the</strong> first two arguments <strong>to</strong> specify an alternate font not supported in <strong>the</strong> keywords. Available keyword<br />

combinations are:<br />

TIMES HELVETICA COURIER<br />

TIMES BOLD HELVETICA BOLD COURIER BOLD<br />

TIMES ITALIC HELVETICA OBLIQUE COURIER OBLIQUE<br />

TIMES BOLDITALIC HELVETICA BOLDOBLIQUE COURIER BOLDOBLIQUE<br />

FONT1-FONT9 arg arg nn<br />

provides up <strong>to</strong> 9 different font/type/size combinations for use in <strong>the</strong> command.<br />

LANDSCAPE<br />

specifies that the orientation of the page is 11 inches wide by 8 1/2 inches high.<br />

LEFT.EDGE nn<br />

specifies the starting location from the left edge in inches. The number may be fractional. If no number<br />

is supplied, the left edge is reset to the beginning value.<br />

PORTRAIT<br />

specifies that the orientation of the page is 8 1/2 inches wide by 11 inches high.<br />

arg=keyword argument fn=file name nn=number cs=character string



RIGHT.EDGE nn<br />

specifies <strong>the</strong> number of inches that <strong>the</strong> output should be from <strong>the</strong> right hand edge of <strong>the</strong> paper. The number<br />

may be fractional. If it is used without an argument it is reset <strong>to</strong> <strong>the</strong> beginning value.<br />

SHOWPAGE and NO SHOWPAGE<br />

SHOWPAGE is assumed. NO SHOWPAGE is used when you wish <strong>to</strong> put more than one command on<br />

a single sheet of paper.<br />

TOP.EDGE nn<br />

specifies <strong>the</strong> starting location of <strong>the</strong> prin<strong>to</strong>ut from <strong>the</strong> <strong>to</strong>p of <strong>the</strong> paper in inches which may be fractional.<br />

If no number is supplied, it is reset <strong>to</strong> <strong>the</strong> beginning value.<br />

Optional Control Words:<br />

@FONT1-@FONT9<br />

causes an immediate change in the font that is used. That font remains in effect until the next @FONTn<br />

control word is processed.<br />

@CINCH.U=nn<br />

centers and also underlines the string at inch nn.<br />

@RINCH=nn<br />

puts the TEXTWRITER text right justified at the specified location. This works well for numbers if the<br />

number of decimal places is controlled.<br />

@RINCH.U=nn<br />

right justifies and underlines the text.<br />

@LINCH=nn<br />

left justifies text at the specified location.<br />

@LINCH.U=nn<br />

left justifies and underlines the text.<br />

@PINCH=nn<br />

centers the text around a specified lineup character, which is assumed to be a decimal point. This is good<br />

for writing a column of fractional numbers when the number of decimal places differs.<br />

@PINCH.U=nn<br />

like @PINCH but it also underlines.<br />

@PINCH.CHAR='c'<br />

provides an alternate character such as '=' to be used in the pinch lineups. If no argument, it reverts to<br />

the default '.'.<br />

@FLUSH<br />

flushes the current TEXTWRITER buffer without also moving to the next line. The effect of flushing<br />

turns off temporary options like @CINCH or @UNDERLINE. It does not affect color settings.<br />


@X1<br />

stores a value that is the current left margin. If there is an argument, as in @X1=3.4, that inch location is<br />

stored in the X1 variable.<br />

@X2<br />

stores a value that is the current right margin. An argument such as @X2=8.3 stores that inch value in<br />

the X2 variable.<br />

@Y1<br />

stores a value that is the current top margin. If an argument is supplied, that value is stored in the Y1<br />

variable.<br />

@Y2<br />

stores a value that is the current bottom margin. If an argument is supplied, that value is stored in<br />

the Y2 variable.<br />

@MOVETO<br />

sets the current location to the X1/Y1 position. The next text string will begin at that location.<br />

@DRAW.H<br />

draws a horizontal line from the X1/Y1 position to the X2/Y1 position.<br />

@DRAW.V<br />

draws a vertical line from the X1/Y1 position to the X1/Y2 position.<br />

@DRAW.U=nn<br />

underlines <strong>the</strong> current line from X1 <strong>to</strong> X2. nn is <strong>the</strong> amount below <strong>the</strong> current line in units of 72nds of<br />

an inch where <strong>the</strong> line should be drawn. If no argument is given, <strong>the</strong> assumed value is 3.<br />

@DOWN=nn<br />

moves down that many lines. The actual distance depends on the point size of the font and the leading<br />

(the space between the lines). If nn is not specified, 1 is assumed.<br />

@UP=nn<br />

moves up that many lines. The actual distance depends on the point size and the leading. If nn is not<br />

specified, 1 is assumed.<br />

@TOP<br />

moves to the first line, just below the top margin.<br />

@BOTTOM<br />

moves to the last line, just above the bottom margin.<br />

@LEADING=nn<br />

specifies the space between lines. LEADING is usually set to 1/72 of an inch. @LEADING=3 increases<br />

<strong>the</strong> space <strong>to</strong> 3/72 of an inch. A larger LEADING improves <strong>the</strong> readability of text when a large point size<br />

is used.<br />


@LINEWIDTH=nn<br />

specifies the width of the lines and boxes that are drawn. LINEWIDTH is usually set at .5. Increasing<br />

the LINEWIDTH causes bolder looking lines. @LINEWIDTH with no argument resets it to the original<br />

value of .5.<br />

@UNDERLINE<br />

begins underlining and continues until a subsequent @SKIP, @NEXT or @PAGE control word ends the<br />

current chunk of output. @NOUNDERLINE can also be used to end underlining.<br />

@NOUNDERLINE<br />

end of underlined text.<br />

The following control words can be used <strong>to</strong> control <strong>the</strong> color of subsequent printing. Color stays in effect until it<br />

is changed. @NOCOLOR is equivalent to @BLACK.<br />

@RED<br />

@ORANGE<br />

@YELLOW<br />

@GREEN<br />

@BLUE<br />

@VIOLET<br />

@BLACK<br />

@NOCOLOR<br />

Color can also be changed by using <strong>the</strong> POSTSCRIPT.SETUP command <strong>to</strong> define fonts with specific<br />

colors. When <strong>the</strong> font is changed, <strong>the</strong> specified color will be used.<br />
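
The decimal-point lineup performed by @PINCH and @PINCH.CHAR can be pictured with a short Python sketch. It only illustrates the idea in fixed-width character columns. P-STAT positions PostScript text in inches, and the function name `pinch_align`, its parameters, and the treatment of strings that lack the lineup character are assumptions made for this illustration.

```python
def pinch_align(values, column, char="."):
    """Pad each string so that its lineup character lands in the same column.

    A string without the lineup character is treated as if the character
    followed its last position (an assumption of this sketch).
    """
    lines = []
    for text in values:
        pos = text.find(char)
        if pos == -1:
            pos = len(text)
        pad = max(column - pos, 0)   # blanks needed in front of the string
        lines.append(" " * pad + text)
    return lines


# A column of numbers with differing numbers of decimal places.
for line in pinch_align(["3.14159", "12.5", "100", "0.25"], column=5):
    print(line)
```

Passing `char="="` mimics the effect of changing the lineup character with @PINCH.CHAR.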



12<br />

P-<strong>STAT</strong> MACROS<br />

A macro is a named collection of text that can be inserted at any point in a P-<strong>STAT</strong> run. It may contain an entire<br />

command or a series of many commands. It may contain a fragment of a command, subcommand or data record.<br />

A macro can be changed dynamically at execution by passing keyword or positional arguments. This chapter<br />

covers:<br />

1. Macro format<br />

2. Activating a macro<br />

3. Types of macros<br />

4. Keyword arguments<br />

5. Positional arguments<br />

6. Using arguments<br />

7. Default values for arguments<br />

8. Instream macros<br />

9. Multi-command macros<br />

10. SUBFILES command<br />

11. DIALOG command<br />

12.1 Macro Format<br />

MACRO ABC $<br />

contents of <strong>the</strong> macro<br />

ENDMACRO $<br />

A macro has three elements. The MACRO command supplies <strong>the</strong> name of <strong>the</strong> macro and may also have argument<br />

information. It is, in effect, <strong>the</strong> macro header. The contents or body of <strong>the</strong> macro may have an indefinite number<br />

of records (including none). The ENDMACRO command completes <strong>the</strong> macro and must be <strong>the</strong> only thing on its<br />

record. The ENDMACRO command can have a statement label such as:<br />

EXIT: ENDMACRO $<br />

12.2 Types of Macros<br />

There are two types of macros, BLOCK macros and INSTREAM macros. A BLOCK macro contains one or more<br />

full-fledged commands which may have subcommands and data records. It is invoked by using <strong>the</strong> RUN<br />

command.<br />

An INSTREAM macro can contain whatever one wishes. Its contents are inserted into a command or subcommand<br />

wherever !!macname or !!(macname) is found (where macname is <strong>the</strong> name of <strong>the</strong> macro).<br />

Both types of macro can have positional or keyword arguments, and can be defined with default values for<br />

those arguments. Alternatively, a macro can be defined without any arguments. The commands within a BLOCK<br />

macro can contain INSTREAM macro calls. INSTREAM macros can contain o<strong>the</strong>r instream macro calls.



The following macro contains only one line. Since it does not contain a set of complete commands, it could<br />

not be used as a BLOCK macro but could, for example, be inserted in<strong>to</strong> a SURVEY subcommand. It would be<br />

called by using !!vvv in <strong>the</strong> subcommand. The characters 'age by income ' would replace !!vvv as <strong>the</strong> subcommand<br />

is read.<br />

MACRO vvv $<br />

age by income<br />

ENDMACRO $<br />
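
The replacement of !!vvv described above can be modeled in a few lines of Python. This is a sketch of the textual substitution only, not of P-STAT's implementation. The table `MACROS`, the function `expand_once`, the assumption that macro names are matched without regard to case, and the restriction of names to letters, digits and underscores are all hypothetical.

```python
import re

# Hypothetical table of active instream macros: name -> list of body records.
MACROS = {"vvv": ["age by income"]}

# Matches !!name or !!(name).
CALL = re.compile(r"!!(?:\(([A-Za-z]\w*)\)|([A-Za-z]\w*))")


def expand_once(text, macros):
    """Replace every instream call in text with the macro body plus one blank."""
    def body(match):
        name = (match.group(1) or match.group(2)).lower()
        return "".join(record.rstrip() + " " for record in macros[name])
    return CALL.sub(body, text)


print(repr(expand_once("BANNER !!vvv;", MACROS)))
```

The trailing blank in the inserted text corresponds to the characters 'age by income ' quoted above.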

This next macro contains a block of commands. It could not be used by an instream reference, since that<br />

would insert several commands within <strong>the</strong> command or subcommand that contained <strong>the</strong> !!rrr, which would cause<br />

syntax errors galore. The correct way <strong>to</strong> use it is by saying RUN rrr $. Its first three lines are comments.<br />

MACRO rrr $<br />

/* example 1 of macro rrr. */<br />

/* no use of arguments. */<br />

/* no use of instream macro calls. */<br />

CORRELATE data1 [ KEEP age income education ], OUT work1$<br />

LIST work1 $<br />

ENDMACRO $<br />

12.3 S<strong>to</strong>ring and Activating Macros<br />

Macros can be s<strong>to</strong>red as ordinary ASCII (or EBCDIC) files which can be edited by an external edi<strong>to</strong>r. Within<br />

P-<strong>STAT</strong>’s edi<strong>to</strong>r each macro appears as a single command even when it is a block macro containing many commands.<br />

The body of <strong>the</strong> macro is s<strong>to</strong>red as data records for <strong>the</strong> macro command. A macro with no body will<br />

appear in <strong>the</strong> edi<strong>to</strong>r with a single data record, <strong>the</strong> ENDMACRO command.<br />

A macro must first be activated before it can be used. Activating is done by processing <strong>the</strong> definition in <strong>the</strong><br />

normal course of processing P-STAT commands. When a macro is activated, information about its arguments is<br />

acquired and <strong>the</strong> macro is placed, ready <strong>to</strong> be used, on a temporary file. The currently active macros can be seen<br />

by using <strong>the</strong> SHOWMACROS$ command.<br />

If <strong>the</strong> macro is entered from <strong>the</strong> terminal it is active as soon as <strong>the</strong> ENDMACRO $ command is processed.<br />

Macros s<strong>to</strong>red in an external ASCII file are activated by a TRANSFER command. Macros that are s<strong>to</strong>red in<br />

P-<strong>STAT</strong>’s edit file format are activated au<strong>to</strong>matically when <strong>the</strong> OLD.EDIT.FILE command is executed. In <strong>the</strong><br />

editor, macros can be changed by editing the data records. The changed macro is activated by using the X (eXecute)<br />

edit instruction.<br />

After a block macro is active (i.e., its definition has been read by P-<strong>STAT</strong>), it can be executed by using <strong>the</strong><br />

RUN command. The RUN command executes <strong>the</strong> entire series of commands defined in <strong>the</strong> macro. For example:<br />

RUN Sales $<br />

An instream macro is executed when <strong>the</strong> command that references it is used. For example, macro VVV defined<br />

above could be executed by:<br />

SURVEY Psfile;<br />

!!vvv ;<br />

$<br />

Figure 12.1 illustrates a command stream that activates three macros. Two are instream macros and one is a<br />

block macro, which uses <strong>the</strong> o<strong>the</strong>r two. The block macro executes a CORRELATE command and a LIST command.<br />

The CORRELATE command uses <strong>the</strong> first instream macro <strong>to</strong> provide <strong>the</strong> input file name and <strong>the</strong> second<br />

instream macro <strong>to</strong> select variables.<br />

The final step in Figure 12.1 is <strong>the</strong> RUN command which calls <strong>the</strong> first macro. The block macro <strong>the</strong>n references<br />

<strong>the</strong> two instream macros. It does not matter in what order <strong>the</strong> macros are activated.



__________________________________________________________________________<br />

Figure 12.1 Activating Three Macros<br />

MACRO rrr $<br />

/* example 2 of macro rrr, using instream macros. */<br />

/* this macro correlates some variables */<br />

/* and <strong>the</strong>n lists <strong>the</strong> result. */<br />

CORRELATE !!aaa [ KEEP !!bbb ], OUT work1 $<br />

LIST work1 $<br />

ENDMACRO $<br />

MACRO aaa$<br />

data1<br />

ENDMACRO $<br />

MACRO bbb$<br />

age income education<br />

ENDMACRO $<br />

RUN rrr $<br />

__________________________________________________________________________<br />

12.4 Comments Within a Macro<br />

Comments can be used freely in macros. They are particularly useful at <strong>the</strong> beginning of a macro <strong>to</strong> document<br />

what <strong>the</strong> macro does, when it was last changed, who maintains it, and so forth.<br />

Comments start with a /* and end with a */. For example:<br />

/* this macro correlates some variables<br />

and <strong>the</strong>n lists <strong>the</strong> result */<br />

is a valid comment, as is<br />

/*---------*/<br />

/* comment */<br />

/*---------*/<br />

12.5 Macros With Arguments<br />

The macros shown so far have not had any arguments. The only way <strong>to</strong> generalize such a macro was by calling<br />

o<strong>the</strong>r macros. That is perfectly legal but often, especially with block macros, generalizing is better done by using<br />

arguments. There are two types of notation for defining macro arguments: keyword and positional.<br />

The paren<strong>the</strong>ses after a macro name define its arguments, if any. These arguments are known as DUMMY<br />

ARGUMENTS. (In some languages <strong>the</strong>y are known as formal parameters.) When a macro is CALLED, each<br />

occurrence of a dummy argument in <strong>the</strong> body of <strong>the</strong> macro is replaced by <strong>the</strong> associated ARGUMENT VALUE<br />

in <strong>the</strong> call. For example:<br />

MACRO rrr ( file, vars) $<br />

defines a macro named rrr with two keyword arguments: file and vars. Figure 12.2 illustrates a version of macro<br />

rrr that has the same effect as the macro in Figure 12.1, except that the names of the P-STAT system file and the<br />

variables to be used are passed to the macro in the RUN command rather than supplied by instream macros.<br />

Using arguments is a simpler way to allow the macro to be used with differing file names and sets of variables.



__________________________________________________________________________<br />

Figure 12.2 Block Macro With Keyword Arguments<br />

MACRO rrr ( file, vars) $<br />

/* example 3 of macro rrr. */<br />

/* this macro correlates some variables */<br />

/* and <strong>the</strong>n lists <strong>the</strong> result. */<br />

/* it uses KEYWORD arguments instead */<br />

/* of calls <strong>to</strong> o<strong>the</strong>r macros. */<br />

CORRELATE &file [ KEEP &vars ], OUT work1 $<br />

LIST work1 $<br />

ENDMACRO $<br />

RUN rrr ( data1, age income education ) $<br />

__________________________________________________________________________<br />

The ‘&’ is used <strong>to</strong> identify keywords <strong>to</strong> be replaced within <strong>the</strong> macro. Since ‘file’ was defined as <strong>the</strong> first<br />

keyword dummy argument, every use of &file within <strong>the</strong> macro is replaced by <strong>the</strong> first argument value. Similarly,<br />

&vars is replaced by <strong>the</strong> second argument value. &(file) and &(vars) can also be used; <strong>the</strong>se specify <strong>the</strong> keyword<br />

more precisely. Argument values are separated by commas.<br />

( data1, age income education )<br />

Thus data1 is a single argument and since it is <strong>the</strong> first argument it is associated with <strong>the</strong> keyword “file”. The<br />

second argument “vars” receives <strong>the</strong> entire string “age income education”. The following is what is actually<br />

executed:<br />

CORRELATE data1 [ KEEP age income education ], OUT work1 $<br />

LIST work1 $<br />
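
The keyword replacement just described can be sketched in Python. The sketch assumes that a keyword reference runs as far as letters, digits and decimal points continue, which is one plausible reading of why &(file) is said to specify the keyword more precisely. The function name `substitute_keywords` and the case-insensitive matching are hypothetical.

```python
import re


def substitute_keywords(body, keywords, values):
    """Replace &name and &(name) in body with the value in the same position."""
    if len(keywords) != len(values):
        raise ValueError("number of values must match number of dummy arguments")
    table = {k.lower(): v for k, v in zip(keywords, values)}

    def value_for(match):
        name = (match.group(1) or match.group(2)).lower()
        # References to unknown names are left untouched in this sketch.
        return table.get(name, match.group(0))

    pattern = r"&(?:\(([A-Za-z][\w.]*)\)|([A-Za-z][\w.]*))"
    return re.sub(pattern, value_for, body)


print(substitute_keywords(
    "CORRELATE &file [ KEEP &vars ], OUT work1 $",
    ["file", "vars"],
    ["data1", "age income education"],
))
```

Under these assumptions, a reference followed directly by a period, such as `&file.`, is read as the keyword `file.` and is not replaced, whereas `&(file).` is.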

Figure 12.3 illustrates <strong>the</strong> same macro using positional arguments. This is done by providing <strong>the</strong> number of<br />

arguments in the parentheses after the macro name.<br />

MACRO rrr ( 2 ) $<br />

When positional arguments are used, <strong>the</strong> body of <strong>the</strong> macro contains <strong>the</strong> position preceded by <strong>the</strong> “&”. Thus wherever<br />

&1 or &(1) is found within <strong>the</strong> macro <strong>the</strong> first argument value found in <strong>the</strong> call will be used.<br />

__________________________________________________________________________<br />

Figure 12.3 Block Macro With Positional Arguments<br />

MACRO rrr ( 2 ) $<br />

/* example 4 of macro rrr. */<br />

/* <strong>the</strong> same thing using POSITIONAL arguments */<br />

CORRELATE &1 [ KEEP &2 ], OUT work1 $<br />

LIST work1 $<br />

ENDMACRO $<br />

RUN rrr ( data1, age income education ) $<br />

__________________________________________________________________________
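
Positional replacement works the same way with numbers in place of names. The Python sketch below is an illustration under assumptions: it reads as many digits as follow the ampersand, so `&10` is taken to mean the tenth argument and `&(1)0` the first argument followed by a zero. The manual does not state how P-STAT resolves that case, and the function name `substitute_positional` is hypothetical.

```python
import re


def substitute_positional(body, values):
    """Replace &n and &(n) in body with the n-th value (1-based)."""
    def value_for(match):
        index = int(match.group(1) or match.group(2))
        if not 1 <= index <= len(values):
            raise ValueError(f"&{index} has no matching argument value")
        return values[index - 1]

    return re.sub(r"&(?:\((\d+)\)|(\d+))", value_for, body)


print(substitute_positional(
    "CORRELATE &1 [ KEEP &2 ], OUT work1 $",
    ["data1", "age income education"],
))
```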



12.6 Using Arguments<br />

There are a few simple rules <strong>to</strong> follow when using arguments in a macro.<br />

1. A keyword for a dummy argument should start with a letter, contain letters, digits and decimal<br />

points, and have no more than 16 characters. For example, FILE and VARS in Figure 12.2.<br />

2. Each such keyword should be found at least once in <strong>the</strong> macro, preceded by an ampersand (&).<br />

3. Similarly, if macro ppp(4)$ were used, <strong>the</strong> macro should contain at least one usage each of &1, &2,<br />

&3 and &4.<br />

4. The keyword or integer can be within paren<strong>the</strong>ses, like &(file) or &(2).<br />

5. There can be as many as 150 keywords or positional arguments. I.e., macro xxx(150)$ is possible.<br />

The order in which <strong>the</strong>y are found in <strong>the</strong> body of <strong>the</strong> macro does not matter.<br />

6. Usages of <strong>the</strong> first &keyword are replaced by <strong>the</strong> first argument value, usages of <strong>the</strong> second &keyword<br />

by <strong>the</strong> second argument value, etc.<br />

7. Positional macros behave similarly. Usages of &1 are replaced by <strong>the</strong> first argument value, usages<br />

of &2 by <strong>the</strong> second argument value, etc.<br />

8. The number of dummy arguments given in <strong>the</strong> definition must be <strong>the</strong> same as <strong>the</strong> number of argument<br />

values supplied when <strong>the</strong> macro is called.<br />
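
Rules 1, 2, 4 and 5 lend themselves to a simple check of a keyword macro definition. The Python sketch below is a hypothetical lint written for illustration. It is not a P-STAT facility, and the function name `check_definition` and the wording of its messages are invented here.

```python
import re

MAX_KEYWORD_LENGTH = 16   # rule 1
MAX_ARGUMENTS = 150       # rule 5

# Rule 1: starts with a letter, then letters, digits and decimal points.
KEYWORD = re.compile(r"[A-Za-z][A-Za-z0-9.]*")


def check_definition(keywords, body):
    """Return a list of problems found in a keyword macro definition."""
    problems = []
    if len(keywords) > MAX_ARGUMENTS:
        problems.append("more than 150 arguments")
    for keyword in keywords:
        if not KEYWORD.fullmatch(keyword) or len(keyword) > MAX_KEYWORD_LENGTH:
            problems.append(f"bad keyword: {keyword}")
        # Rules 2 and 4: the keyword should appear as &name or &(name).
        uses = re.compile(
            r"&\(?" + re.escape(keyword) + r"\)?(?![A-Za-z0-9])", re.IGNORECASE)
        if not uses.search(body):
            problems.append(f"keyword never used: {keyword}")
    return problems


body = "CORRELATE &file [ KEEP &(vars) ], OUT work1 $"
print(check_definition(["file", "vars"], body))
print(check_definition(["file", "extra"], body))
```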

There are similar rules for <strong>the</strong> actual arguments used when <strong>the</strong> macro is invoked:<br />

1. Argument values are separated by commas. Argument values of 11 and 22 for macro zzz would be<br />

conveyed by saying<br />

RUN ZZZ (11,22)$ or !!ZZZ(11,22) or !!(ZZZ)(11,22)<br />

2. Argument values can be quoted. This is necessary when <strong>the</strong> value contains a comma or right paren<strong>the</strong>sis<br />

or a form of quote. Thus, "john's house" is valid, as is 'xx"xx'. Ei<strong>the</strong>r quote(") or apostrophe<br />

(') can be used unless that character is part of <strong>the</strong> value, in which case <strong>the</strong> o<strong>the</strong>r should be used <strong>to</strong><br />

bound <strong>the</strong> value. Suppose you want <strong>to</strong> pass, literally, 'title text' <strong>to</strong> a macro. The argument should be<br />

"'title text'"<br />

3. A quoted value can be empty, as in !!abc( ""). This is called a NULL value but it is nonetheless a<br />

value. The associated &keyword in <strong>the</strong> macro would simply vanish. If !!abc(" ") were used, <strong>the</strong><br />

one blank would replace <strong>the</strong> &keyword.<br />

4. If quoted, <strong>the</strong> value that is used is <strong>the</strong> contents of <strong>the</strong> quotes. If not quoted, it is <strong>the</strong> first nonblank<br />

through <strong>the</strong> last nonblank before <strong>the</strong> comma or right paren<strong>the</strong>sis. Consider <strong>the</strong>se macro calls both<br />

of which do exactly <strong>the</strong> same thing:<br />

!!ppl ( age income education )<br />

!!ppl ( 'age income education' )<br />

Both are evaluated as having one argument, since the defining quotes are stripped as the argument<br />

is moved into place. However, consider these macro calls:<br />

!!ppl ( IF age missing, DELETE )<br />

!!ppl ( 'IF age missing, DELETE' )<br />

The first will be evaluated as having two arguments, because of <strong>the</strong> comma. The second has but one<br />

argument. Put a value within quotes if it contains commas, etc.<br />

5. An argument value can, in one situation, have several actual values, as in<br />

!!abc ( 'age' 'income' 'education' )



Each of those values must be quoted. These are used when a subcommand record contains <strong>the</strong> associated<br />

&keyword and nothing else. That record is discarded and, in its place, a subcommand record<br />

is written for each of <strong>the</strong> values. The above would generate three records: one containing age, one<br />

containing income, and one containing education.<br />

6. An argument can be omitted. !!zzz( , ) has two omitted arguments, which is allowed only when <strong>the</strong><br />

macro was defined with default values <strong>to</strong> be used when a call omits a value. Defaults are described<br />

below.<br />

7. P(3) or such can be used as an argument value. If P(3) is set <strong>to</strong> 123.456, those seven characters constitute<br />

the resulting argument value. In other words, the internal double precision binary number<br />

currently in P(3) is formatted into ASCII characters, and those ASCII characters serve as the actual argument.<br />

The value should not be missing.<br />

8. #N or ##TOTAL or such can be used as an argument value. The scratch variable can be numeric or<br />

character. The actual argument is the formatted ASCII representation of a numeric scratch variable,<br />

or <strong>the</strong> current character value of a character scratch variable. The value should not be missing.<br />
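
Rules 1 through 6 for argument values can be summarized by a small splitter. The Python sketch below assumes that the text between the parentheses of the call has already been isolated, and it represents each argument as a list of values. The function name `split_arguments` and that representation are choices made for this illustration. Input that the rules do not cover, such as an unterminated quote, is not handled.

```python
def split_arguments(text):
    """Split the text between the parentheses of a macro call into arguments.

    Each argument is returned as a list of values: [] when it is omitted,
    [''] for a quoted null value, and several entries when several quoted
    values are given (rule 5).
    """
    arguments = []   # finished arguments
    values = []      # quoted values seen in the current argument
    plain = []       # unquoted characters of the current argument
    quoted = []      # characters of the quoted value being read
    quote = None     # the quote character that is currently open, if any

    def finish():
        bare = "".join(plain).strip()   # first nonblank through last nonblank
        arguments.append(values + ([bare] if bare else []))

    for ch in text:
        if quote is not None:
            if ch == quote:             # closing quote ends the value
                values.append("".join(quoted))
                quoted.clear()
                quote = None
            else:
                quoted.append(ch)       # commas and other quotes are literal
        elif ch in "\"'":
            quote = ch
        elif ch == ",":
            finish()
            values.clear()
            plain.clear()
        else:
            plain.append(ch)
    finish()
    return arguments


print(split_arguments(" IF age missing, DELETE "))
print(split_arguments(" 'IF age missing, DELETE' "))
```

The first call yields two arguments and the second yields one, matching the !!ppl examples in rule 4.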

__________________________________________________________________________<br />

Figure 12.4 Macro With Positional Arguments and Default Values<br />

MACRO sss ( 2 ) ( age, income )$<br />

BANNER &1, STUB &2;<br />

ENDMACRO $<br />

SURVEY data2;<br />

!!sss (,) which becomes: BANNER age, STUB income;<br />

$<br />

__________________________________________________________________________<br />

12.7 Default Values for Arguments<br />

When a macro is defined with arguments, it can contain default values <strong>to</strong> be used when a call omits one or more<br />

values. Defaults are placed in paren<strong>the</strong>ses after <strong>the</strong> keyword or positional paren<strong>the</strong>ses. Since <strong>the</strong>y are <strong>to</strong> be used<br />

as argument values when necessary, <strong>the</strong>ir syntax is <strong>the</strong> same as that of <strong>the</strong> argument values in a macro call.<br />

MACRO abc ( fff, vvv ) ( work1, )$<br />

LIST &fff [ KEEP &vvv ]$<br />

ENDMACRO$<br />

Macro abc has two arguments. A default is supplied for <strong>the</strong> first argument. There is no supplied default for<br />

<strong>the</strong> second argument. A default value will be used when <strong>the</strong> call does not supply a value for <strong>the</strong> argument. For<br />

example:<br />

RUN abc( , age income )$<br />

has no initial argument because <strong>the</strong>re are only blanks before <strong>the</strong> initial comma. Since <strong>the</strong>re is no first value <strong>to</strong><br />

replace &fff, a default for that argument must have been included in <strong>the</strong> definition and will be used now. The<br />

expansion is:<br />

LIST work1 [ KEEP age income ]$<br />

The defaults are totally ignored if the call has actual values for each argument. The existence of defaults does<br />

not change the need for a call to indicate the presence or absence of its argument or arguments. Given the macro<br />

above, <strong>the</strong>se calls are errors:<br />

RUN abc ( age income )$ that is just one value, and<br />

<strong>the</strong> macro has two arguments.



Values are separated by commas.<br />

RUN abc ( age income, )$ now we have two values, <strong>the</strong><br />

second being explicitly omitted,<br />

but <strong>the</strong> definition has no<br />

default for <strong>the</strong> second value.<br />

Figure 12.4 illustrates a macro with 2 positional arguments. Both have default values provided in <strong>the</strong> definition.<br />

Because both values are available <strong>the</strong> macro can be used with no values, one value or both values. A<br />

definition can also have omitted default values. The following macro has 5 positional arguments. Defaults are<br />

provided for &1 and &3 but not for &2, &4 and &5.<br />

MACRO xxx (5) ( aaa,, ccc,, )$<br />

Consider <strong>the</strong> following instream macro call.<br />

!!mmm ( abc, "", ).<br />

It has three values. Value 2 is null, but it is still regarded as a value. Only argument 3 would invoke a default<br />

value.<br />
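
The way a call and its defaults are combined can be sketched in Python. In this illustration both the supplied values and the defaults are lists in which `None` marks an omitted entry, while an empty string stands for a quoted null value. The function name `resolve_arguments` and the error messages are hypothetical.

```python
def resolve_arguments(name, supplied, defaults):
    """Combine the values of a call with the defaults of the definition.

    None marks an omitted value. An empty string is a null value, which
    counts as supplied and never invokes a default.
    """
    if len(supplied) != len(defaults):
        raise ValueError(
            f"macro {name} has {len(defaults)} arguments, call has {len(supplied)}")
    resolved = []
    for position, (value, default) in enumerate(zip(supplied, defaults), start=1):
        if value is None:
            if default is None:
                raise ValueError(f"macro {name}: no default for argument {position}")
            value = default
        resolved.append(value)
    return resolved


# RUN abc( , age income )$ with the definition MACRO abc ( fff, vvv ) ( work1, )$
print(resolve_arguments("abc", [None, "age income"], ["work1", None]))
```

Both calls listed above as errors fail in this sketch as well: the first because the number of values differs from the number of arguments, the second because the omitted second value has no default.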

___________________________________________________________________________<br />

Figure 12.5 Macros Can Call Macros<br />


MACRO aaa $<br />

1 2 3<br />

!!bbb<br />

7 8 9<br />

MACEND $<br />

MACRO bbb $<br />

11 12 13<br />

!!ccc<br />

MACEND $<br />

MACRO ccc $<br />

101 102 103<br />

MACEND$<br />

MAKE work1, NV 3;<br />

!!aaa<br />

$<br />

LIST work1$<br />

1 2 3 from aaa<br />

11 12 13 from bbb<br />

101 102 103 from ccc<br />

7 8 9 from aaa<br />

___________________________________________________________________________<br />

12.8 Nested Instream Macros<br />

Macros can call macros. Figure 12.5 illustrates <strong>the</strong> use of instream macros in which macro aaa is called within<br />

a MAKE command.



MAKE work1, NV 3;<br />

!!aaa<br />

$<br />

Macro aaa contains 3 data records. The second record activates ano<strong>the</strong>r instream macro, bbb.<br />

MACRO aaa$<br />

1 2 3<br />

!!bbb<br />

7 8 9<br />

ENDMACRO $<br />

Macro bbb contains 2 data records, one of which is a call <strong>to</strong> macro ccc.<br />

MACRO bbb$<br />

11 12 13<br />

!!ccc<br />

ENDMACRO $<br />

MACRO ccc$<br />

101 102 103<br />

ENDMACRO $<br />

Macro ccc contains a single data record. Since it does not have a call <strong>to</strong> ano<strong>the</strong>r instream macro, <strong>the</strong> command<br />

is completed with records taken from <strong>the</strong> 3 instream macros in <strong>the</strong> order in which <strong>the</strong> records are processed.<br />

There is no rule that prohibits macros from recursion. For example macro ccc could call macro aaa. This will<br />

cause <strong>the</strong> MAKE command <strong>to</strong> continue until it runs out of disk space.<br />
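
The order in which the records of Figure 12.5 reach the MAKE command can be reproduced with a recursive Python sketch. The depth limit is an addition of this sketch so that a recursive macro fails quickly. As the text notes, P-STAT itself has no such rule. The names `expand_records`, `MACROS` and `max_depth` are hypothetical.

```python
def expand_records(name, macros, depth=0, max_depth=50):
    """Return the data records produced by instream macro `name`."""
    if depth > max_depth:
        raise RecursionError(f"macro {name} appears to call itself")
    records = []
    for record in macros[name]:
        if record.startswith("!!"):
            # A record that is a macro call is replaced by that macro's records.
            called = record[2:].strip()
            records.extend(expand_records(called, macros, depth + 1, max_depth))
        else:
            records.append(record)
    return records


MACROS = {
    "aaa": ["1 2 3", "!!bbb", "7 8 9"],
    "bbb": ["11 12 13", "!!ccc"],
    "ccc": ["101 102 103"],
}

for record in expand_records("aaa", MACROS):
    print(record)
```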

12.9 Instream Macros in a Command<br />

A command can have many instream macro calls. They can occur anywhere after <strong>the</strong> command name and<br />

before <strong>the</strong> ending dollar or semicolon. The command text is scanned from its beginning for macro calls after each<br />

macro insertion. Therefore, its macros can call o<strong>the</strong>r macros indefinitely.<br />

The characters of an instream macro record are inserted through the rightmost nonblank, possibly with an<br />

additional padding blank when <strong>the</strong> record has less than 80 characters. These insertions may extend a command<br />

by hundreds or even thousands of characters. That causes no problems as long as <strong>the</strong> command does not exceed<br />

its maximum size, which in <strong>the</strong> Whopper/2 version of P-<strong>STAT</strong> is 50,000 characters.<br />
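
The rescanning described in this section can be modeled as a loop that expands one call at a time and then searches again from the start of the command. The Python sketch below is a simplified model: it adds the padding blank whenever a record has fewer than 80 characters, assumes names consist of letters, digits and underscores, and uses the hypothetical names `inserted_text` and `expand_command`.

```python
import re

MAX_COMMAND = 50_000   # limit quoted in the text for the Whopper/2 version
CALL = re.compile(r"!!(?:\(([A-Za-z]\w*)\)|([A-Za-z]\w*))")


def inserted_text(records):
    """Join macro records, each through its last nonblank plus a padding blank."""
    parts = []
    for record in records:
        record = record.rstrip()
        parts.append(record + " " if len(record) < 80 else record)
    return "".join(parts)


def expand_command(text, macros):
    """Expand instream calls one at a time, rescanning from the start each time."""
    while True:
        match = CALL.search(text)        # always scan from the beginning
        if match is None:
            return text
        name = (match.group(1) or match.group(2)).lower()
        text = text[:match.start()] + inserted_text(macros[name]) + text[match.end():]
        if len(text) > MAX_COMMAND:
            raise OverflowError("command exceeds 50,000 characters")


macros = {"aaa": ["data1"], "bbb": ["age income education"]}
print(expand_command("CORRELATE !!aaa [ KEEP !!bbb ], OUT work1 $", macros))
```

Because the loop restarts its search after every insertion, a macro whose body contains another call is expanded without special handling, and a macro that calls itself grows the command until the size limit is exceeded.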

12.10 Instream Macros in Subcommands<br />

A single subcommand can also have many instream macro calls. Subcommand processing, however, is done differently<br />

due <strong>to</strong> <strong>the</strong> limit of 80 characters in a single subcommand record. There are two forms of subcommand<br />

macro expansion. The first occurs when <strong>the</strong> macro call is NOT <strong>the</strong> only thing in <strong>the</strong> record. For example,<br />

BANNER !!aaa, STUB !!bbb;<br />

The expansions are done in an array that can hold 800 bytes, which is ten times the size of a subcommand<br />

record. As with commands, the array is re-scanned after each insertion so that nested macros are honored. When<br />

no more macros are found, the array is written in up-to-80 character chunks to the subcommand buffer for use by<br />

the command that is currently active. Up-to-80 means that each chunk ends at a reasonable point at or before<br />

character 80. Reasonable means breaking at a blank, comma, right parenthesis, etc.<br />
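The chunking step can be sketched in Python as follows. The function name `split_records` and the hard split used when no break character is available are assumptions; the text names only blank, comma and right parenthesis as break points, and its "etc." is not modelled here.

```python
BREAK_CHARS = " ,)"  # blank, comma, right parenthesis


def split_records(text, width=80):
    """Split expanded subcommand text into records of at most `width` characters."""
    records = []
    while len(text) > width:
        # Find the last break character within the first `width` characters.
        cut = max(text.rfind(ch, 0, width) for ch in BREAK_CHARS)
        if cut < 0:
            cut = width - 1  # assumption: hard split when no break point exists
        records.append(text[:cut + 1])
        text = text[cut + 1:]
    if text:
        records.append(text)
    return records


long_stub = "STUB " + " ".join(f"Q{i}" for i in range(1, 44))
chunks = split_records(long_stub)
```

Each chunk keeps its trailing break character, so joining the chunks reproduces the expanded text exactly.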

Different rules prevail when the macro call is the only thing on the subcommand record. First, that record<br />

vanishes. Instead, a subcommand record is generated for each line of the macro. However, what about arguments<br />

on one of these lines?<br />

If the positional argument in the macro (&3 or such) is not the only thing on the line, the record is expanded<br />

by replacing the argument with its value. This continues until the line has no more arguments. Then it is broken<br />

into up-to-80 character chunks as described above. If the positional argument in the macro (&3 or such) IS the only thing on the line, a<br />

subcommand record is generated for EACH of the argument's non-null values.<br />

In an instream macro, the default is to insert the characters through the right-most non-blank AND THEN<br />

ADD ONE BLANK. Consider:



MACRO vars $<br />

age<br />

income<br />

ENDMACRO $<br />

An instream usage might be:<br />

LIST x[ KEEP !!(vars) ] $<br />

Is the !!(vars) replaced by 9 characters (ageincome), or by 11 characters (age income ), or by 160 characters<br />

(age + 77 blanks and income + 74 blanks)? In other words, do we PAD the records as they are inserted? If so, by<br />

how much?<br />

A run begins with the padding default set to one. However, there are several ways to change the default.<br />

MACRO.PAD n $ is a command that specifies the padding default for macros activated subsequently. The value n can be<br />

zero (which sets a no-pad status) or some larger integer, like 1 or 80.<br />
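The first two answers to the padding question can be reproduced with a small Python model. It is a sketch only: `inserted_text` is a hypothetical name, and only pad settings of 0 and 1 are modelled, because this sketch does not assume how larger settings are applied.

```python
def inserted_text(records, pad=1):
    """Join macro records, each cut at its right-most non-blank plus `pad` blanks."""
    if pad not in (0, 1):
        raise ValueError("only pad settings 0 and 1 are modelled in this sketch")
    return "".join(record.rstrip() + " " * pad for record in records)


vars_macro = ["age", "income"]
no_pad = inserted_text(vars_macro, pad=0)   # the 9-character case
one_pad = inserted_text(vars_macro, pad=1)  # the 11-character case (run default)
```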

MACRO XXX (file), PAD n $ causes the pad default for that specific macro to be n (also 0 to 80) rather than<br />

the current MACRO.PAD setting.<br />

___________________________________________________________________________<br />

Figure 12.6 Instream Macros in Subcommand Records.<br />

MACRO sss $<br />

STUB Q1 TO Q43<br />

ENDMACRO $<br />

MACRO bbb $<br />

BANNER Age Income Education<br />

ENDMACRO $<br />

SURVEY work1, ECHO;<br />

!!sss, !!bbb;<br />

$<br />

produces:<br />

STUB Q1 TO Q43, BANNER Age Income Education;<br />

SURVEY work1, ECHO;<br />

!!sss,<br />

!!bbb;<br />

$<br />

produces:<br />

STUB Q1 TO Q43,<br />

BANNER Age Income Education;<br />

___________________________________________________________________________<br />

A specific record in an instream macro can contain a padding specification which takes precedence over any<br />

pad default. This is done using back-slashes. If a record in a macro ends with two back-slashes, like<br />

age \\<br />

the characters up to (but not including) the back-slashes will be inserted. The above record will cause the 4 characters<br />

‘age ’ to be inserted.<br />

The same 4-character insertion would happen with<br />

age \\ /*a comment*/<br />

Ei<strong>the</strong>r of <strong>the</strong>se would insert just 3 characters:<br />

age\\ /*a comment*/<br />

age\\<br />
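The four records above can be checked against a short Python model of the rule. The name `record_text` and the way a trailing comment is recognised are assumptions made for this sketch; only the outcomes stated in the text are relied on.

```python
import re

# A trailing /* ... */ comment, if present, is dropped before looking for the marker.
TRAILING_COMMENT = re.compile(r"/\*.*?\*/\s*$")


def record_text(record, default_pad=1):
    """Return the characters one macro record contributes to the insertion."""
    stripped = TRAILING_COMMENT.sub("", record).rstrip()
    if stripped.endswith("\\\\"):
        # Explicit specification: keep everything before the two back-slashes,
        # including any blanks that precede them.
        return stripped[:-2]
    # No marker: right-most non-blank plus the current pad default.
    return stripped + " " * default_pad


four_chars = record_text(r"age \\")
three_chars = record_text(r"age\\ /*a comment*/")
```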

__________________________________________________________________________



Figure 12.7 Lots of Instream Macros<br />


MACRO input $<br />

ibm.data<br />

ENDMACRO $<br />

MACRO ppl $<br />

if age gt 20, retain;<br />

set region to recode ( region, 99=m )<br />

ENDMACRO $<br />

MACRO labfile $<br />

"ibm.labels"<br />

ENDMACRO $<br />

MACRO date $<br />

August 13, 2006<br />

ENDMACRO $<br />

MACRO layout $<br />

layout question totals labels body summary,<br />

places means 3,<br />

row column percents,<br />

ENDMACRO $<br />

MACRO define $<br />

define 'under $20,000' income 1 to 3,<br />

define 'over $20,000' income 4 to 6,<br />

ENDMACRO $<br />

MACRO stub.banner $<br />

stub age income, banner region;<br />

ENDMACRO $<br />

__________________________________________________________________________<br />

12.11 Using Lots of Instream Macros<br />

TRANSFER 'macro.file' $<br />

SURVEY !!input [ !!ppl ], LABELS !!labfile ;<br />

TITLE "this was run on !!date",<br />

!!layout<br />

!!define<br />

!!stub.banner<br />

$<br />

In the above example, a transfer is done first to a file containing the macro definitions that may be needed.<br />

The SURVEY command uses seven macros; the macro.file should contain those seven and may well contain<br />

more. Activating a macro which goes unused does not cause any problems; it simply uses a bit more space on a<br />

temporary scratch-file on disk. Figure 12.7 illustrates what the transfer file might contain.



12.12 MACRO COMMANDS<br />

Thus far we have seen three commands that are associated with macros.<br />

1. MACRO provides the macro name<br />

2. ENDMACRO defines the end of the macro<br />

3. RUN executes a block macro<br />

There are four more useful macro commands:<br />

4. MACRO.PAD<br />

5. SHOW.MACROS<br />

6. COUNT.MACROS<br />

7. FULL.MACRO.ARGS<br />

MACRO.PAD 0 $ changes the default padding for records of instream macros activated subsequently. The<br />

run begins with a default of 1. Values of 0 to 80 can be used.<br />

SHOW.MACROS $ can be used to display the currently activated macros. This prints the entire contents of<br />

the activated macros. SHOW.MACROS, NAMES $ can be used to list just the names of the currently activated<br />

macros. Adding FILE 'filename' causes the output to be written to that file.<br />

COUNT.MACROS $ simply reports how many macros have been read, how many had errors, and how many<br />

are usable. Since TRANSFER only reports each macro activation when verbosity is 4, using COUNT.MACROS<br />

after a transfer to a macro library gives a sense of what went on.<br />

When a macro has many arguments, some of which may not be present, constructing a call with the proper<br />

number of null arguments can be tricky. FULL.MACRO.ARGS is a command that can be used to specify whether<br />

trailing null arguments are required or optional.<br />

FULL.MACRO.ARGS OFF $<br />

turns off the requirement that all macro arguments must be fully represented. Thus, in a macro that references 1<br />

to 12 months and has 12 arguments in its definition,<br />

!!zzz ( Jan, Feb )<br />

can be used instead of:<br />

!!zzz ( Jan, Feb ,,,,,,,,, )<br />

This setting works only for the trailing (rightmost) arguments. The command to require fully supplied arguments<br />

is:<br />

FULL.MACRO.ARGS $<br />

The macro call can still have null arguments or a comma for defaults, but there must be something represented for<br />

every argument.<br />
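The effect of the two settings can be modelled in a few lines of Python. This is a sketch under stated assumptions: `bind_arguments` is a hypothetical name, an empty string stands for a null argument, and the parameter names `m1` to `m12` are invented for the example.

```python
def bind_arguments(params, supplied, full_args_required=True):
    """Pair macro parameters with call arguments; '' stands for a null argument."""
    supplied = list(supplied)
    if len(supplied) > len(params):
        raise ValueError("too many arguments")
    if len(supplied) < len(params):
        if full_args_required:
            # Models FULL.MACRO.ARGS $: every argument must be represented.
            raise ValueError("every argument must be represented")
        # Models FULL.MACRO.ARGS OFF $: missing trailing arguments become null.
        supplied += [""] * (len(params) - len(supplied))
    return dict(zip(params, supplied))


months = [f"m{i}" for i in range(1, 13)]  # hypothetical parameter names
bound = bind_arguments(months, ["Jan", "Feb"], full_args_required=False)
```

Only missing trailing arguments are filled in; a null argument in the middle of the call still has to be written explicitly.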

The records in a macro definition should not exceed 80 characters. A macro must be activated before it can<br />

be used. This is largely automatic. If you TRANSFER to a file which holds all of your macro definitions, the<br />

result of the transfer is to activate all of the macros it found there.<br />

12.13 CORRECTING MACROS IN THE EDITOR<br />

A macro appears in the editor as a single MACRO command which has some number of 'data' records. Its data<br />

records are, in fact, the rest of the macro. To change it you should modify the text as needed and then EXECUTE<br />

the macro command. That de-activates the old version and activates the new version.



If a macro appears in the editor as a series of commands, ending with an ENDMACRO $ command, it can be<br />

changed in the usual way. Then, to activate the changed version, simply EXECUTE the macro command; the rest<br />

of the macro will automatically be included in the activation.<br />

12.14 BLOCK MACROS<br />

A block macro is a named collection of P-STAT commands and subcommands or data records. It is only necessary<br />

to use the RUN command with the name of the macro in order to execute the entire series of commands. This<br />

section covers:<br />

1. The special features of block macros.<br />

2. SUBFILES controls a loop through a series of commands. The loop is executed once for each subgroup<br />

found in the SUBFILES input file. SUBFILES can only be used within a block macro.<br />

3. DIALOG permits a conversation with the user. DIALOG is usually, but not necessarily, used within<br />

a macro.<br />

___________________________________________________________________________<br />

Figure 12.8 Defining a Block Macro<br />

MACRO Sales ( Month ) $<br />

/*<br />

TO USE: RUN Sales ( Month )$<br />

For Month substitute the 3 letter abbreviation for the current month.<br />

The Sales macro is to be run on the 5th of each month.<br />

Copies of the report should be sent immediately to all department<br />

heads and to Sam Knightbridge, Vice President of Sales.<br />

*/<br />

TITLE 'Sales by Region and Department for the Month of &Month, 2010' $<br />

SORT Sales&Month,<br />

BY Region Department,<br />

OUT &MonthSales $<br />

LIST &MonthSales,<br />

TOTALS Dollar.Amounts Sales,<br />

MEANS Dollar.Amounts,<br />

BY Region Department,<br />

TITLES $<br />

ENDMACRO $<br />

___________________________________________________________________________<br />

12.15 Executing a Block Macro<br />

After a block macro is active (i.e., its definition has been read by P-STAT), it can be executed by using the RUN<br />

command. The RUN command executes the entire series of commands defined in the macro:<br />

RUN ABC $



RUN also passes character string arguments to the macro. The number of arguments depends on the macro<br />

definition. The arguments are enclosed in parentheses and can be either keyword or positional. The following Sales macro<br />

has a single keyword dummy argument. When it is executed, the RUN command must provide the actual value to<br />

be used for that argument. Given:<br />

MACRO Sales ( State ) $<br />

LIST &State $<br />

ENDMACRO $<br />

The macro is executed by a RUN command such as:<br />

RUN Sales ( NJ) $<br />

___________________________________________________________________________<br />

Figure 12.9 The RUN Command and Partial Output<br />

ECHO $<br />

RUN Sales ( Jan ) $<br />

TITLE 'Sales by Region and Department for the Month of Jan, 2010' $<br />

SORT SalesJan,<br />

BY Region Department,<br />

OUT JanSales $<br />

Sort on 4 cases completed.<br />

The largest change in position for any case was 2 positions.<br />

LIST JanSales,<br />

TOTALS Dollar.Amounts Sales,<br />

MEANS Dollar.Amounts,<br />

BY Region Department,<br />

COMMAS$<br />

Sales by Region and Department for the Month of Jan, 2010<br />

-- Region : East --<br />

-- Department: Clothing --<br />

Dollar<br />

Sales Amounts<br />

45,265 534,500<br />

25,430 435,005<br />

Department ------ ---------<br />

Total 70,695 969,505<br />

Department ------------<br />

Mean 484,752.50<br />

___________________________________________________________________________


12.16 Macro Substitution Using Strings<br />

Figure 12.8 contains a macro to do a monthly report. The macro “Sales” contains a series of commands to process<br />

sales records on a monthly basis. It contains a comment section which begins with /* and ends with */ .<br />

The input to the Sales macro is a P-STAT system file with a name such as SalesJan or SalesFeb. In the macro<br />

the name of the input P-STAT system file is Sales&Month. “&Month” is a string that will change depending on<br />

the report that is needed. When the macro is executed, the string “&Month” is replaced by the argument value<br />

provided in the RUN command for the dummy argument Month:<br />

RUN Sales ( Jan ) $<br />

The substitution is done wherever an ampersand (&) is immediately followed by “Month”, the dummy argument<br />

in the macro definition. Substitution occurs in commands, subcommands and even in data records. The use<br />

of the & before each use of the dummy argument ensures that the substitution is only done where it is intended.<br />

The form &(Month) can also be used.<br />

Figure 12.9 contains the RUN command for the Sales macro as well as partial output. Before the RUN command,<br />

there is an ECHO command. The reason for using ECHO is to see the commands after text substitution has<br />

occurred. Note: The comment text is not echoed because it is discarded as it is read.<br />
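The substitution rule is simple enough to model in Python. The sketch below is an illustration, not P-STAT's algorithm: `substitute` is a hypothetical name, matching is done on the exact spelling of the dummy argument, and longer names are tried first as a precaution that the text does not discuss.

```python
def substitute(text, args):
    """Replace &(Name) and &Name with the value supplied for dummy argument Name."""
    # Longest names first, so a name that is a prefix of another cannot shadow it.
    for name in sorted(args, key=len, reverse=True):
        text = text.replace("&(" + name + ")", args[name])
        text = text.replace("&" + name, args[name])
    return text


sort_command = substitute(
    "SORT Sales&Month, BY Region Department, OUT &MonthSales $",
    {"Month": "Jan"},
)
```

The word Month without a leading ampersand is left alone, which is why the TO USE comment and ordinary text inside a macro are safe from accidental substitution.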

12.17 Scope of Temporary Scratch Variables<br />

Temporary scratch variables, such as #N, usually are erased when the command in which they are created ends;<br />

however, a temporary scratch variable generated in a macro exists for the life of that macro. It is, therefore, available<br />

for use by all commands in the macro. It is erased only when the macro exits.<br />

MACRO ListH ( Hnum ) $<br />

GEN #Hname:C $<br />

PROCESS All [<br />

IF Hid EQ &Hnum, SET #Hname = Hospital, QUITCOMMAND ] $<br />

LIST #Hname [ KEEP Name Age Diagnosis ] $<br />

ENDMACRO $<br />

RUN ListH ( 1 ) $<br />

The PROCESS command is used to search for the first case which has a value of 1 for variable Hid. The value<br />

of variable Hospital is stored in a character temporary scratch variable and the command terminates. When the<br />

LIST command is scanned by the P-STAT executive routines, the value stored in #Hname is substituted for the<br />

filename. If it is not a legal name for a P-STAT file, an error occurs.<br />

If a permanent scratch variable is defined for local (within-macro) use, it runs the risk of stepping on a permanent<br />

scratch variable of the same name used elsewhere in some other way. Having a temporary scratch variable<br />

be usable across commands within a macro avoids this risk.<br />

12.18 Scratch Variables and Nested Macros<br />

Suppose macro AAA begins with:<br />

SET P(1) = 22 $<br />

GEN ##A = 23 $<br />

GEN #B = 24 $<br />

RUN XXX $<br />

P(1) is now 22 and ##A is 23. Since their scope is global, macro XXX can use them and get or change the values<br />

that macro AAA just set. Also, the values do not vanish when macro AAA exits.<br />

What about temporary scratch variable #B?<br />

1. As mentioned before, it 'belongs' to macro AAA. It exists for the commands within macro AAA. It<br />

vanishes when macro AAA exits. In other words, its scope is local to macro AAA.



2. It also exists for commands in macros called by macro AAA if the called macro does not generate<br />

its own version of #B.<br />

If AAA calls XXX, and XXX uses #B without a GENERATE, it gets and can change the #B that belongs to<br />

macro AAA. If macro XXX does a GENERATE of #B, it now has its own #B, unrelated to the #B in macro AAA.<br />

This feature can be useful when a macro does some things and calls another macro to finish the task. An example<br />

of this is a DIALOG macro calling a do-the-work macro, described later.<br />
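These scoping rules for temporary scratch variables can be pictured with a small Python model. It is a sketch only: the class name `MacroFrame` and its methods are hypothetical, and the model assumes that the lookup continues through every calling macro, which the text states explicitly only for a directly called macro.

```python
class MacroFrame:
    """One running macro: its own # variables plus a link to the calling macro."""

    def __init__(self, caller=None):
        self.caller = caller
        self.local = {}

    def generate(self, name, value=None):
        # Models GENERATE: the variable now belongs to this macro.
        self.local[name] = value

    def _owner(self, name):
        # Walk from this macro towards its callers until the variable is found.
        frame = self
        while frame is not None:
            if name in frame.local:
                return frame
            frame = frame.caller
        raise KeyError(name)

    def get(self, name):
        return self._owner(name).local[name]

    def set(self, name, value):
        self._owner(name).local[name] = value


aaa = MacroFrame()
aaa.generate("#B", 24)
xxx = MacroFrame(caller=aaa)   # AAA runs XXX
xxx.set("#B", 99)              # no GENERATE in XXX: this changes AAA's #B
shared_value = aaa.get("#B")
xxx.generate("#B", 1)          # XXX now has its own, unrelated #B
```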

12.19 Temporary Files in Macros<br />

Intermediate files produced in macros are often given names that begin with “MACFILE.” to indicate that they<br />

are temporary files that are not needed after the macro completes. There can be one to eight characters after the<br />

“MACFILE”. These files are referenced by their names in the macro, but they are written on disk with names<br />

composed of the P-STAT prefix for temporary files, “W_”, and some random characters that are generated to produce<br />

a unique name. If you include a FILES $ command within a macro you will see a display like:<br />

---------------autosave files: D:\PSFILES--------------------------------<br />

| name current previous |<br />

| |<br />

|#macfil1.sor W_10eZX3.PS1 |<br />

| (# indicates a temporary WORK file) |<br />

-------------------------------------------------------------------------<br />

Temporary macro files are deleted when the macro finishes. (Other temporary files are deleted when the<br />

P-STAT session ends.) Figure 12.10 shows the Sales macro with the output from the SORT command as a temporary<br />

file. This is then input to the LIST command. When the ENDMACRO statement is processed, the temporary file<br />

is erased.<br />

___________________________________________________________________________<br />

Figure 12.10 Macros: Temporary File Names<br />

MACRO Sales ( Month ) $<br />

TITLE 'Sales by Region and Department for the Month of &Month, 2010' $<br />

SORT Sales&Month,<br />

BY Region Department,<br />

OUT MACFILE.sor $<br />

LIST MACFILE.sor,<br />

TOTALS Dollar.Amounts Sales,<br />

MEANS Dollar.Amounts,<br />

BY Region Department,<br />

TITLES $<br />

ENDMACRO $<br />

___________________________________________________________________________<br />

12.20 Subcommands in Macros<br />

Figure 12.11 is a variation on the Sales macro with a SURVEY command instead of the LIST command.<br />

SURVEY requires subcommand information. If the table is always the same and only the file varies, then the<br />

subcommand records can be included in the macro in their final form. If, however, the tables may change, then<br />

there must be provision for substitution of the subcommands.



In this example, there is provision for 2 subcommand records: one to provide the BANNER (column) information<br />

and one to provide the STUB (row) information. Each such record is limited to 80 characters. The RUN<br />

command for this variation would look like:<br />

RUN Sales( Jan,<br />

BANNER Region,<br />

STUB Department ) $<br />

Note that “BANNER Region” is a single argument replacing “bvar” in the macro definition.<br />

__________________________________________________________________________<br />

Figure 12.11 Macros: Supplying Subcommands<br />

MACRO Sales ( Month, bvar, svar )$<br />

TITLE 'Sales by Region and Department for the Month &Month, 2010' $<br />

SORT Sales&Month,<br />

BY Region Department,<br />

OUT MACFILE.sor $<br />

SURVEY MACFILE.sor, TITLES;<br />

PLACES PERCENTS 0,<br />

&bvar,<br />

&svar;<br />

$<br />

ENDMACRO $<br />

___________________________________________________________________________<br />

If the macro does not supply the subcommand punctuation, as in:<br />

SURVEY MACFILE.sor, TITLES;<br />

PLACES PERCENTS 0,<br />

&bvar &svar<br />

then that punctuation must be in the arguments provided in the RUN command. Since the punctuation is meaningful<br />

to the RUN command, these arguments must be enclosed in quotes.<br />

RUN Sales( Jan,<br />

'BANNER Region,',<br />

'STUB Department;' ) $<br />

In this type of situation, the block macro might well be designed to use instream macros for the subcommand<br />

definitions. Instream macros are covered earlier in this chapter. Quotes around the arguments are stripped off<br />

and the contents of the quotes are substituted for the arguments in the macro. This means that you must use double<br />

quotes if you wish to pass a quoted string. For example:<br />

MACRO ttt ( t ) $<br />

TITLE &t $<br />

LIST sales.jan, TITLES $<br />

ENDMACRO $<br />

RUN ttt ( '".DATE."' ) $<br />

If the TITLE command itself contains the quotes:<br />

TITLE '&t' $<br />

the RUN command can be entered without quotes:<br />

RUN ttt ( .DATE. ) $
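The quote-stripping rule can be modelled with a few lines of Python. This is a sketch under the assumption that exactly one pair of matching outer quotes is removed; `strip_outer_quotes` is a hypothetical name.

```python
def strip_outer_quotes(argument):
    """Remove one pair of matching outer quotes from a RUN argument, if present."""
    arg = argument.strip()
    if len(arg) >= 2 and arg[0] == arg[-1] and arg[0] in "'\"":
        return arg[1:-1]
    return arg


banner = strip_outer_quotes("'BANNER Region,'")   # punctuation survives
title = strip_outer_quotes('\'".DATE."\'')        # inner quotes survive
```

In this model the inner pair of quotes in the second call is what reaches the TITLE command, which is why the argument has to be quoted twice when the macro itself supplies no quotes.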



___________________________________________________________________________<br />

Figure 12.12 Macro with Conditional Execution<br />

MACRO bvar $<br />

/* BANNER aa bb cc, */<br />

ENDMACRO $<br />

MACRO svar $<br />

/* STUB v1 v2 v3 */<br />

ENDMACRO $<br />

MACRO Sales.Report ( Month ) $<br />

/*<br />

TO USE:<br />

1. GENERATE ##CONTROL:C = 'LIST', 'SURVEY', or 'BOTH'<br />

2. RUN Sales.Report ( abc ) $<br />

For abc substitute the 3 letter abbreviation for the current month.<br />

If you are requesting a SURVEY you must supply stub variables in MACRO<br />

svar and (optionally) banner variables in MACRO bvar.<br />

*/<br />

TITLE 'Sales by Region and Department for the Month &Month, 2010 ' $<br />

SORT Sales&Month, BY Region Department, OUT MACFILE.sor $<br />

IF ##CONTROL EQ 'LIST' BRANCH Step1 $<br />

IF ##CONTROL EQ 'SURVEY' BRANCH Step2 $<br />

IF ##CONTROL NE 'BOTH' THEN;<br />

PUT 'Macro Sales.Report: ##CONTROL must be set to LIST, SURVEY or BOTH' ;<br />

BRANCH Finish ;<br />

ENDIF $<br />

Step1: LIST MACFILE.sor, TOTALS Dollar.Amounts Sales,<br />

MEANS Dollar.Amounts,<br />

BY Region Department,<br />

TITLES $<br />

IF ##CONTROL NE 'BOTH' BRANCH Finish $<br />

Step2: SURVEY MACFILE.sor, TITLES;<br />

PLACES PERCENTS 0,<br />

!!bvar<br />

!!svar ;<br />

$<br />

Finish: ENDMACRO $<br />

__________________________________________________________________________



12.21 Conditional Execution of Commands<br />

Macro Sales.Report in Figure 12.12 is an enhanced version of macro Sales. When you execute this macro,<br />

you choose not only your file but also which commands you wish to execute. The choice in this example is either a<br />

LIST command, a SURVEY command or both the LIST and the SURVEY. The choice is made by setting a permanent<br />

scratch variable before running the macro. The macro in Figure 12.12 tests the scratch variable and<br />

branches to the desired command.<br />

GENERATE ##CONTROL:C = 'LIST' $<br />

RUN Sales.Report ( Jan ) $<br />

produces a report with just the LIST command. The following commands:<br />

GENERATE ##CONTROL = 'BOTH' $<br />

MACRO svar $<br />

STUB Department Region;<br />

ENDMACRO $<br />

RUN Sales.Report ( Jan ) $<br />

produce a report with a LIST and then a SURVEY containing two 1-way tables.<br />

It is the BRANCH PPL instruction which transfers control to the appropriate set of commands. BRANCH is<br />

followed by the label of the next command to be executed. That label must be at the beginning of a command line<br />

followed by a colon (:). BRANCH can be used in any command stream to bypass commands.<br />

__________________________________________________________________________<br />

Figure 12.13 Macros: Reversing the Order of Execution<br />

MACRO Sales.Report ( Month ) $<br />

SORT Sales&Month, BY Region Department, OUT MACFILE.sor $<br />

IF ##CONTROL AMONG ( 'LIST' 'BOTH' ) BRANCH Step1 $<br />

IF ##CONTROL AMONG ( 'SURVEY' 'REVERSE' ) BRANCH Step2 $<br />

PUT 'Macro Sales: Invalid value for ##CONTROL' $<br />

BRANCH Finish $<br />

Step1: LIST MACFILE.sor, TOTALS Dollar.Amounts Sales,<br />

MEANS Dollar.Amounts,<br />

BY Region Department,<br />

TITLES $<br />

IF ##CONTROL AMONG ( 'LIST' 'REVERSE' ) BRANCH Finish $<br />

Step2: SURVEY MACFILE.sor, TITLES;<br />

PLACES PERCENTS 0,<br />

!!bvar<br />

!!svar ;<br />

$<br />

IF ( ##CONTROL EQ 'REVERSE' ) BRANCH Step1 $<br />

Finish: ENDMACRO $<br />

___________________________________________________________________________



In a macro, BRANCH can be used either to bypass commands or to branch back and execute commands that<br />

occur earlier in the macro. Thus it is easy to change Sales.Report so that the order of the report, LIST and then<br />

SURVEY or SURVEY and then LIST, is also controlled. This requires only the ability to branch around the LIST<br />

and then possibly to branch back. Figure 12.13 contains the changes needed to add this option.<br />
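The control flow of Figure 12.13 can be traced with a small Python model in which each label is a state and each BRANCH sets the next state. This is a sketch for following the logic only: `run_sales_report` is a hypothetical name, and the commands are represented by their names rather than executed.

```python
def run_sales_report(control):
    """Return the commands Figure 12.13 would execute for a ##CONTROL value."""
    executed = []
    target = "start"
    while target != "Finish":
        if target == "start":
            if control in ("LIST", "BOTH"):
                target = "Step1"
            elif control in ("SURVEY", "REVERSE"):
                target = "Step2"
            else:
                executed.append("PUT error message")
                target = "Finish"
        elif target == "Step1":
            executed.append("LIST")
            # Without a branch to Finish, control falls through into Step2.
            target = "Finish" if control in ("LIST", "REVERSE") else "Step2"
        elif target == "Step2":
            executed.append("SURVEY")
            # Backward branch: REVERSE returns to the LIST step.
            target = "Step1" if control == "REVERSE" else "Finish"
    return executed


reverse_order = run_sales_report("REVERSE")
```

The REVERSE case is the only one that branches backward: it skips the LIST on the way down, runs the SURVEY, and then returns to Step1.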

12.22 DIALOG<br />

DIALOG is a PPL function which can be used anywhere but is especially useful when you wish to make a macro<br />

easy for someone else to use. If the macro is designed correctly with DIALOG, it can be run interactively by a<br />

user who knows little more than the names of the macro and the files or variables he wishes to select. With<br />

DIALOG in place, the RUN command for this version of the Sales Report macro is simply:<br />

RUN sales.report $<br />

Using the macro in Figure 12.14, the following messages then appear on the screen. User replies appear on the<br />

line after each prompt:<br />

-------------------------------------------------<br />

Enter the three letter abbreviation for the month<br />

feb<br />

Enter one of the numbers 1-4 for these choices<br />

1: LIST command only<br />

2: LIST and SURVEY commands<br />

3: SURVEY command only<br />

4: SURVEY and LIST commands<br />

4<br />

Enter the names of your column (banner) variables<br />

region department<br />

Enter the names of your stub (row) variables<br />

item1 TO item10<br />

item1 TO item10<br />

___________________________________________________________________________<br />

Figure 12.14 Macros: DIALOG Provides an Interactive Front End<br />

MACRO Sales.Report $<br />

GEN ##Reply, GEN #Mon:c3 $<br />

GEN #bvar:c78 =' ', GEN #svar:c78 =' ' $<br />

GEN #bvar2:c80=' ', GEN #svar2:c80=' ' $<br />

Prompt1: DIALOG #Mon<br />

'-------------------------------------------------'<br />

'Enter the three letter abbreviation for the month'<br />

HELP 'Expected abbreviations include'<br />

'jan feb mar apr may jun jul aug sep oct nov dec' $<br />

IF .RESPONSE. EQ 0 OR .RESPONSE. EQ -9 BRANCH Finish $<br />

IF .RESPONSE. NE 14 BRANCH Prompt1 $<br />

IF #Mon NOTAMONG ( 'jan' 'feb' 'mar' 'apr' 'may' 'jun'<br />

'jul' 'aug' 'sep' 'oct' 'nov' 'dec' )<br />

BRANCH Prompt1 $<br />

Prompt2: DIALOG ##Reply ' '<br />

'Enter one of the numbers 1-4 for these choices'



'1: LIST command only'<br />

'2: LIST and SURVEY commands'<br />

'3: SURVEY command only'<br />

'4: SURVEY and LIST commands' $<br />

IF .RESPONSE. EQ 0 BRANCH Finish $<br />

IF .RESPONSE. NE 1 BRANCH Prompt2 $<br />

IF ##REPLY LT 1 .OR. ##REPLY GT 4 BRANCH Prompt2 $<br />

IF ##REPLY EQ 1 BRANCH Do.it $<br />

Prompt3: DIALOG #bvar<br />

'Enter the names of your column (banner) variables' $<br />

IF .RESPONSE. NOTAMONG ( -2 14 16 ) BRANCH Prompt3 $<br />

IF .RESPONSE. NE -2 SET #bvar2 = 'BAN' /// LRTRIM ( #bvar ) // ',' $<br />

Prompt4: DIALOG #svar<br />

'Enter the names of your stub (row) variables' $<br />

IF .RESPONSE. NOTAMONG ( -2 14 16 ) BRANCH Prompt4 $<br />

IF .RESPONSE. NE -2 SET #svar2 = 'STUB' /// LRTRIM ( #svar ) // ',' $<br />

Do.it: RUN Report.Step2 ( #mon, #bvar2, #svar2 ) $<br />

FINISH: ENDMACRO $<br />

MACRO Report.Step2 ( Month, bvar, svar ) $<br />

TITLE 'Sales by Region and Department for the Month &Month, 2010 ' $<br />

SORT Sales&Month, BY Region Department, OUT MACFILE.sor $<br />

IF ##REPLY EQ 1 OR ##REPLY EQ 2 BRANCH Step1 $<br />

IF ##REPLY EQ 3 OR ##REPLY EQ 4 BRANCH Step2 $<br />

Step1: LIST MACFILE.sor, TOTALS Dollar.Amounts Sales,<br />

MEANS Dollar.Amounts,<br />

BY Region Department,<br />

TITLES $<br />

IF ##REPLY EQ 1 OR ##REPLY EQ 4 BRANCH Finish $<br />

Step2: SURVEY MACFILE.sor, TITLES;<br />

PLACES PERCENTS 0,<br />

&bvar<br />

&svar ;<br />

$<br />

IF ##REPLY EQ 4 BRANCH Step1 $<br />

Finish: ENDMACRO $<br />

__________________________________________________________________________



To supply this friendly front end, the Sales.Report macro is rewritten as “Report.Step2”, and a new Sales.Report macro is designed which prompts for the information it needs. It uses this information to build the RUN command for Report.Step2. Figure 12.14 lists the new Sales.Report macro and the revised Report.Step2. Report.Step2 is very like the previous Sales.Report except that the character ##CONTROL variable is replaced by the numeric scratch variable ##REPLY. The SURVEY command is also slightly changed so that the user need only know the names of the variables that define the rows and columns rather than rewrite the supporting instream bvar and svar macros.<br />

A DIALOG macro carries a great deal of overhead if you wish to provide for all the possible responses that a user may make. There should be provision for QUIT. Help text and tests for appropriate replies should be provided whenever possible.<br />

12.23 Format of <strong>the</strong> DIALOG command<br />

The DIALOG command has a scratch variable and some number of lines of text enclosed in quotes. The scratch variable is required only if a reply is expected. Each line of text is displayed on a separate line on the terminal. The lines of text can contain scratch variables. If so, their current values are displayed.<br />

Optional HELP text is also part of the DIALOG command. This is not displayed unless the user requests it by entering either “H” or “HELP” in reply to the prompt. The keyword “HELP” separates the normal DIALOG text from the HELP text. In Figure 12.14, the first DIALOG command:<br />

Prompt1: DIALOG #Mon<br />

'-------------------------------------------------'<br />

'Enter the three letter abbreviation for the month'<br />

HELP 'Expected abbreviations include'<br />

'jan feb mar apr may jun jul aug sep oct nov dec' $<br />

contains a scratch variable, 2 lines of text, and the HELP keyword followed by 2 lines of help text. Note: the scratch variable must be created before the DIALOG command.<br />

There are two mechanisms for examining a user reply. The first is the reply itself, which is stored in the DIALOG scratch variable. The second is a numeric system variable, .RESPONSE., which contains a code indicating the type of the reply. .RESPONSE. is set each time DIALOG is executed. .RESPONSE. values are:<br />

negative: no response, or an invalid response:<br />

-2 = entirely blank<br />

-4 = H or HELP, but <strong>the</strong> dialog had no help text<br />

-6 = 'abc' for a numeric scratch variable, or such<br />

-8 = a scratch variable was not supplied<br />

-9 = in batch mode<br />

zero: <strong>the</strong> response was Q or QUIT<br />

positive: a valid response:<br />

1 = integer, like 1990<br />

2 = non-integer, like 3.1416<br />

11 = Y or YES<br />

12 = N or NO<br />

14 = character response o<strong>the</strong>r than yes/no/quit<br />

that is a legal p-stat name or label<br />

16 = o<strong>the</strong>r character response<br />
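The code table above can be modelled outside P-STAT. The following is a minimal Python sketch, not P-STAT code, that maps a typed reply to a .RESPONSE.-style code. The function name `classify_reply`, its parameters, the order in which the tests are applied, the pattern used for a "legal P-STAT name", and the `None` result when help text exists are all assumptions made for illustration.

```python
import re

# Assumption: a "legal P-STAT name" is modelled as a letter followed by
# letters, digits, periods or underscores; the real rule may differ.
LEGAL_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._]*")
INTEGER = re.compile(r"[+-]?[0-9]+")
DECIMAL = re.compile(r"[+-]?([0-9]+\.[0-9]*|\.[0-9]+)")


def classify_reply(reply, numeric_var=False, has_var=True,
                   has_help=False, batch=False):
    """Return a .RESPONSE.-style code for one reply (illustrative model)."""
    if batch:
        return -9                       # DIALOG reached in a batch run
    text = reply.strip()
    if text == "":
        return -2                       # entirely blank
    upper = text.upper()
    if upper in ("Q", "QUIT"):
        return 0
    if upper in ("H", "HELP"):
        # Assumption: with help text the prompt is simply shown again,
        # which this sketch signals with None.
        return None if has_help else -4
    if not has_var:
        return -8                       # no scratch variable to receive it
    if INTEGER.fullmatch(text):
        return 1                        # integer, like 1990
    if DECIMAL.fullmatch(text):
        return 2                        # non-integer, like 3.1416
    if numeric_var:
        return -6                       # text typed for a numeric variable
    if upper in ("Y", "YES"):
        return 11
    if upper in ("N", "NO"):
        return 12
    if LEGAL_NAME.fullmatch(text):
        return 14                       # single legal name, such as feb
    return 16                           # any other character response
```

Under these assumptions a reply such as `feb` is classified as 14, while a list of names such as `region department` is classified as 16, which is why the banner and stub prompts in Figure 12.14 accept both codes.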

In Figure 12.14 the code which looks at the user reply first checks whether QUIT was entered and makes sure that the macro is not being inappropriately used in a batch run.<br />

IF .RESPONSE. EQ 0 OR .RESPONSE. EQ -9 BRANCH Finish $



The next check is <strong>to</strong> make sure that <strong>the</strong> response is a single word:<br />

IF .RESPONSE. NE 14 BRANCH Prompt1 $<br />

The final check tests <strong>the</strong> character scratch variable #Mon <strong>to</strong> make sure that it is one of <strong>the</strong> 12 months.<br />

IF #Mon NOTAMONG ( 'jan' 'feb' 'mar' 'apr' 'may' 'jun'<br />

'jul' 'aug' 'sep' 'oct' 'nov' 'dec' )<br />

BRANCH Prompt1 $<br />

These checks are not as complete and informative as they might be. In the example above the BRANCH might better have been preceded by:<br />

PUT 'Incorrect reply. Use H to get a list of the 3 character months',<br />

The following provides a more complete diagnostic of the problem when DIALOG is run in a batch job:<br />

IF .RESPONSE. EQ -9 PUT <br />

<br />

<br />

;<br />

GO TO FINISH;<br />

Note the use of ##REPLY, which is a permanent scratch variable, whereas the other scratch variables in the Sales.Report macro are generated with a single # sign as temporary scratch variables. If ##REPLY is generated as a temporary scratch variable, Report.Step2 cannot be run as a standalone macro without the front end dialog. The other information that it needs, the month and the stub and banner variables, is passed to it as arguments, and it does not matter whether the RUN command comes from the dialog macro or from a standalone RUN command. With ##REPLY as a permanent scratch variable, the macro can be run interactively with a dialog or in a batch command stream.<br />

The other three prompt sections in Figure 12.14 are all similar to the first prompt section. In each case the essentials are in place, but improvements could be made to the error handling.<br />

There is one tricky piece in preparing the character string arguments for the Report.Step2 macro. The problem occurs when passing a character string to a macro if that character string contains a comma. If a string is not enclosed in quotes when it is given to the RUN command, the comma which is needed in the macro instead serves as a delimiter between the arguments of the RUN command. If it is enclosed in quotes, the quotes are stripped off by the MACRO command after the string is properly stored.<br />

Because quotes are stripped off as the RUN command is processed, a string that requires quotes within the macro must be enclosed in double quotes or angle brackets. For example:<br />

MACRO small ( t ) $<br />

TITLES &t $<br />

LIST myfile, TITLES $<br />

ENDMACRO $<br />

RUN small ( ) $<br />

12.24 Does <strong>the</strong> File Exist<br />

A user friendly macro can also check that the files referenced in the macro exist and, if they do not, provide a reasonable error message. The P-STAT command INQUIRE.EXTERNAL is used to test the existence of a given external file. It sets a system variable, .XINQUIRE., to 1 if the file is there and to 0 if it is not there. If the Sales.Report macro used a labels file named 'report.lab' we could check its existence:<br />

GEN #LABNAME = "'report.lab'" $<br />

INQUIRE.EXTERNAL #LABNAME $<br />

IF .XINQUIRE. EQ 1, BRANCH OK $<br />

DIALOG 'Labels file #LABNAME is needed.',<br />

OK: etc.



The existence of a P-STAT system file can also be tested. INQUIRE ABC $ sets .INQUIRE. to 1 if it exists, and to zero if it does not.<br />

12.25 SUBFILES<br />

The SUBFILES command is a major feature which is only available within macros. SUBFILES provides a BY capability for all the commands within its scope. SUBFILES is similar to MACRO in that its domain begins and ends with a P-STAT command. For SUBFILES, the ending command is ENDSUBFILES $.<br />

Figure 12.15 Macro With SUBFILES<br />

___________________________________________________________________________<br />

MACRO Sales ( Month )$<br />

SUBFILES Sales&Month, BY Region $<br />

SORT SUBFILE, BY Department, OUT Work $<br />

TITLES 'Report by #Region for &Month 2010' $<br />

LIST Work, BY Department,<br />

NO.CASES, TOTALS,<br />

TITLES $<br />

ENDSUBFILES $<br />

ENDMACRO $<br />

___________________________________________________________________________<br />

Figure 12.15 contains the commands for a simple macro with a SUBFILES command. The macro prints a separate report for each value of the BY variable Region. The file does NOT have to be sorted on the BY variable. The TITLES command refers to #REGION, a scratch variable that appears to be undefined. That is because it is defined behind the scenes by the SUBFILES command. For every BY variable a scratch variable is created which has the same name as the BY variable with the single # prefix. These scratch variables contain the current value of each BY variable as the SUBFILES iterations are done.<br />

The input to the SUBFILES command is a P-STAT system file. This will usually be the only time that file is referenced in the SUBFILES loop. The file name "SUBFILE" is used to refer to the current subgroup that is being processed, regardless of the original input file name.<br />

The values of these scratch variables allow you to easily identify which of the many possible subgroups is currently being processed. These are temporary scratch variables, but since they are defined within a MACRO command, they exist as long as the macro is being processed. This means that the values are available throughout the subfile process. These scratch variables can be used in TITLES and in PUT statements.<br />

The SUBFILES command needs to know the name of the input file and the names of the BY variables. There can be up to 15 different BY variables, and there can be a mixture of numeric and character variables. Thus it is possible to have hundreds of different groups defined by all possible combinations of the BY variables. For each such group a pass is made through all the commands within the current SUBFILES loop.<br />

12.26 SUBFILES Optional Identifiers<br />

The SUBFILES command has several optional identifiers. Usually the groups are presented in the order in which they are encountered in the data file. This can be controlled by using the identifiers UP or DOWN. The following illustrates how these identifiers work with two BY variables, one numeric and one character. The first pair of columns is the order in which the initial case of each subgroup is found in the input file. The second pair of columns is the way the groups are organized if UP is used. The third pair of columns illustrates the DOWN order.<br />



Natural Order    UP Order     DOWN Order<br />
2 West           1 North      3 South<br />
1 North          1 West       3 East<br />
3 South          2 East       2 West<br />
2 East           2 North      2 South<br />
1 West           2 South      2 North<br />
2 North          2 West       2 East<br />
3 East           3 East       1 West<br />
2 South          3 South      1 North<br />
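The three orderings in the table can be reproduced with a short Python model. This is only a sketch of the ordering rule, not of how P-STAT implements it. The case list is invented so that the first occurrence of each group matches the Natural Order column.

```python
# Each case is (numeric BY value, character BY value). Repeated cases are
# included to show that only the first occurrence fixes the natural order.
cases = [
    (2, "West"), (1, "North"), (2, "West"), (3, "South"), (2, "East"),
    (1, "West"), (1, "North"), (2, "North"), (3, "East"), (2, "South"),
]

# Natural order: groups in the order their first case is encountered.
natural = list(dict.fromkeys(cases))

# UP: ascending on all BY variables; DOWN: descending on all BY variables.
up = sorted(natural)
down = sorted(natural, reverse=True)

for n, u, d in zip(natural, up, down):
    print(n, u, d)
```

Because UP and DOWN apply to every BY variable at once, the DOWN list is simply the UP list reversed.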

FREQUENCIES is a SUBFILES identifier that causes the groups to be displayed according to the number of cases in the group. When FREQUENCIES is used, DOWN is assumed unless UP is specified. When DOWN is used, the group with the largest number of cases is first. When UP is used, that group is last.<br />

SUBFILES Myfile, BY Age Region, FREQUENCIES, UP $<br />

Character values are considered a match if the characters are identical even if the case of the characters is different. The identifier EXACT changes this. When EXACT is used, a value will be considered part of a new group unless all the characters are exactly the same in every way. South is different from SOUTH which is different from south, and so on.<br />

The final identifier is the GROUPS identifier, which is followed by the name of a file of group definitions. Usually this file is generated for you and stays behind the scenes. Figure 12.16 shows the commands that are actually executed when the Sales macro with SUBFILES is run.<br />

There are two commands that do the work in a SUBFILES loop. LOCATE.GROUPS reads through the input file and determines how many groups there are and how many cases are in each group. It also notes where the first and last cases in each group are located in the file. A GROUPS file from the LOCATE.GROUPS command with a single BY variable with just 2 values might look like:<br />

                Number<br />
First   Last    of                Compare<br />
case    case    cases    Region   mode<br />
1       22      15       West     not exact<br />
6       24      9        East     not exact<br />
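The bookkeeping that LOCATE.GROUPS is described as doing can be sketched in Python. This is an illustrative model, not the P-STAT implementation. The function name `locate_groups`, the dictionary layout, and the sample data are assumptions, with the data chosen so that the result matches the two rows shown above.

```python
def locate_groups(values, exact=False):
    """Record first case, last case and case count for each group."""
    groups = {}
    for case_number, value in enumerate(values, start=1):
        # Without EXACT, values differing only in letter case match.
        key = value if exact else value.lower()
        if key not in groups:
            groups[key] = {"first": case_number, "last": case_number,
                           "cases": 0, "value": value}
        entry = groups[key]
        entry["last"] = case_number
        entry["cases"] += 1
    # Groups are returned in the order their first case was encountered.
    return list(groups.values())


# Hypothetical 24-case file: West in cases 1-5 and 13-22, East elsewhere.
regions = ["West"] * 5 + ["East"] * 7 + ["West"] * 10 + ["East"] * 2

for group in locate_groups(regions):
    print(group["first"], group["last"], group["cases"], group["value"])
```

A single pass through the file is enough to collect all three numbers for every group, whether or not the file is sorted.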

Figure 12.16 The SUBFILE Commands<br />

___________________________________________________________________________<br />

SUBFILES Salesfeb, BY Region $<br />

LOCATE.GROUPS Salesfeb, by Region,<br />

verbosity 1, groups WORK0032 $<br />

SUBNEXT Salesfeb [ cases 1111 to 9999],<br />

groups WORK0032, out subfile $<br />

TITLES 'Report by #Region for feb 2010' $<br />

SORT SUBFILE, BY Department, OUT WORK $<br />

LIST WORK, BY Department, TITLES, TOTALS$<br />

Report by West for feb 2010<br />

-- Department: Clothing --



( Rest of report for West follows )<br />

ENDSUBFILES $<br />

SUBNEXT Salesfeb [ cases 1111 to 9999],<br />

groups WORK0032, out subfile $<br />

TITLES 'Report by #Region for feb 2010' $<br />

SORT SUBFILE, BY Department, OUT WORK $<br />

LIST WORK, BY Department, TITLES, TOTALS$<br />

Report by East for feb 2010<br />

-- Department: Clothing --<br />

( Rest of report for East follows )<br />

ENDSUBFILES $<br />

MACDONE$<br />

___________________________________________________________________________<br />

12.27 SUBFILES Looping<br />

The second command, SUBNEXT, controls the looping. It keeps track of the current group and, using the GROUPS file from the LOCATE.GROUPS command, creates a subset of the original data file which contains just the members of the current group. The SUBNEXT command which appears to be:<br />

SUBNEXT SalesFeb [ CASES 1111 TO 9999 ]<br />

is executed as if it were<br />

SUBNEXT SalesFeb [ CASES 1 TO 22 ]<br />

for the first group and<br />

SUBNEXT SalesFeb [ CASES 6 TO 24 ]<br />

for the second group. This enables the SUBNEXT command to work very efficiently, especially if the file is already partially or fully sorted.<br />
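Why the first and last case numbers help can be seen in a small Python model. It is a sketch only: SUBNEXT is internal to P-STAT, and the function name `subnext`, the row layout, and the sample data here are assumptions. The point illustrated is that only the cases between the first and last member of a group need to be read.

```python
# Hypothetical 24-case file: West in cases 1-5 and 13-22, East elsewhere.
rows = ([{"Region": "West"}] * 5 + [{"Region": "East"}] * 7
        + [{"Region": "West"}] * 10 + [{"Region": "East"}] * 2)

west_group = {"first": 1, "last": 22, "value": "West"}
east_group = {"first": 6, "last": 24, "value": "East"}


def subnext(rows, by, group):
    """Return the members of one group, reading only cases first..last."""
    # Case numbers are 1-based, so case `first` is rows[first - 1].
    window = rows[group["first"] - 1:group["last"]]
    wanted = group["value"].lower()
    return [row for row in window if row[by].lower() == wanted]


print(len(subnext(rows, "Region", west_group)))  # members of the West group
print(len(subnext(rows, "Region", east_group)))  # members of the East group
```

In a fully sorted file the window for each group contains only that group's cases, so no case is read that is not kept.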

The output file from SUBNEXT is always written to a file with the name "SUBFILE". This explains why the input to the SORT in Figure 12.16 is file "subfile". It is not a magic name out of nowhere; it is an actual temporary file that is created to contain the current subgroup.<br />

The SUBNEXT command is internal to SUBFILES and cannot be executed by a user. The LOCATE.GROUPS command, on the other hand, can be executed at any time during a run and provides an easy way to determine the number of cases in the subgroups of a file. You can run the LOCATE.GROUPS command before a SUBFILES loop. The final identifier to the SUBFILES command is the GROUPS identifier which, if used, requires the name of a GROUPS output file from a previous LOCATE.GROUPS command.<br />

Because the GROUPS file is a P-STAT system file which can itself be modified with PPL, there is yet further control over the groups that are processed. For example:<br />

MACRO Grouper $<br />

LOCATE.GROUPS Myfile, BY County State, GROUPS MyGroups $<br />

SUBFILES Myfile, BY County State,<br />

GROUPS MyGroups [ if Number.of.cases LT 20, EXCLUDE ] $<br />

This will cause all the small groups to be omitted from the rest of the SUBFILES loop.<br />

The SUBFILES identifiers UP and DOWN apply to all the BY variables. If the order that you want is UP on one variable and DOWN on another, that can be accomplished by using the LOCATE.GROUPS command followed by a SORT command.<br />

LOCATE.GROUPS Myfile, BY Sales State, GROUPS Mygroup $



SORT Mygroup, BY Sales (D) State (U), OUT Mygroup $<br />

SUBFILES Myfile, GROUPS Mygroup $<br />
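The effect of sorting the GROUPS file DOWN on one variable and UP on another can be modelled in Python. This is a sketch of the resulting order only. The group values are invented, and the two-pass stable sort is one way to obtain a mixed order in Python, not a description of how the P-STAT SORT command works.

```python
# Hypothetical GROUPS rows, one per (Sales, State) subgroup.
groups = [
    {"Sales": 200, "State": "NY"},
    {"Sales": 100, "State": "NJ"},
    {"Sales": 200, "State": "CA"},
    {"Sales": 100, "State": "AZ"},
]

# Sort on the minor key first (State, ascending), then on the major key
# (Sales, descending). Python's sort is stable, so ties on Sales keep
# the State order established by the first pass.
ordered = sorted(groups, key=lambda g: g["State"])
ordered = sorted(ordered, key=lambda g: g["Sales"], reverse=True)

for g in ordered:
    print(g["Sales"], g["State"])
```

The SUBFILES loop would then visit the groups in this order, largest Sales value first and states alphabetically within each Sales value.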

If you use LOCATE.GROUPS to create your own GROUPS file, you may use any of the SUBFILES identifiers UP, DOWN, FREQUENCIES, and EXACT in the LOCATE.GROUPS command. However, if you provide your own GROUPS file to the SUBFILES command, you cannot use BY, UP, DOWN, FREQUENCIES or EXACT in that SUBFILES command.<br />

If the LOCATE.GROUPS command has PPL which deletes cases, the GROUPS file no longer describes the original input file. If you then use SUBFILES with the new GROUPS file and the original input file, the cases selected will not be the correct cases. The solution is easy. Add the OUT identifier to the LOCATE.GROUPS command to produce a file that corresponds to the GROUPS file.<br />

LOCATE.GROUPS Myfile [IF Department LT 10, EXCLUDE],<br />

BY Sales State, GROUPS Mygroup, OUT Temp $<br />

SORT Mygroup, BY Sales (D) State (U), OUT Mygroup $<br />

SUBFILES Temp, GROUPS Mygroup $<br />

12.28 SUBFILES System Variables<br />

There are three system variables that are set by the SUBFILES command.<br />

1. .SUBFILEPASS. counts the number of times through the subfile loop.<br />

2. .SUBFILEMAX. is the total number of iterations to be done. This is the same as the total number of groups.<br />

3. .SUBFILECASES. is the number of cases in the current group.<br />

These variables can be used to provide different paths depending on their values. The following is a simplistic macro which uses all three variables:<br />

MACRO Counter $<br />

GEN #Big $<br />

SUBFILES Myfile, BY Region $<br />

IF .SUBFILEPASS. EQ 1, SET #BIG = 0, PUT $<br />

IF .SUBFILECASES. GT 260 INCREASE #BIG $<br />

IF .SUBFILEPASS. EQ .SUBFILEMAX.<br />

PUT #BIG >$<br />

ENDSUBFILES $<br />

ENDMACRO $
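The logic of the Counter macro can be traced with a Python model. This is a sketch under assumptions: the function name `count_big_groups` and the sample group sizes are invented, and the loop variables merely stand in for the three system variables.

```python
def count_big_groups(group_sizes, threshold=260):
    """Count the groups that have more than `threshold` cases."""
    subfile_max = len(group_sizes)           # stands in for .SUBFILEMAX.
    big = None
    # subfile_pass stands in for .SUBFILEPASS., subfile_cases for
    # .SUBFILECASES. on each trip through the loop.
    for subfile_pass, subfile_cases in enumerate(group_sizes, start=1):
        if subfile_pass == 1:
            big = 0                           # initialise on the first pass
        if subfile_cases > threshold:
            big += 1
        if subfile_pass == subfile_max:
            return big                        # report on the final pass
    return big                                # no groups at all


print(count_big_groups([300, 120, 261, 260]))
```

The first-pass test initialises the counter and the last-pass test reports it, so the work that belongs before and after the loop can live inside the loop body.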



MACRO<br />

SUMMARY<br />

The MACRO command provides a name for a collection of P-STAT text. There are two types of macros. Block macros contain one or more P-STAT commands. They are executed with the RUN command. Instream macros contain pieces of commands, programming language (PPL), subcommands or data.<br />

A block macro is a named collection of P-STAT commands and data records. The MACRO command supplies a name for the macro and defines any macro arguments. The arguments can be either keyword or positional. The MACRO command is followed by one or more P-STAT commands, subcommands and data records. A macro is neither checked for syntax nor executed when it is defined.<br />

The RUN command executes the entire series of commands that comprise the macro. The RUN command passes the true values for each of the macro arguments. If the arguments are keywords, substitution is done whenever the keyword preceded by an ampersand (&month) is found in the macro text. When the arguments are positional, substitution is done for &1, &2, etc.<br />
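Argument substitution can be pictured as a text replacement. The Python sketch below is an illustrative model, not P-STAT's macro processor. The function name `expand`, the case-insensitive matching of keyword names, and the rule that unknown names are left untouched are assumptions.

```python
import re


def expand(text, keyword_args=None, positional_args=None):
    """Replace &name and &1, &2, ... in macro text with argument values."""
    # Assumption: keyword names match regardless of letter case.
    keywords = {k.lower(): v for k, v in (keyword_args or {}).items()}
    positionals = list(positional_args or [])

    def replace(match):
        name = match.group(1)
        if name.isdigit():
            index = int(name) - 1             # &1 is the first argument
            if 0 <= index < len(positionals):
                return positionals[index]
            return match.group(0)             # leave unknown &n as written
        return keywords.get(name.lower(), match.group(0))

    return re.sub(r"&([0-9]+|[A-Za-z][A-Za-z0-9]*)", replace, text)


print(expand("SORT Sales&Month, BY Region Department, OUT MACFILE.sor $",
             keyword_args={"Month": "feb"}))
print(expand("CORRELATE &1 [ KEEP &2 ], OUT work1$",
             positional_args=["data1", "age income education"]))
```

Because the replacement is purely textual, an argument can complete a file name (Sales&Month becomes Salesfeb) as easily as it can supply a whole list of variables.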

File names in macros, prefaced with “MACFILE.”, are temporary files that disappear after the macro finishes.<br />

A set of macro definitions can be created and modified in a simple ASCII file using a text editor. The macros are then made available to a P-STAT run by doing a TRANSFER to that file.<br />

A macro appears in the P-STAT editor as a single command. Its commands are stored as data records to the macro command. A macro can be edited just as any other command is edited. It must then be re-executed (X) from within the editor for the changes to take effect.<br />

Macros support both keyword and positional arguments. Default values can be provided. If defaults are not provided in the macro definition, values must be supplied when the macro is used.<br />

MACRO rrr ( file, vars) $<br />

CORRELATE &file [ KEEP &vars ], OUT work1$<br />

LIST work1 $<br />

ENDMACRO$<br />

Instream macros are executed by providing the name preceded by !! (two exclamation points).<br />

MACRO survey.def $<br />

layout question <strong>to</strong>tals labels body missing summary,<br />

places means 3, places percents 2,<br />

row.<strong>to</strong>tals on right,<br />

ENDMACRO $<br />

SURVEY PsFile;<br />

!!survey.def<br />

BANNER Age Education, STUB Q1 TO Q43;<br />

$<br />

MACRO Sales ( Month, Region ) ( jan, east ) $<br />

MACRO Sales ( 2 ) ( jan, '' ) $<br />

RUN rrr ( data1, age income education ) $



Required:<br />

MACRO name $<br />

Optional Identifier:<br />

PAD nn<br />

This specifies <strong>the</strong> default padding for instream records as <strong>the</strong>y are inserted.<br />

ENDMACRO<br />

ENDMACRO $ ends the macro definition.<br />

RUN<br />

RUN SALARY $<br />

RUN SALES ( Sept ) $<br />

The RUN command causes a block macro to be executed. Argument substitution is supported.<br />

FULL.MACRO.ARGS<br />

All arguments must be supplied when a macro is called. An argument can be a replacement value, a null value, or a comma if default values are available. This is the default.<br />

FULL.MACRO.ARGS OFF<br />

The commas for trailing arguments need not be supplied if defaults are available.<br />

COUNT.MACROS<br />

COUNT.MACROS simply reports how many macros have been read, how many had errors, and how many are usable. Since TRANSFER only reports each macro activation when verbosity is 4, using COUNT.MACROS after a transfer to a macro library gives a sense of what went on.<br />

SHOW.MACROS<br />

SHOW.MACROS can be used to display the currently activated macros. This prints the entire contents of the activated macros.<br />

Optional:<br />

NAMES<br />

SHOW.MACROS, NAMES $ can be used to list the names of the currently activated macros.<br />

FILE “fn”<br />

Name for an external file where the SHOW.MACROS command is to put its results.<br />

SHOW.MACRO, FILE “MyMacros” $<br />

MACRO.PAD nn<br />

is a command that specifies the padding default for macros activated subsequently. nn can be zero (which sets a no-pad status) or some larger integer, like 1 or 80.<br />

SUBFILES<br />

Required:<br />

begins a SUBFILES loop. The SUBFILES command can only be used within a macro.<br />

SUBFILES Myfile, BY County State, FREQUENCIES $ or<br />

SUBFILES Myfile, GROUPS Mygroups $<br />

SUBFILES fn<br />

provides the name of the P-STAT system file<br />

BY vn vn<br />

provides the names of the BY variables. Up to 15 BY variables may be cited. A SUBFILES loop is done for each subgroup that is defined by the different values of the BY variables. The groups are usually processed in the order in which they occur in the input file.<br />

Optional Identifiers:<br />

DOWN<br />

specifies that the groups are to be organized in descending order of the BY group values or, if FREQUENCIES is used, by descending size.<br />

EXACT<br />

specifies that character variables must match not only in their spelling but also in the case of the characters to be considered as members of the same group.<br />

FREQUENCIES<br />

specifies that the groups are to be ordered by their frequencies. UP and DOWN can be used to control whether the largest or smallest group comes first.<br />

UP<br />

specifies that the groups are to be organized in ascending order of the BY group values or, if FREQUENCIES is also used, by ascending size.<br />

GROUPS fn<br />

provides <strong>the</strong> name of a file that was created by a previous LOCATE.GROUPS command. If GROUPS<br />

is used, none of <strong>the</strong> o<strong>the</strong>r identifiers can be used. The groups file contains all <strong>the</strong> relevant information.



Subfiles System Variables<br />

.SUBFILEPASS.<br />

counts <strong>the</strong> number of times through <strong>the</strong> subfile loop.<br />

.SUBFILEMAX.<br />

<strong>the</strong> <strong>to</strong>tal number of iterations <strong>to</strong> be done. This is <strong>the</strong> same as <strong>the</strong> <strong>to</strong>tal number of groups.<br />

.SUBFILECASES.<br />

<strong>the</strong> number of cases in <strong>the</strong> current group.<br />

ENDSUBFILES<br />

ends a SUBFILES loop<br />

LOCATE.GROUPS<br />

Required:<br />

LOCATE.GROUPS reads a P-STAT system file and counts the number of cases in each of the subgroups that are defined by the BY variables. If the PPL deletes any of the cases, the OUT file should also be created and used with the GROUPS file in any subsequent SUBFILES commands.<br />

LOCATE.GROUPS Myfile [ IF Age LT 20, EXCLUDE ], OUT Myfile2<br />

GROUPS MyGroup, FREQUENCIES $<br />

LOCATE.GROUPS fn<br />

provides the name of the P-STAT system file<br />

BY vn vn<br />

provides the names of the BY variables. Up to 15 BY variables may be cited. A SUBFILES loop is done for each subgroup that is defined by the different values of the BY variables. The groups are usually processed in the order in which they occur in the input file.<br />

Optional Identifiers:<br />

DOWN<br />

specifies that the groups are to be organized in descending order of the BY group values or, if FREQUENCIES is used, by descending size.<br />

EXACT<br />

specifies that character variables must match not only in their spelling but also in the case of the characters to be considered members of the same group.<br />

FREQUENCIES<br />

specifies that the groups are to be ordered by their frequencies. UP and DOWN can be used to control whether the largest or smallest group comes first.<br />



GROUPS fn<br />

provides the name for an output file containing information about each subgroup. Included are the frequencies for each subgroup and the locations of the first and last cases of the subgroup.<br />

UP<br />

specifies that the groups are to be organized in ascending order of the BY group values or, if FREQUENCIES is used, by ascending size.<br />

OUT fn<br />

provides the name for an output file which is the same as the input file after the PPL, if any, has been processed.<br />
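The information that the GROUPS file is described as holding (a frequency plus first and last case locations for each subgroup) and the effect of EXACT, FREQUENCIES, UP and DOWN can be modelled roughly as follows. This is a Python sketch, not P-STAT code: the function `locate_groups`, its parameters, the 1-based case positions, and the ascending default when FREQUENCIES is given without UP or DOWN are all assumptions, since the text does not state them.

```python
def locate_groups(cases, by, exact=False, frequencies=False, order=None):
    """Return one record per subgroup: key, frequency, first and last case.

    Without exact=True, character values that differ only in upper or
    lower case are placed in the same group.
    """
    table = {}
    for pos, case in enumerate(cases, start=1):
        key = tuple(
            v if exact or not isinstance(v, str) else v.upper()
            for v in (case[name] for name in by)
        )
        rec = table.setdefault(
            key, {"key": key, "freq": 0, "first": pos, "last": pos}
        )
        rec["freq"] += 1
        rec["last"] = pos
    groups = list(table.values())  # order of first occurrence in the input
    descending = order == "DOWN"
    if frequencies:
        # Assumption: ascending size unless DOWN is requested.
        groups.sort(key=lambda r: r["freq"], reverse=descending)
    elif order in ("UP", "DOWN"):
        groups.sort(key=lambda r: r["key"], reverse=descending)
    return groups


# Hypothetical data: "m" and "M" merge unless exact=True.
cases = [
    {"Sex": "m", "Age": 30},
    {"Sex": "M", "Age": 41},
    {"Sex": "F", "Age": 25},
    {"Sex": "M", "Age": 19},
]
summary = [
    (g["key"], g["freq"], g["first"], g["last"])
    for g in locate_groups(cases, ["Sex"])
]
print(summary)
```

With no ordering options the groups stay in input order; passing `order="UP"` or `order="DOWN"` sorts by the BY values, and adding `frequencies=True` sorts by group size instead.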

INQUIRE.EXTERNAL<br />

Required:<br />

INQUIRE.EXTERNAL 'cs'<br />

provides the name of an external file in quotes. The result is returned in the system variable .XINQUIRE., which is set to one if the file is found and to zero if it is not.<br />

INQUIRE<br />

Required:<br />

INQUIRE fn<br />

provides the name of a P-STAT system file. The result is returned in the system variable .INQUIRE., which is set to one if the file is found and to zero if it is not.<br />

DIALOG<br />

DIALOG #Mon<br />

'-------------------------------------------------'<br />

'Enter the three letter abbreviation for the month'<br />

HELP 'Expected abbreviations include'<br />

'jan feb mar apr may jun jul aug sep oct nov dec' $<br />

The DIALOG command has a scratch variable and some number of lines of text enclosed in quotes. The scratch variable is required only if a reply is expected. Each line of text is displayed on a separate line on the terminal. The lines of text can contain scratch variables. If so, their current value is displayed.<br />

Optional HELP text is also part of the DIALOG command. This is not displayed unless the user requests it by entering either 'H' or 'HELP' in reply to the prompt. The keyword HELP separates the normal DIALOG text from the HELP text.<br />

There are two mechanisms for examining a user reply. The first is the reply itself, which is stored in the DIALOG scratch variable. The second is the numeric system variable .RESPONSE., which contains a code indicating the type of the reply. .RESPONSE. is set each time DIALOG is executed. .RESPONSE. values are:<br />

negative: no response, or an invalid response:<br />

-2 = entirely blank<br />

-4 = H or HELP, but the dialog had no help text<br />

-6 = 'abc' for a numeric scratch variable, or such<br />

-8 = a scratch variable was not supplied<br />

-9 = in batch mode<br />

zero: the response was Q or QUIT<br />

positive: a valid response:<br />

1 = integer, like 1990<br />

2 = non-integer, like 3.1416<br />

11 = Y or YES<br />

12 = N or NO<br />

14 = character response other than yes/no/quit that is a legal P-STAT name or label<br />

16 = other character response<br />
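The .RESPONSE. codes can be read as a classification of the reply. The following Python sketch (not P-STAT code) shows one way such a classification might be arranged. The function `response_code`, the order in which the tests are applied, treating "3.0" as an integer, and the pattern used as a stand-in for "a legal P-STAT name or label" are all assumptions; the model also assumes the dialog has no HELP text, so H or HELP yields -4.

```python
import re

_NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)")
# Assumed stand-in for "legal P-STAT name or label"; the real rule may differ.
_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._]*")


def response_code(reply, numeric_var=False, has_scratch_var=True, batch=False):
    """Return a .RESPONSE.-style code for a reply to a dialog without HELP text."""
    text = reply.strip()
    upper = text.upper()
    if batch:
        return -9                      # in batch mode
    if not has_scratch_var:
        return -8                      # no scratch variable supplied
    if text == "":
        return -2                      # entirely blank
    if upper in ("Q", "QUIT"):
        return 0
    if upper in ("H", "HELP"):
        return -4                      # help requested, none available
    if _NUMBER.fullmatch(text):
        return 1 if float(text).is_integer() else 2
    if numeric_var:
        return -6                      # non-numeric reply for a numeric variable
    if upper in ("Y", "YES"):
        return 11
    if upper in ("N", "NO"):
        return 12
    if _NAME.fullmatch(text):
        return 14                      # looks like a name or label
    return 16                          # any other character response


for sample in ["", "quit", "1990", "3.1416", "yes", "jan", "jan feb"]:
    print(repr(sample), response_code(sample))
```

A macro that prompts for a month abbreviation, as in the DIALOG example above, would typically accept only code 14 and prompt again on a negative code.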


Index<br />

Symbols<br />

^ MATCHES meta-character 9.21<br />

? MATCHES meta-character 9.21<br />

? variable name wildcard 2.7, 2.20<br />

_ MATCHES meta-character 9.21<br />

- MATCHES meta-character 9.21<br />

- <strong>PPL</strong> numeric opera<strong>to</strong>r 2.9, 2.25<br />

* MATCHES meta-character 9.21<br />

* <strong>PPL</strong> numeric opera<strong>to</strong>r 2.9, 2.25<br />

** <strong>PPL</strong> numeric opera<strong>to</strong>r 2.9, 2.25<br />

*/ comment ending 2.20, 3.20<br />

/ <strong>PPL</strong> numeric opera<strong>to</strong>r 2.9, 2.25<br />

/* comment beginning 2.20, 3.20<br />

// concatenate 9.4, 9.31<br />

/// squeeze concatenate 9.5, 9.31<br />

\\ MATCHES meta-character 9.21<br />

& MACRO substitution 12.4<br />

&& concatenation of character constants 9.23, 9.31<br />

# MATCHES meta-character 9.21<br />

# scratch variables 8.3<br />

+ MATCHES meta-character 9.21<br />

+ <strong>PPL</strong> numeric opera<strong>to</strong>r 2.9, 2.25<br />

< > MATCHES meta-character 9.21<br />

| MATCHES meta-character 9.21<br />

$ MATCHES meta-character 9.21<br />

0 + MATCHES meta-character 9.21<br />

0 1 MATCHES meta-character 9.21<br />

1 + MATCHES meta-character 9.21<br />

1 1 MATCHES meta-character 9.21<br />

System Variables<br />

.ALL. 3.8, 3.11, 11.14, 11.34<br />

.CDATE. 6.26<br />

.CHARACTER. 2.6, 6.17, 6.24<br />

.COLLECTIONS. 8.27, 8.30<br />

.COLLECTMAX. 8.27, 8.30<br />

.COLLECTMIN. 8.27, 8.30<br />

.COLLECTSIZE. 8.23, 8.27, 8.30<br />

.COLLECTSUM. 8.27, 8.30<br />

.CTIME. 6.26<br />

.DATE. 6.19, 6.24<br />

.e. 6.14, 6.24<br />

.FILE. 6.18, 6.24, 8.2<br />

.G. 2.12, 6.14, 6.24<br />

.HERE. 3.6, 6.16, 6.24<br />

.INQUIRE. 12.23, 12.31<br />

.M. 2.12, 6.15, 6.24<br />

system variable 2.24<br />

.M1., .M2., .M3. 6.15, 6.25<br />

.N. 3.6, 6.16, 6.25<br />

.NDATE. 6.19, 6.26<br />

.NEW. 2.6, 6.5, 6.25<br />

.NTIME. 6.19, 6.26<br />

.NUMERIC. 2.6, 6.17, 6.25<br />

.NV. 6.15, 6.25<br />

.ON. 2.3, 6.25<br />

.OTHERS. 2.6, 6.25<br />

.PAGE. 6.19, 6.25<br />

.PI. 6.14, 6.25<br />

.PUT. 3.9, 6.18, 6.25<br />

with system variables 8.11<br />

.RDATE. 6.26<br />

.RESPONSE. 12.21, 12.31<br />

.RTIME. 6.26<br />

.SUBFILECASES. 12.26<br />

.SUBFILEMAX. 12.26<br />

.SUBFILEPASS. 12.26<br />

.TIME. 6.19, 6.25<br />

.USED. 6.16, 6.26<br />

.XDATE. 6.26<br />

.XINQUIRE. 12.22, 12.31<br />

.XTIME. 6.26<br />

( ) MATCHES meta-character 9.21<br />

[ ] MATCHES meta-character 9.21<br />

PUT and TEXTWRITER Controls<br />

@<br />

in PUT and PUTL 3.9, 3.19<br />

in TEXTWRITER command 11.9, 11.33<br />

@ MATCHES meta-character 9.21<br />

@BEFORE<br />

in PUT and PUTL 3.19<br />

in TEXTWRITER command 11.10, 11.33<br />

@COMMAS<br />

in PUT and PUTL 3.19<br />

in TEXTWRITER command 11.11, 11.34<br />

@EQUAL<br />

in PUTL 3.11, 3.19<br />

in TEXTWRITER command 11.13, 11.34<br />

@INDENT<br />

in TEXTWRITER command 11.10, 11.34<br />

@JUST<br />

in TEXTWRITER command 11.10, 11.34<br />

@LABEL<br />

in PUT and PUTL 3.19<br />

in TEXTWRITER command 11.14<br />

@MINUS<br />

in PUT and PUTL 3.19<br />

in TEXTWRITER command 11.9, 11.35<br />

@MISS<br />

in PUT and PUTL 3.19<br />

in TEXTWRITER command 11.14, 11.34<br />

@NEXT<br />

in PUT and PUTL 3.11, 3.13, 3.19<br />

in TEXTWRITER command 11.10, 11.35<br />

@PAGE<br />

in PUT and PUTL 3.13<br />

in TEXTWRITER command 11.10, 11.35<br />

@PARA<br />

in PUT and PUTL 3.19<br />

in TEXTWRITER command 11.10, 11.35<br />

@PLACES<br />

in PUT and PUTL 3.19<br />

in TEXTWRITER command 11.12, 11.35<br />

@PLUS<br />

in PUT and PUTL 3.19<br />

in TEXTWRITER command 11.8, 11.9, 11.35<br />

@SKIP<br />

in PUT and PUTL 3.13, 3.19<br />

in TEXTWRITER command 11.8, 11.35<br />

@SPREAD<br />

in TEXTWRITER command 11.13, 11.35<br />

@TRIM<br />

in PUT and PUTL 3.19<br />

in TEXTWRITER command 11.10, 11.35<br />

@WIDTH<br />

in TEXTWRITER command 11.10, 11.36<br />

A<br />

ABS<br />

<strong>PPL</strong> function 6.2, 6.20<br />

Absolute value function 6.2<br />

ACOS<br />

<strong>PPL</strong> function 6.3, 6.20<br />

Add dates and times 10.21<br />

Add variables<br />

see GENERATE<br />

Addition opera<strong>to</strong>r + 2.9<br />

MATCHES meta-character 9.21<br />

ALL<br />

<strong>PPL</strong> opera<strong>to</strong>r 2.16, 2.23<br />

AMONG<br />

<strong>PPL</strong> opera<strong>to</strong>r 2.15, 2.23, 9.31<br />

AND<br />

<strong>PPL</strong> opera<strong>to</strong>r 2.13, 2.24<br />

ANY<br />

<strong>PPL</strong> opera<strong>to</strong>r 2.16, 2.24<br />

Arc cosine function 6.3<br />

Arc sine function 6.3<br />

Arc tangent function 6.3<br />

Arguments<br />

in a macro 12.3<br />

ARRAY Commands<br />

DEFINE.ARRAY 8.7<br />

DROP.ARRAY 8.7<br />

SHOW.ARRAYS 8.7<br />

summary 8.28<br />

Arrays<br />

multi-dimensional user-defined 8.1<br />

user defined 8.7<br />

ASIN<br />

<strong>PPL</strong> function 6.3, 6.21<br />

Asterisk<br />

MATCHES meta-character 9.21<br />

multiplication opera<strong>to</strong>r 2.9<br />

Asterisk, double<br />

exponentiation opera<strong>to</strong>r 2.9<br />

ATAN<br />

<strong>PPL</strong> function 6.3, 6.21



B<br />

Backslash<br />

MATCHES meta-character 9.21<br />

Bernoulli distribution 7.4<br />

Binary random number 7.1<br />

Binomial distribution 7.4<br />

inverse 7.5<br />

BLANK<br />

<strong>PPL</strong> function 9.10, 9.24<br />

BLANKS<br />

in TEXTWRITER command 11.7<br />

Brackets<br />

MATCHES meta-character 9.21<br />

BRANCH<br />

conditional execution 12.18<br />

BY<br />

in COLLECT function 8.20<br />

in LOCATE.GROUPS command 12.30<br />

in SUBFILES command 12.23, 12.29<br />

C<br />

C.TRANSPOSE 2.20<br />

CAPS<br />

<strong>PPL</strong> function 9.7, 9.25<br />

CARRY<br />

in COLLECT function 8.20<br />

in SPLIT function 8.13<br />

CASE<br />

in TEXTWRITER command 11.6<br />

CASES<br />

<strong>PPL</strong> instruction 2.1, 2.3, 2.21<br />

Ceiling function 6.2<br />

CENTER<br />

<strong>PPL</strong> function 9.25<br />

CHANGE<br />

<strong>PPL</strong> function 9.10, 9.25<br />

CHARACTER<br />

<strong>PPL</strong> function 9.13, 9.25<br />

Character constants<br />

concatenation with && 9.23, 9.31<br />

CHAREX<br />

<strong>PPL</strong> function 9.14, 9.26<br />

CHECK 1.3, 1.6<br />

Chi-square<br />

distribution 7.4<br />

inverse 7.5<br />

CLAG<br />

character function 6.8<br />

<strong>PPL</strong> function 9.23<br />

COLLECT<br />

<strong>PPL</strong> function 8.29<br />

BY option 8.20<br />

CARRY option 8.21<br />

COLLECT counter 8.20<br />

complex usage 8.22<br />

example 9.17<br />

INDEX option 8.21<br />

SORT option 8.21<br />

COMBINATIONS<br />

<strong>PPL</strong> function 6.11, 6.22<br />

Comments 3.20<br />

in <strong>PPL</strong> clauses 11.6<br />

within or between commands 3.14<br />

COMPARE 1.3, 1.6<br />

P-<strong>STAT</strong> system files 1.6<br />

COMPRESS<br />

<strong>PPL</strong> function 9.11, 9.26<br />

Concatenation<br />

of files on-<strong>the</strong>-fly 3.4<br />

opera<strong>to</strong>r // 9.4<br />

opera<strong>to</strong>r /// 9.5<br />

Conditional execution<br />

BRANCH 12.18<br />

CONTAINS<br />

<strong>PPL</strong> opera<strong>to</strong>r 9.4, 9.32<br />

Control words<br />

in TEXTWRITER command 11.8<br />

COS<br />

<strong>PPL</strong> function 6.3, 6.20<br />

Cosine function 6.3<br />

COUNT.GOOD<br />

<strong>PPL</strong> function 6.7, 6.23, 9.3, 9.30<br />

COUNT.MACROS 12.11, 12.28<br />

CREATE<br />

in SPLIT function 8.14<br />

CURRENT<br />

<strong>PPL</strong> instruction 1.4, 1.6<br />

CURRENT.DATE function 10.4, 10.20<br />

CVAL<br />

<strong>PPL</strong> function 9.14, 9.26<br />

CYCLE<br />

in SPLIT function 8.17



D<br />

Data<br />

cleaning 3.8, 8.11<br />

DATE.LANGUAGE 10.14, 10.23<br />

DATE.ORDER 10.14, 10.24<br />

Dates 10.1–10.13<br />

adding 10.9, 10.21<br />

changing 10.11, 10.23<br />

difference between 10.12, 10.23<br />

extracting 10.10, 10.22<br />

logical opera<strong>to</strong>rs 10.16, 10.24<br />

simple functions 10.3<br />

DAY.MONTH.YEAR 10.3<br />

DAY.YEAR.MONTH 10.3<br />

MONTH.YEAR.DAY 10.4<br />

YEAR.DAY.MONTH 10.4<br />

YEAR.MONTH.DAY 10.4<br />

subtracting 10.9, 10.21<br />

DAY.WITHIN.WEEK function 10.7, 10.21<br />

DAY.WITHIN.YEAR function 10.7, 10.21<br />

DAYS<br />

<strong>PPL</strong> function 10.20<br />

Decimal places function 6.10<br />

DECREASE<br />

<strong>PPL</strong> instruction 2.8, 2.21<br />

DEFINE.ARRAY 8.7, 8.28<br />

DELETE<br />

<strong>PPL</strong> instruction 2.11, 2.21<br />

DES<br />

in MODIFY command 3.2<br />

DIALOG 12.19, 12.31<br />

DIF<br />

<strong>PPL</strong> function 6.8, 6.22<br />

DIF function 6.9<br />

Difference function 6.8<br />

Digit extraction function 6.10<br />

Distribution functions 7.4<br />

inverse 7.5<br />

Division opera<strong>to</strong>r / 2.9<br />

DO loops 5.1–5.11<br />

<strong>PPL</strong> instruction 5.22<br />

Double slash<br />

<strong>PPL</strong> opera<strong>to</strong>r 9.4<br />

DOWN<br />

in LOCATE.GROUPS command 12.30<br />

in SUBFILES command 12.23, 12.29<br />

DROP<br />

<strong>PPL</strong> instruction 2.1, 2.3, 2.21<br />

DROP.ARRAY 8.7, 8.28<br />

DROP.P.VECTOR 8.29<br />

Dummy variables<br />

creating 6.4, 6.15<br />

recoding in<strong>to</strong> one variable 6.5<br />

E<br />

ECHO 12.14<br />

Econometrics<br />

LAG, DIF functions 6.8<br />

Enclosures<br />

in MATCHES opera<strong>to</strong>r 9.21<br />

ENDDO<br />

<strong>PPL</strong> instruction 5.23<br />

ENDIF<br />

<strong>PPL</strong> instruction 5.25<br />

ENDMACRO 12.1, 12.11, 12.28<br />

ENDSUBFILES 12.23, 12.23<br />

summary 12.30<br />

EQ<br />

<strong>PPL</strong> opera<strong>to</strong>r 2.12, 2.23<br />

Escape characters<br />

in MATCHES opera<strong>to</strong>r 9.21<br />

Escape codes, passing 9.14<br />

Exact string comparisons 2.12<br />

EXITDO<br />

<strong>PPL</strong> instruction 5.22<br />

EXP<br />

<strong>PPL</strong> function 6.3, 6.20<br />

EXPAND<br />

<strong>PPL</strong> function 6.11<br />

Exponentiation function 6.3<br />

Exponentiation opera<strong>to</strong>r ** 2.9<br />

F<br />

F distribution 7.4<br />

inverse 7.5<br />

FACTORIAL<br />

<strong>PPL</strong> function 6.3, 6.20<br />

FILE<br />

in SHOW.MACROS command 12.11, 12.29<br />

FILE.IN 1.1<br />

FILES 12.15



Filtering a file using <strong>PPL</strong> 2.11<br />

FIRST<br />

<strong>PPL</strong> function 6.23, 8.2, 8.10, 8.29<br />

FIRST.GOOD<br />

<strong>PPL</strong> function 6.7, 6.23, 9.3, 9.30<br />

FISCAL.QUARTER function 10.6, 10.21<br />

FISCAL.YEAR function 10.6, 10.21<br />

Floor function 6.2<br />

FOLD<br />

in LIST command 2.20<br />

FONT<br />

in TEXTWRITER command 11.21, 11.36<br />

FONT1-FONT9<br />

in TEXTWRITER command 11.21<br />

Fonts<br />

changing<br />

in TEXTWRITER 11.22<br />

FRAC<br />

<strong>PPL</strong> function 6.2, 6.20<br />

Fractional portion function 6.2<br />

FREQUENCIES<br />

in LOCATE.GROUPS command 12.30<br />

in SUBFILES command 12.24<br />

FULL.MACRO.ARGS 12.11, 12.28<br />

FUZZ<br />

command 7.8, 7.12<br />

Fuzzy arithmetic 7.6, 7.12<br />

G<br />

GENERATE<br />

in DO loop 5.13, 5.23<br />

<strong>PPL</strong> instruction 2.1, 2.9, 2.21, 9.1<br />

GOOD<br />

<strong>PPL</strong> opera<strong>to</strong>r 2.12, 2.24, 9.32<br />

GOTO<br />

<strong>PPL</strong> instruction 3.7, 3.17<br />

GROUPS<br />

in LOCATE.GROUPS command 12.25<br />

in SUBFILES command 12.25<br />

GT opera<strong>to</strong>r 2.12<br />

H<br />

HEX<br />

<strong>PPL</strong> function 7.7, 7.11<br />

I<br />

IF<br />

PPL instruction 2.1, 2.11, 2.17, 2.21, 3.7, 9.2<br />

IF-THEN-ELSE 5.14–5.18, 5.24<br />

INCREASE<br />

<strong>PPL</strong> instruction 2.8, 2.22<br />

INDEX<br />

in COLLECT function 8.21<br />

in SPLIT function 8.16<br />

INQUIRE<br />

determine existence of P-STAT system file 12.23, 12.31<br />

INQUIRE.EXTERNAL 12.22, 12.31<br />

INRANGE<br />

<strong>PPL</strong> opera<strong>to</strong>r 2.16, 2.24<br />

INT<br />

<strong>PPL</strong> function 6.2, 6.20<br />

Integer function 6.2<br />

interaction with IF 6.9<br />

INVBIN<br />

<strong>PPL</strong> function 7.5, 7.11<br />

INVCHI<br />

<strong>PPL</strong> function 7.5, 7.11<br />

Inverse probability functions 7.5<br />

binomial distribution 7.5<br />

chi-square distribution 7.5<br />

F distribution 7.5<br />

normal distribution 7.6<br />

Poisson distribution 7.6<br />

t distribution 7.6<br />

INVF<br />

<strong>PPL</strong> function 7.5, 7.11<br />

INVNORM<br />

<strong>PPL</strong> function 7.6, 7.11<br />

INVPOIS<br />

<strong>PPL</strong> function 7.6, 7.11<br />

INVT<br />

<strong>PPL</strong> function 7.6, 7.11<br />

IVAL<br />

<strong>PPL</strong> function 9.14, 9.27<br />

J<br />

JUSTIFY<br />

in TEXTWRITER command 11.7



K<br />

KEEP<br />

<strong>PPL</strong> instruction 2.1, 2.3, 2.5, 2.22<br />

L<br />

LABELS<br />

in TEXTWRITER command 11.8, 11.32<br />

Labels<br />

for <strong>PPL</strong> statements 3.7<br />

LAG<br />

<strong>PPL</strong> function 6.8, 6.22<br />

LAG function 6.9<br />

Lagging function 6.8<br />

LANDSCAPE<br />

in TEXTWRITER command 11.21, 11.36<br />

LAST<br />

<strong>PPL</strong> function 6.23, 8.2, 8.10, 8.29<br />

LAST.GOOD<br />

<strong>PPL</strong> function 6.7, 6.23, 9.3, 9.30<br />

LE<br />

<strong>PPL</strong> opera<strong>to</strong>r 2.12<br />

LEADBLANK<br />

in TEXTWRITER command 11.7, 11.32<br />

LEFT<br />

<strong>PPL</strong> function 9.27<br />

LEFT.EDGE<br />

in TEXTWRITER command 11.36<br />

LENGTH<br />

<strong>PPL</strong> function 9.8, 9.27<br />

LIST<br />

identifiers<br />

FOLD 2.20<br />

MAX.PLACES 6.10<br />

MIN.PLACES 6.10<br />

LOC<br />

<strong>PPL</strong> function 6.3, 6.20<br />

LOCATE.GROUPS 12.24<br />

identifiers<br />

BY 12.30<br />

DOWN 12.30<br />

EXACT 12.30<br />

FREQUENCIES 12.30<br />

OUT 12.26, 12.31<br />

UP 12.31<br />

summary 12.30<br />

Location function 6.3<br />

LOG<br />

<strong>PPL</strong> function 6.3, 6.20<br />

LOG10<br />

<strong>PPL</strong> function 6.3, 6.20<br />

Logarithm functions 6.3<br />

Logical opera<strong>to</strong>rs 2.23<br />

date/time 10.16<br />

LOWER<br />

<strong>PPL</strong> function 9.7, 9.27<br />

LPAD<br />

<strong>PPL</strong> function 9.27<br />

LRPAD<br />

<strong>PPL</strong> function 9.28<br />

LRTRIM<br />

<strong>PPL</strong> function 9.12, 9.29<br />

LT<br />

<strong>PPL</strong> opera<strong>to</strong>r 2.12<br />

LTRIM<br />

<strong>PPL</strong> function 9.12, 9.29<br />

M<br />

MACRO 12.1, 12.11<br />

summary 12.27<br />

MACRO.PAD 12.11, 12.29<br />

Macros<br />

activating 12.2<br />

arguments 12.5<br />

default values 12.6<br />

keyword 12.4<br />

positional 12.4<br />

block 12.2<br />

executing 12.2<br />

calling o<strong>the</strong>r macros 12.7<br />

comments 12.3<br />

correcting in edi<strong>to</strong>r 12.11<br />

format of 12.1<br />

in stream 12.1<br />

in subcommands 12.8<br />

in <strong>the</strong> edi<strong>to</strong>r 12.11<br />

Scratch variable usage 12.14<br />

s<strong>to</strong>ring 12.2<br />

temporary files 12.15<br />

using RUN 12.14<br />

MAKE 1.1<br />

MAKE.CHARACTER 9.26



MAKE.DATE function 10.4, 10.20<br />

MAKE.NUMERIC 9.27<br />

MARGIN<br />

in TEXTWRITER command 11.7, 11.32<br />

MASK<br />

in case, variable selection 2.6<br />

in SPLIT instruction 8.15<br />

Masks<br />

Complex for GENERATE 5.13<br />

for RENAME and GENERATE 5.24<br />

MATCHES<br />

<strong>PPL</strong> opera<strong>to</strong>r 9.18, 9.32<br />

meta-characters 9.20<br />

MAX<br />

<strong>PPL</strong> function 6.21<br />

MAX.GOOD<br />

<strong>PPL</strong> function 6.21<br />

MEAN<br />

<strong>PPL</strong> function 6.21<br />

MEAN.GOOD<br />

<strong>PPL</strong> function 6.21<br />

Meta-characters<br />

in MATCHES opera<strong>to</strong>r 9.20, 9.21<br />

MIN<br />

<strong>PPL</strong> function 6.21<br />

MIN.GOOD<br />

<strong>PPL</strong> function 6.21<br />

MISSING<br />

<strong>PPL</strong> opera<strong>to</strong>r 2.12, 2.24, 9.33<br />

MISSING1, MISSING2, MISSING3<br />

<strong>PPL</strong> opera<strong>to</strong>rs 2.13<br />

MOD<br />

<strong>PPL</strong> function 3.6, 6.9, 6.22<br />

MODIFY 1.3, 1.6, 3.2<br />

identifiers<br />

DES 3.2<br />

OUT 3.2, 3.16<br />

TEMPLATE 3.3, 3.16<br />

summary 3.16<br />

Modular function 6.9<br />

MONTH.CASE 10.15, 10.24<br />

MONTH.LENGTH 10.24<br />

MONTH.NAMES 10.15, 10.24<br />

MONTH.YEAR.DAY 10.3<br />

Multiplication opera<strong>to</strong>r * 2.9<br />

MATCHES meta-character 9.21<br />

N<br />

NAMES<br />

in SHOW.MACROS command 12.28<br />

NCOT<br />

<strong>PPL</strong> function 4.1, 4.14, 6.22<br />

NE<br />

<strong>PPL</strong> opera<strong>to</strong>r 2.12, 2.23<br />

NEAR<br />

<strong>PPL</strong> logical opera<strong>to</strong>r 7.8<br />

NEXTDO<br />

<strong>PPL</strong> instruction 5.23<br />

NO LEADBLANK<br />

in TEXTWRITER command 11.7<br />

NO SHOWPAGE<br />

in TEXTWRITER command 11.21<br />

NO SPREAD<br />

in TEXTWRITER command 11.7<br />

No-break character<br />

in TEXTWRITER command 11.2<br />

Normal distribution 7.4<br />

inverse 7.6<br />

Normal random number 7.1<br />

NOTAMONG<br />

<strong>PPL</strong> opera<strong>to</strong>r 2.15, 2.24, 9.33<br />

NOTNEAR<br />

<strong>PPL</strong> logical opera<strong>to</strong>r 7.8<br />

NTOKEN<br />

<strong>PPL</strong> function 9.10, 9.29<br />

NUMBER<br />

<strong>PPL</strong> function 9.13, 9.27<br />

NUMBER.E<br />

<strong>PPL</strong> function 9.13<br />

NUMBER.W<br />

<strong>PPL</strong> function 9.13<br />

NUMEX<br />

<strong>PPL</strong> function 6.10, 6.23<br />

O<br />

OR<br />

<strong>PPL</strong> opera<strong>to</strong>r 2.24<br />

OUT<br />

in LOCATE.GROUPS command 12.26, 12.31<br />

in MODIFY command 3.16<br />

in TEXTWRITER command 11.8, 11.32<br />

OUTRANGE<br />

PPL operator 2.16, 2.25<br />

P<br />

P vec<strong>to</strong>r 1.2, 8.5<br />

PAD<br />

<strong>PPL</strong> function 9.27<br />

Paren<strong>the</strong>ses<br />

MATCHES meta-character 9.21<br />

Permanent vec<strong>to</strong>r<br />

see P vec<strong>to</strong>r<br />

Phrases<br />

<strong>PPL</strong> 2.1<br />

PLACES<br />

<strong>PPL</strong> function 6.10, 6.23<br />

Poisson distribution 7.4, 7.6<br />

PORTRAIT<br />

in TEXTWRITER command 11.20, 11.36<br />

POSITION<br />

<strong>PPL</strong> function 9.8, 9.28<br />

Positional notation, variables 2.4, 2.8, 6.3<br />

POSTSCRIPT<br />

in TEXTWRITER command 11.20, 11.36<br />

POSTSCRIPT.SETUP 11.21, 11.28<br />

<strong>PPL</strong> 1.1, 2.1, 3.1, 4.1, 6.1, 8.1, 9.1<br />

case, variable selection 2.1<br />

character variables 9.1<br />

comments 3.20<br />

concatenation, on-<strong>the</strong>-fly 3.4<br />

Date and time summary 10.20<br />

DO loops 5.1–5.11, 5.22<br />

exact comparisons of characters 2.12<br />

generating variables 2.7<br />

IF tests 3.7<br />

introduction 1.1<br />

logical selection 2.11, 2.17<br />

modifying variables 2.7, 3.1<br />

order of numeric opera<strong>to</strong>rs 2.10<br />

permanent vec<strong>to</strong>r 8.5<br />

phrases with <strong>PPL</strong> clauses 2.1<br />

scratch variables 8.3<br />

size constraints 2.3<br />

standalone commands 1.3, 1.6, 3.12, 3.17<br />

summary 1.5, 2.20, 3.16, 4.14, 5.22, 6.20, 8.28, 9.24<br />

wildcard notation 2.6, 8.15, 8.24<br />

<strong>PPL</strong> command 1.6<br />

<strong>PPL</strong> Functions 2.10<br />

char and numeric<br />

COLLECT 8.19–8.27<br />

COUNT.GOOD 6.7, 9.3<br />

EXPAND 6.11<br />

FIRST 8.2, 8.10<br />

FIRST.GOOD 6.7, 9.3<br />

LAST 8.2, 8.10<br />

LAST.GOOD 6.7, 9.3<br />

RECODE 4.3<br />

SPLIT 8.12–8.19<br />

SPLIT * 8.24<br />

VARNAME 3.11<br />

XRECODE 4.5<br />

character 9.6<br />

BLANK 9.10<br />

CAPS 9.7<br />

CENTER 9.7<br />

CHANGE 9.10<br />

CHARACTER 9.14<br />

CHAREX 9.14<br />

CLAG 9.23<br />

COMPRESS 9.11<br />

CVAL 9.14<br />

IVAL 9.14<br />

LENGTH 9.8<br />

LOWER 9.7<br />

LPAD 9.12<br />

LRPAD 9.12<br />

LRTRIM 9.12<br />

LTOKEN 9.10<br />

LTRIM 9.12<br />

NTOKEN 9.10<br />

NUMBER 9.13<br />

NUMBER.E 9.13<br />

NUMBER.W 9.13<br />

PAD 9.12<br />

POSITION 9.8<br />

RIGHT 9.7<br />

RPAD 9.12<br />

RTOKEN 9.10<br />

RTRIM 9.12<br />

SIZE 9.8<br />

SUBSTRING 9.9



TOKEN 9.9<br />

TRIM 9.12<br />

UPPER 9.7<br />

VARNAME 9.17<br />

VERIFY 9.8<br />

XBLANK 9.10<br />

XCHANGE 9.10<br />

XPOSITION 9.8<br />

complex, nested 9.16<br />

Date/time<br />

ADD dates and times 10.9, 10.21<br />

CHANGE dates and times 10.11, 10.22<br />

CURRENT.DATE 10.4<br />

DAY.MONTH.YEAR 10.3, 10.20<br />

DAY.WITHIN.WEEK 10.7, 10.21<br />

DAY.WITHIN.YEAR 10.7, 10.21<br />

DAY.YEAR.MONTH 10.3, 10.20<br />

DAYS 10.5<br />

DIF - compare dates and times 10.23<br />

Difference between dates/times 10.12<br />

EXTRACT dates and times 10.10, 10.22<br />

FISCAL.QUARTER 10.6, 10.21<br />

FISCAL.YEAR 10.6, 10.21<br />

MAKE.DATE 10.4, 10.20<br />

MONTH.DAY.YEAR 10.20<br />

MONTH.YEAR.DAY 10.3, 10.4, 10.20<br />

QUARTER 10.7, 10.21<br />

REFORMAT.DATE 10.5, 10.20<br />

SECONDS 10.6, 10.21<br />

SECONDS.MIDNIGHT 10.6, 10.21<br />

<strong>STAT</strong>US.DATE 10.5, 10.20<br />

SUBTRACT dates and times 10.9<br />

UNDO.DAYS 10.6, 10.21<br />

UNDO.SECONDS 10.6, 10.21<br />

WEEK.WITHIN.YEAR 10.7<br />

YEAR.DAY.MONTH 10.4, 10.20<br />

YEAR.MONTH.DAY 10.4, 10.20<br />

numeric<br />

ABS 6.2<br />

ACOS 6.3<br />

ASIN 6.3<br />

ATAN 6.3<br />

CEIL 6.2<br />

COMBINATIONS 6.11<br />

COS 6.3<br />

DIF 6.8<br />

EXP 6.3<br />

FACTORIAL 6.3<br />

FLOOR 6.2<br />

FRAC 6.2<br />

HEX 7.7, 7.11<br />

INT 6.2<br />

INVBIN 7.5<br />

INVCHI 7.5<br />

INVF 7.5<br />

INVNORM 7.6<br />

INVPOIS 7.6<br />

INVT 7.6<br />

LAG 6.8<br />

LOC 6.3<br />

LOG 6.3<br />

LOG10 6.3<br />

MOD 3.6, 6.9<br />

NCOT 4.1<br />

NUMEX 6.10<br />

PLACES 6.10<br />

PROBCHI 7.4<br />

PROBF 7.4<br />

PROBBIN 7.4<br />

PROBIT 7.6<br />

PROBNORM 7.4<br />

PROBPOIS 7.4<br />

PROBT 7.5<br />

RANBIN 7.1<br />

RANNORM 3.7, 7.1<br />

RANTABLE 7.1<br />

RANUNI 7.1<br />

ROUND 6.2<br />

SIN 6.3<br />

SQRT 6.3<br />

STEP.DOWN 7.7, 7.12<br />

STEP.UP 7.7, 7.11<br />

STEPS 7.7, 7.12<br />

TAN 6.3<br />

<strong>PPL</strong> Instructions<br />

CASES 2.1, 2.3<br />

using ranges, TO 2.4<br />

with MASK 2.6<br />

CURRENT 1.4, 1.6



DECREASE 2.8<br />

DELETE 2.11<br />

DO loops 5.1–5.11, 5.22<br />

DROP 2.1, 2.3<br />

using ranges, TO 2.4<br />

with MASK 2.6<br />

with wildcard 2.6<br />

ENDDO 5.23<br />

EXITDO 5.22<br />

GENERATE 2.1, 2.8, 5.23, 9.1<br />

GOTO 3.7<br />

IF 2.1, 2.11, 2.17, 3.7, 9.2<br />

missing data 2.18<br />

T F M prefixes 2.18<br />

IF-THEN-ELSE 5.24<br />

INCREASE 2.8<br />

KEEP 2.1, 2.3, 2.5<br />

using ranges, TO 2.4<br />

with MASK 2.6<br />

with wildcard 2.6<br />

NEXTDO 5.23<br />

PREVIOUS 1.3, 1.7<br />

PUT 3.8, 3.9, 11.2<br />

PUTL 3.11, 11.2<br />

QUITCOMMAND 3.14<br />

QUITFILE 3.14<br />

QUITRUN 3.14<br />

RENAME 2.19, 2.23, 5.23<br />

REPEAT 3.6, 7.2<br />

RETAIN 2.11<br />

SET 2.1, 2.7, 9.2<br />

<strong>PPL</strong> Opera<strong>to</strong>rs<br />

character 9.3<br />

// concatenate 9.4<br />

/// squeeze concatenate 9.5<br />

&& concatenation of character constants 9.23<br />

CONTAINS 9.4<br />

MATCHES 9.18<br />

XAMONG 2.16<br />

XCONTAINS 9.4<br />

XEQ 2.12, 9.5<br />

XMATCHES 9.18<br />

XNOTAMONG 2.16<br />

logical 2.12<br />

ALL 2.16<br />

AMONG 2.15<br />

AND 2.13<br />

ANY 2.16<br />

DATE.EQ 10.16<br />

DATE.GE 10.16<br />

DATE.GT 10.16<br />

DATE.LE 10.16<br />

DATE.LT 10.16<br />

DATE.NE 10.16<br />

EQ 2.12, 7.7<br />

fuzzy 7.8<br />

GE 7.7<br />

GOOD 2.12<br />

GT 2.12, 7.7<br />

INRANGE 2.16<br />

LE 2.12, 7.7<br />

LT 2.12, 7.7<br />

MISSING 2.12<br />

NE 2.12, 7.7<br />

NEAR 7.8, 7.12<br />

NOTAMONG 2.15<br />

NOTNEAR 7.8, 7.12<br />

OR 2.13<br />

OUTRANGE 2.16<br />

numeric 2.9<br />

- 2.9<br />

* 2.9<br />

** 2.9<br />

/ 2.9<br />

+ 2.9<br />

<strong>PPL</strong> System Variables<br />

.ALL. 3.8, 3.11, 11.14, 11.34<br />

.CHARACTER. 2.6, 6.17<br />

.DATE. 6.19<br />

.e. 6.14<br />

.FILE. 6.18, 8.2<br />

.G. 2.12, 6.14<br />

.HERE. 3.6, 6.16<br />

.M. 2.12, 6.15<br />

.M1., .M2., .M3. 6.15<br />

.N. 3.6, 6.16<br />

.NDATE. 6.19<br />

.NEW. 2.6, 6.5<br />

.NUMERIC. 2.6, 6.17<br />

.NV. 6.15<br />

.ON. 2.3



.OTHERS. 2.6, 6.15<br />

.PAGE. 6.19<br />

.PI. 6.14<br />

.PUT. 3.9, 6.18, 8.11<br />

.REPEAT. 6.19<br />

.RESPONSE. 12.21<br />

.SUBFILECASES. 12.26<br />

.SUBFILEMAX. 12.26<br />

.SUBFILEPASS. 12.26<br />

.TIME. 6.19<br />

.USED. 6.16<br />

.XINQUIRE. 12.22<br />

PREVIOUS<br />

<strong>PPL</strong> instruction 1.3, 1.7<br />

Probability functions 7.4<br />

PROBBIN<br />

<strong>PPL</strong> function 7.4, 7.10<br />

PROBCHI<br />

<strong>PPL</strong> function 7.4, 7.10<br />

PROBF<br />

<strong>PPL</strong> function 7.4, 7.10<br />

PROBIT<br />

<strong>PPL</strong> function 7.6<br />

Probit distribution 7.6<br />

PROBNORM<br />

<strong>PPL</strong> function 7.4, 7.10<br />

PROBPOIS<br />

<strong>PPL</strong> function 7.4, 7.10<br />

PROBT<br />

<strong>PPL</strong> function 7.5, 7.10<br />

PROCESS 1.3, 1.6, 3.9, 3.13<br />

summary 3.17<br />

P-<strong>STAT</strong> <strong>Programming</strong> <strong>Language</strong><br />

see <strong>PPL</strong><br />

P-<strong>STAT</strong> system file<br />

previous or current version 1.3<br />

PUT<br />

in TEXTWRITER command 11.2<br />

<strong>PPL</strong> instruction 3.8, 3.9, 3.19<br />

PUTL<br />

in TEXTWRITER command 11.2<br />

<strong>PPL</strong> instruction 3.11<br />

PUTL.CHARS<br />

in TEXTWRITER command 11.32<br />

Q<br />

QUARTER function 10.7, 10.21<br />

QUITCOMMAND<br />

<strong>PPL</strong> instruction 3.14, 3.18<br />

QUITFILE<br />

<strong>PPL</strong> instruction 3.14, 3.18<br />

QUITRUN<br />

<strong>PPL</strong> instruction 3.14, 3.18<br />

R<br />

RANBIN<br />

<strong>PPL</strong> function 7.1, 7.9<br />

Random<br />

assignment 7.3<br />

data generation 7.1<br />

number functions 7.1<br />

sampling 7.2<br />

with replacement 7.2<br />

RANNORM<br />

<strong>PPL</strong> function 3.7, 7.1, 7.9<br />

RANTABLE<br />

<strong>PPL</strong> function 7.1, 7.9<br />

RANUNI<br />

<strong>PPL</strong> function 7.1, 7.9<br />

RECODE<br />

arguments 4.6<br />

complex 4.6<br />

exact matches with XRECODE 4.12<br />

<strong>PPL</strong> function 4.3, 4.14, 6.24<br />

tests 4.6<br />

REFORMAT.DATE function 10.5, 10.20<br />

RENAME<br />

<strong>PPL</strong> instruction 2.22, 5.23<br />

variables 2.19, 2.23<br />

REPEAT<br />

<strong>PPL</strong> instruction 3.6, 3.18, 7.2<br />

Report writing 3.9<br />

RETAIN<br />

<strong>PPL</strong> instruction 2.11, 2.22<br />

RIGHT<br />

<strong>PPL</strong> function 9.7, 9.28<br />

ROUND<br />

<strong>PPL</strong> function 6.2, 6.21<br />

Rounding function 6.2<br />

RTOKEN<br />

<strong>PPL</strong> function 9.10, 9.29


RTRIM<br />

<strong>PPL</strong> function 9.12<br />

RUN 12.2, 12.11, 12.28<br />

S<br />

Scratch variables 1.2, 3.12, 8.3<br />

in SUBFILES command 12.23<br />

SDEV<br />

<strong>PPL</strong> function 6.21<br />

SDEV.GOOD<br />

<strong>PPL</strong> function 6.22<br />

SECONDS function 10.6, 10.21<br />

SECONDS.MIDNIGHT function 10.6, 10.21<br />

SET<br />

<strong>PPL</strong> instruction 2.1, 2.7, 2.22, 9.2<br />

SHOW.ARRAYS 8.7, 8.28<br />

SHOW.MACROS 12.11, 12.28, 12.29<br />

identifiers<br />

FILE 12.11, 12.29<br />

NAMES 12.28<br />

SHOWPAGE<br />

in TEXTWRITER command 11.21, 11.37<br />

SIN<br />

<strong>PPL</strong> function 6.3, 6.21<br />

Sine function 6.3<br />

SIZE<br />

<strong>PPL</strong> function 9.8, 9.28<br />

Size constraints<br />

<strong>PPL</strong> modifications 2.3<br />

Slash<br />

division operator 2.9<br />

Slash, back<br />

in MATCHES operator 9.21<br />

Slash, double<br />

<strong>PPL</strong> operator 9.4<br />

Slash, triple<br />

<strong>PPL</strong> operator 9.5<br />

SORT<br />

in COLLECT function 8.21<br />

SPLIT<br />

example 9.18<br />

<strong>PPL</strong> function 8.30<br />

CARRY option 8.13<br />

CREATE option 8.14<br />

CYCLE option 8.17<br />

INDEX option 8.16<br />

SPLIT * 8.24<br />

STEP option 8.17<br />

USE option 8.14<br />

SPREAD<br />

in TEXTWRITER command 11.7, 11.32<br />

SPSS.IN 1.1<br />

SQRT<br />

<strong>PPL</strong> function 6.3, 6.21<br />

Square root function 6.3<br />

Standalone <strong>PPL</strong> commands 1.3, 1.6, 3.12, 3.17<br />

STATUS.DATE function 10.5, 10.20<br />

STEP<br />

in SPLIT function 8.17<br />

STEP.DOWN<br />

<strong>PPL</strong> function 7.7, 7.12<br />

STEP.UP<br />

<strong>PPL</strong> function 7.7, 7.11<br />

STEPS<br />

<strong>PPL</strong> function 7.7, 7.12<br />

STREAM<br />

in TEXTWRITER command 11.7, 11.32<br />

SUBFILES 12.23<br />

identifiers<br />

BY 12.23, 12.29<br />

DOWN 12.23, 12.29<br />

EXACT 12.29<br />

FREQUENCIES 12.24, 12.29<br />

GROUPS 12.25, 12.29<br />

UP 12.23, 12.29<br />

use of scratch variables 12.23<br />

SUBSTRING<br />

<strong>PPL</strong> function 9.9, 9.28<br />

SUBTRACT dates and times 10.21<br />

Subtraction operator - 2.9<br />

Subtraction sign<br />

in MATCHES operator 9.21<br />

SUM<br />

<strong>PPL</strong> function 6.22<br />

SUM.GOOD<br />

<strong>PPL</strong> function 6.22<br />

System files<br />

previous or current version 1.3<br />

template 3.3<br />

System variables 6.1


.M. 2.24<br />

T<br />

t distribution 7.5<br />

inverse 7.6<br />

Tabled random number 7.1<br />

TAN<br />

<strong>PPL</strong> function 6.3, 6.21<br />

Tangent function 6.3<br />

TEMPLATE<br />

in MODIFY command 3.3, 3.16<br />

TEXTFILE.IN 1.1<br />

TEXTWRITER 1.3, 1.6, 11.1<br />

comments 11.6<br />

control words 11.8<br />

@ 11.9, 11.33<br />

@BEFORE 11.10, 11.33<br />

@BLACK 11.28<br />

@BLUE 11.28<br />

@BOTTOM 11.27<br />

@CINCH 11.24<br />

@CINCH.U 11.25<br />

@COMMAS 11.11, 11.34<br />

@DOWN 11.27<br />

@DRAW.BOX 11.26<br />

@DRAW.H 11.26<br />

@DRAW.U 11.26, 11.38<br />

@DRAW.V 11.26<br />

@EQUAL 11.13, 11.34<br />

@FLUSH 11.26<br />

@FONT1-@FONT9 11.22, 11.37<br />

@GREEN 11.28<br />

@INDENT 11.10, 11.34<br />

@JUST 11.10, 11.34<br />

@L.MARGIN 11.27<br />

@LABEL 11.14<br />

@LEADING 11.27<br />

@LINCH 11.25<br />

@LINCH.U 11.25<br />

@LINEWIDTH 11.27<br />

@MINUS 11.9, 11.35<br />

@MISS 11.14, 11.34<br />

@MOVETO 11.26<br />

@NEXT 11.10, 11.35<br />

@NOCOLOR 11.28<br />

@NOUNDERLINE 11.29<br />

@ORANGE 11.28<br />

@PAGE 11.10, 11.35<br />

@PARA 11.10, 11.35<br />

@PINCH 11.25<br />

@PINCH.CHAR 11.26<br />

@PINCH.U 11.26<br />

@PLACES 11.12, 11.35<br />

@PLUS 11.8, 11.9, 11.35<br />

@R.MARGIN 11.27<br />

@RED 11.28<br />

@RINCH 11.25<br />

@RINCH.U 11.25<br />

@SKIP 11.8, 11.35<br />

@SPREAD 11.13, 11.35<br />

@TOP 11.27<br />

@TRIM 11.10, 11.35<br />

@UNDERLINE 11.29<br />

@UP 11.27<br />

@VIOLET 11.28<br />

@WIDTH 11.10, 11.36<br />

@X1 11.26<br />

@X2 11.26<br />

@Y1 11.26<br />

@Y2 11.26<br />

@YELLOW 11.28<br />

identifiers<br />

BLANKS 11.7, 11.32<br />

BOTTOM.EDGE 11.21, 11.36<br />

CASE 11.6, 11.32<br />

FONT 11.21, 11.36<br />

FONT1-FONT9 11.21, 11.36<br />

JUSTIFY 11.7, 11.32<br />

LABELS 11.8, 11.32<br />

LANDSCAPE 11.21, 11.36<br />

LEADBLANK 11.7, 11.32<br />

LEFT.EDGE 11.21, 11.36<br />

MARGIN 11.7, 11.32<br />

NO LEADBLANK 11.7<br />

NO SHOWPAGE 11.21<br />

NO SPREAD 11.7<br />

OUT 11.8, 11.32<br />

PORTRAIT 11.20, 11.36<br />

POSTSCRIPT 11.20, 11.36<br />

PUTL.CHARS 11.32<br />

RIGHT.EDGE 11.21<br />

SHOWPAGE 11.21, 11.37


SPREAD 11.7, 11.32<br />

STREAM 11.7, 11.32<br />

TOP.EDGE 11.21, 11.37<br />

WIDTH 11.8, 11.33, 11.36<br />

justification 11.2<br />

no-break character 11.2<br />

PUT instructions 11.2<br />

summary 11.31<br />

Time functions 10.1–10.13<br />

TITLES<br />

system variables, use of 6.26<br />

TOKEN<br />

<strong>PPL</strong> function 9.9, 9.10, 9.29<br />

TOP.EDGE<br />

in TEXTWRITER command 11.37<br />

TRIM<br />

<strong>PPL</strong> function 9.29<br />

Triple slash<br />

<strong>PPL</strong> operator 9.5<br />

U<br />

UNDO.DAYS function 10.6, 10.21<br />

UNDO.SECONDS function 10.6, 10.21<br />

Uniform random number 7.1<br />

UP<br />

in SUBFILES command 12.23<br />

UPPER<br />

<strong>PPL</strong> function 9.7, 9.30<br />

USE<br />

in SPLIT function 8.14<br />

using /* and */ 3.20<br />

V<br />

V vector 1.2<br />

Variables<br />

3 types 1.2<br />

accessing names 3.11<br />

across command (P Vector) 8.5<br />

across-case (scratch) 8.3<br />

generating names 5.8<br />

in P-<strong>STAT</strong> file 1.2, 1.5<br />

positional notation 2.4, 2.8, 6.3<br />

reordering 2.5<br />

scratch 1.2, 1.5<br />

system 1.2, 1.5<br />

VARNAME<br />

<strong>PPL</strong> function 9.17, 9.30<br />

Vectors<br />

dynamic 1.3, 1.5<br />

P 1.2, 1.5<br />

V 1.2, 1.5<br />

VERIFY<br />

<strong>PPL</strong> function 9.8, 9.30<br />

W<br />

WEEK.WITHIN.YEAR function 10.7<br />

WEEKDAY.CASE 10.15, 10.24<br />

WEEKDAY.LENGTH 10.24<br />

WEEKDAY.NAMES 10.15, 10.24<br />

Weighting<br />

integer 3.7<br />

WIDTH<br />

in TEXTWRITER command 11.8, 11.33, 11.36<br />

Wildcard<br />

in MATCHES meta-characters 9.19<br />

in <strong>PPL</strong> instructions 8.15, 8.24<br />

in SPLIT instruction 8.15<br />

in variable selection 2.6<br />

X<br />

XAMONG<br />

<strong>PPL</strong> operator 2.16, 2.24, 9.3, 9.32<br />

XBLANK<br />

<strong>PPL</strong> function 9.10, 9.24<br />

XCHANGE<br />

<strong>PPL</strong> function 9.10, 9.25<br />

XCONTAINS<br />

<strong>PPL</strong> operator 9.4, 9.32<br />

XEQ<br />

<strong>PPL</strong> operator 2.12, 2.23, 9.2, 9.5, 9.32<br />

XGE<br />

<strong>PPL</strong> operator 9.3<br />

XGT<br />

<strong>PPL</strong> operator 9.3<br />

XLE<br />

<strong>PPL</strong> operator 9.3<br />

XLT<br />

<strong>PPL</strong> operator 9.3<br />

XMATCHES<br />

<strong>PPL</strong> operator 9.18, 9.33<br />

XNE


<strong>PPL</strong> operator 2.23, 9.3<br />

XNOTAMONG<br />

<strong>PPL</strong> operator 2.16, 2.24, 9.3, 9.33<br />

XPOSITION<br />

<strong>PPL</strong> function 9.8, 9.28<br />

XRECODE 4.12<br />

<strong>PPL</strong> function 4.5, 4.15
