A Guide to the P-STAT Programming Language (PPL) - P-STAT, Inc.

P-STAT®

A Guide to the P-STAT Programming Language (PPL)

P-STAT: A Guide to the P-STAT Programming Language (PPL), Second Edition, January 2013 

This publication corresponds to P-STAT Version 3, January 2013. It is designed for those already familiar with the P-STAT system, through either the menu or the command language interface, and is intended to be a complete description of the programming language. 

Please direct any questions to: 

P-STAT, Inc. 

230 Lambertville-Hopewell Rd. 

Hopewell, New Jersey 08525-2809 

U.S.A. 

Telephone: 609-466-9200 

Fax: 609-466-1688 

Internet: support@pstat.com 

Web Page URL: http://www.pstat.com 

All rights reserved. Except as permitted under the United States Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system without the prior written permission of P-STAT, Inc. 

P-STAT is a registered trademark of P-STAT, Inc. Windows is a registered trademark of Microsoft Corp. 

Copyright © 1972-2013 P-STAT, Inc. Printed in the US. Published by P-STAT, Inc.

CONTENTS 

PPL: Introduction 

THE VARIABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1.2 

VECTORS AND ARRAYS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1.2 

THE COMMANDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1.3 

P-STAT SYSTEM FILE: CURRENT OR PREVIOUS. . . . . . . . . . . . . . . . . . . . . . .1.3 

ORGANIZATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1.4 

PPL: Basics of the Programming Language 

CASE AND VARIABLE SELECTION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.1 

Case Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.3 

Variable Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.3 

Using Ranges in Selection Clauses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.4 

Multiple Variable Selections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.5 

Reordering Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.5 

Masks and Wildcards. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.6 

MODIFYING AND GENERATING VARIABLES . . . . . . . . . . . . . . . . . . . . . . . . .2.7 

Modifying Variables with SET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.7 

Using INCREASE and DECREASE Instead of SET . . . . . . . . . . . . . . . . . . . . .2.8 

Creating New Variables with GENERATE. . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.9 

Numeric Operators and their Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.9 

Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.10 

LOGICAL SELECTION OF CASES. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.11 

Logical Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.12 

The Special Operators MISSING and GOOD. . . . . . . . . . . . . . . . . . . . . . . . . .2.12 

AND and OR Relationships. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.13 

Common Errors in Complex Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.14 

AMONG and NOTAMONG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.15 

MISSING DATA with AMONG and NOTAMONG . . . . . . . . . . . . . . . . . . . .2.16 

INRANGE and OUTRANGE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.16 

ANY and ALL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.16 

INSTRUCTIONS AFTER IF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.17 

Conditional Case Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.18

Conditional Modification. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.18 

Three-Way Logic of IF Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.18 

Renaming Variables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.19 

PPL: MODIFY, PROCESS and PUT 

FILE MODIFICATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.1 

How Modifications Are Processed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.1 

Temporary Modifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.2 

Permanent Modifications and the MODIFY Command . . . . . . . . . . . . . . . . . . .3.2 

TEMPLATE Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.3 

On-the-Fly Concatenation of Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.4 

Repeating Cases. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.6 

OTHER INSTRUCTIONS AFTER IF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.7 

GOTO To Process Modifications Selectively . . . . . . . . . . . . . . . . . . . . . . . . . . .3.7 

Cleaning Data With PUT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.8 

Report Writing Using PUT and PUTL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.9 

STANDALONE PPL COMMANDS AND PROCESS . . . . . . . . . . . . . . . . . . . . . .3.12 

Scratch Variables and Standalone PPL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.12 

The PROCESS Command and More PUT Information . . . . . . . . . . . . . . . . . .3.13 

COMMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.14 

QUITTING A PROCESS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.14 

PPL: NCOT and RECODE 

The NCOT Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.1 

The RECODE Function: Single Argument Usage . . . . . . . . . . . . . . . . . . . . . . .4.3 

COMPLEX RECODES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.6 

RECODE: The Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.6 

The RECODE Tests. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.6 

Defining a Set of Constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.8 

The Result Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.9 

RECODE or IF/SET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.10 

RECODE Pointers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.11 

XRECODE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4.12 

PPL: DO LOOPS and IF-THEN-ELSE Blocks 

DO LOOPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.1 

DO USING a Variable List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.2 

DO Stepping Through a Range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.4 

DO Loops: Other Features. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.6 

GENERATE AND RENAME . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.7 

Using GENERATE in DO Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.8 

Using RENAME in DO Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.9 

Masks for RENAME and GENERATE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.9 

IF-THEN-ELSE BLOCKS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.14 

IF-THEN-ELSE: Other Features. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.15 

IF-THEN-ELSE: Another Example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5.18 

PPL: Functions and System Variables 

ONE-EXPRESSION FUNCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.1 

Rounding Functions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.2 

Floor and Ceiling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.2 

Exponential and Trigonometric Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.3 

The Factorial Function. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.3 

Creating Dummy Variables with the LOC Function . . . . . . . . . . . . . . . . . . . . . .6.3 

Creating a Single Variable from Dummy Variables . . . . . . . . . . . . . . . . . . . . . .6.5 

LIST FUNCTIONS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.6 

Numeric List Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.6 

Character and Numeric List Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.7 

SPECIAL FUNCTIONS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.7 

The LAG and DIF Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.8 

Modular (Remainder) Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.9 

Setting PLACES in Specific Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.10 

Extracting Digits Using NUMEX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.10 

COMBINATIONS of N things, K at a time . . . . . . . . . . . . . . . . . . . . . . . . . . .6.11 

EXPAND ONE OR MORE VARIABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.11 

Overall Syntax of a PPL EXPAND Statement . . . . . . . . . . . . . . . . . . . . . . . . .6.12 

Numeric Input Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.12 

Character Input Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.12 

The GENERATE or GEN Phrase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.12 

Options With Several Input Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.13 

Options When the Input Variables Are Character . . . . . . . . . . . . . . . . . . . . . . .6.13 

SYSTEM VARIABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.14 

Referencing Good and Missing Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.14 

Selecting Variables with .NEW. and .OTHERS.. . . . . . . . . . . . . . . . . . . . . . . .6.15 

Referencing the Number of Variables in the File . . . . . . . . . . . . . . . . . . . . . . .6.15 

Referencing the Current Case Number . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.16 

Referencing Numeric and Character Variables . . . . . . . . . . . . . . . . . . . . . . . . .6.17 

Accessing the PUT Counter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.18 

File, Date, Page and Line References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6.18 

Random Number and Distribution Functions 

RANDOM NUMBER FUNCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.1 

Normal and Uniform Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.2 

Binary and User's Tabled Distributions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.3 

DISTRIBUTION FUNCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.3 

Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.4 

Inverse Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.5 

THE FUZZY EQUALS PROBLEM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.6 

The Fuzzy Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.7 

Fuzzy Logical Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.7 

How Fuzzy Operators Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.8 

FUZZY Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7.8 

PPL: Across-Case Modifications 

BASIC ACROSS-CASE AGGREGATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.2 

Accessing FIRST and LAST Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.2 

Scratch Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.3 

The Permanent Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.5 

User-defined Arrays. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.7 

Interaction of FIRST, LAST and Other PPL . . . . . . . . . . . . . . . . . . . . . . . . . . .8.10 

Example: Checking a List of Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.11 

Example: Selecting a Block of Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.12 

THE SPLIT FUNCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.12 

Splitting a Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.12 

CARRYing Identifying Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.13 

Selecting Variables To USE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.14 

Defining New Variables with CREATE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.14 

Wildcard Notation and Masks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.15 

INDEXing Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.16 

Ordering Variables with STEP and CYCLE . . . . . . . . . . . . . . . . . . . . . . . . . . .8.17 

How SPLIT Interacts With Other PPL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.19 

THE COLLECT FUNCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.19 

Collecting BY Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.20 

CARRYing Common Information. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.21 

Ordering Cases with INDEX and SORT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.21 

COLLECT System Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8.27 

PPL: Modification of Character Variables 

BASIC CHARACTER PROCEDURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.1 

Generating New Character Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.1 

Modifying Existing Character Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.2 

Logical Selection of Character Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.2 

Locating Non-Missing Character Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.3 

CHARACTER OPERATORS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.3 

The CONTAINS and XCONTAINS Operators . . . . . . . . . . . . . . . . . . . . . . . . .9.4 

The Concatenate Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.4 

The Trim Concatenate Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.5 

Exactly Equal Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.5 

CHARACTER FUNCTIONS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.6 

Centering and Justifying Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.7 

Changing the Case of Strings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.7 

Length and Size of Strings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.8 

Locating Strings Within Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.8 

Extracting Substrings and Words . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.9 

Blanking Out and Changing Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.10 

Squeezing Out Specified Characters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.11 

Trimming Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.12 

Padding Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.12 

Converting Numbers to Characters and Vice Versa . . . . . . . . . . . . . . . . . . . . .9.13 

Character/Integer Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.14 

Complex Character Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.16 

Using the Name of a Variable as a Character Value . . . . . . . . . . . . . . . . . . . . .9.17 

The MATCHES and XMATCHES Operators. . . . . . . . . . . . . . . . . . . . . . . . . .9.18 

MATCHES: Meta-Characters and Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.20 

CLAG: A Lag Using a Character Argument . . . . . . . . . . . . . . . . . . . . . . . . . . . .9.23 

CONCATENATION OF CHARACTER CONSTANTS . . . . . . . . . . . . . . . . . . . .9.23 

PPL: Date and Time Commands and Functions 

DATE AND TIME FUNCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10.1 

Functions Which Create or Use Dates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10.1 

Six Simple Date Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10.3 

DATE and TIME Function Details. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10.4 

DATE AND TIME COMMANDS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10.13 

The DATE.LANGUAGE Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10.14 

The DATE.ORDER Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10.14 

Changing the Case and Length of Names. . . . . . . . . . . . . . . . . . . . . . . . . . . . .10.15 

Month and Weekday Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10.15 

DATE LOGICAL OPERATORS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10.16 

FORMAT.DATE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10.17 

TEXTWRITER: Report Writing 

OVERVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.1 

Justification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.2 

The “No-Break” Character . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.2 

PPL INSTRUCTIONS PUT AND PUTL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.2 

Character Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.3 

Values of Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.3 

Expressions and Functions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.4 

A Sample Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.4 

Comments in PPL Clauses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.6 

OPTIONAL IDENTIFIERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.6 

CASE and STREAM: The Modes of Operation . . . . . . . . . . . . . . . . . . . . . . . .11.6 

JUSTIFY, BLANKS, PUTL.CHAR and SPREAD. . . . . . . . . . . . . . . . . . . . . .11.7 

MARGIN, LEADBLANK and WIDTH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.7 

Optional Files: LABELS and OUT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.8 

CONTROL WORDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.8 

Control Words to Produce a Letter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.8 

Positioning Columns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.9 

Positioning Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.10 

Positioning Words . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.10 

Labeling Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.13 

Specifying Missing Characters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.14 

A Complex Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.15 

Control Word Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.18 

COMPARING TEXTWRITER AND OTHER COMMANDS . . . . . . . . . . . . . .11.19 

OPTIONAL IDENTIFIERS: PostScript. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.20 

PostScript Page Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.21 

Setting the Fonts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.21 

TEXTWRITER Control Words: The Fonts. . . . . . . . . . . . . . . . . . . . . . . . . . .11.22 

Control Words: Positioning the Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.24 

Indenting Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.27 

Colors in PostScript Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.28 

Underlining Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11.29 

P-STAT MACROS 

MACRO FORMAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.1 

Types of Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.1 

Storing and Activating Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.2 

Comments Within a Macro . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.3 

Macros With Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.3 

Using Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.5 

Default Values for Arguments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.6 

Nested Instream Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.7 

Instream Macros in a Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.8 

Instream Macros in Subcommands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.8 

Using Lots of Instream Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.10 

MACRO COMMANDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.11 

CORRECTING MACROS IN THE EDITOR . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.11 

BLOCK MACROS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.12 

Executing a Block Macro . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.12 

Macro Substitution Using Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.14 

Scope of Temporary Scratch Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.14 

Scratch Variables and Nested Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.14 

Temporary Files in Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.15 

Subcommands in Macros. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.15 

Conditional Execution of Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.18 

DIALOG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.19 

Format of the DIALOG Command. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.21 

Does the File Exist. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.22 

SUBFILES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.23 

SUBFILES Optional Identifiers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.23 

SUBFILES Looping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.25 

SUBFILES System Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12.26 

FIGURES 

Basic Types of Modifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 

Format of the SET Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.8 

Format of the IF Clause . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.11 

AND and OR: Evaluations of Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.13 

IF and Missing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.17 

Permanent Modifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 

Template Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 

Renaming All the Variables in a File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5 

Repeating Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.6 

Using GOTO and PUT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.8 

Using PUT To Produce a Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.10 

Accessing the Variable Name Within a Report . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.11 

PROCESS: Counting Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.13 

NCOT: Numeric Recodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 

Multi-Variable RECODE With Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.9 

RECODE or IF/SET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.11 

EQ and NE Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.12 

Simple DO Loop with a List of Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 

DO With Two Scratch Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3 

DO: Range and Stepsize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4 

DO Loops: An Example of Each Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.5 

Labelled DO, EXITDO and NEXTDO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.6 

Rename Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.10 

GENERATE: Generated Versus Original . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12 

Dynamic Array, Wildcard, Prefix and GENERATE . . . . . . . . . . . . . . . . . . . . . . . 5.12 

Complex MASK: Generate Variable Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.13 

IF or IF-THEN-ELSE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14 

IF-THEN with F.ELSE and M.ELSE in a Simple Hot Deck Example . . . . . . . . . 5.16 

IF-THEN-ELSE: The Data and the Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.18

IF-THEN-ELSE Block with Nested IF and a DO Loop . . . . . . . . . . . . . . . . . . . . . 5.19 

Calculating Variable Positions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.4 

Using LAG and DIF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.8 

Interaction of LAG and IF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.9 

EXPAND Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.14 

Showing the Differences Between .N., .HERE. and .USED. . . . . . . . . . . . . . . . 6.17 

FIRST and LAST with Subgroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2 

Using Scratch Variables and FIRST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.4 

Creating a Summary Case with FIRST and LAST . . . . . . . . . . . . . . . . . . . . . . . . . . 8.5 

Moving Values Between Files with the P Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.6 

DEFINE.ARRAY and SHOW.ARRAYS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.8 

One-dimensional Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.9 

Two-dimensional Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.10 

Checking Variables Using PUT and Scratch Variables . . . . . . . . . . . . . . . . . . . . . 8.11 

Using CARRY in the SPLIT Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.13 

Selecting Variables for SPLIT with USE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.14 

Naming the New Variables with CREATE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.15 

Multiple CREATE Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.16 

Producing an Index Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.17 

Using STEP and CYCLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.18 

A Simple COLLECT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.19 

Collecting BY Group Membership . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.20 

Collecting Cases in a Specified Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.21 

Sorting the Collected Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.22 

A Complex Modification Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.23 

A Second Complex Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.25 

Before and After COLLECT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.26 

The XEQ Operator for Tests that Respect Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.6 

The CVAL Function for Bells and Whistles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.15 

Nesting Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.16 

Using VARNAME, SPLIT and COLLECT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.18 

File of Character Data for MATCHES and XMATCHES . . . . . . . . . . . . . . . . . . . 9.19 

MATCHES and XMATCHES: Meta-Characters . . . . . . . . . . . . . . . . . . . . . . . . . 9.21 

DATE Logical Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.16 


FORMAT.DATE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.18 

FORMAT.DATE Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.19 

Producing a Report: The Input Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.4 

Producing a Report: The TEXTWRITER Command . . . . . . . . . . . . . . . . . . . . . . 11.5 

Producing a Report: The Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.6 

A Form Letter: The Input File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.9 

A Form Letter: The TEXTWRITER Command . . . . . . . . . . . . . . . . . . . . . . . . . 11.11 

A Form Letter: One Letter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.12 

TEXTWRITER: Displaying all the Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.14 

A Complex Report: The Input and Labels Files . . . . . . . . . . . . . . . . . . . . . . . . . 11.15 

A Complex Report: The Report (Two Pages) . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.16 

A Complex Report: The TEXTWRITER Command . . . . . . . . . . . . . . . . . . . . . 11.17 

PostScript Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.21 

Justification in PostScript Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.23 

Changing Fonts Text in a PostScript Paragraph . . . . . . . . . . . . . . . . . . . . . . . . . . 11.23 

Font Changes in a Justified PostScript Paragraph . . . . . . . . . . . . . . . . . . . . . . . . 11.24 

TEXTWRITER: Tabular Output with PostScript . . . . . . . . . . . . . . . . . . . . . . . . . 11.25 

PostScript: Tables with Proportional Fonts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.26 

Indenting the Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.28 

Underlining the Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.30 

Activating Three Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.3 

Block Macro With Keyword Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.4 

Block Macro With Positional Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.4 

Macro With Positional Arguments and Default Values . . . . . . . . . . . . . . . . . . . . . 12.6 

Macros Can Call Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.7 

Instream Macros in Subcommand Records. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.9 

Lots of Instream Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.10 

Defining a Block Macro . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.12 

The RUN Command and Partial Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.13 

Macros: Temporary File Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.15 

Macros: Supplying Subcommands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.16 

Macro with Conditional Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.17 

Macros: Reversing the Order of Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.18 

Macros: DIALOG Provides an Interactive Front End . . . . . . . . . . . . . . . . . . . . . 12.19 

Macro With SUBFILES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.23 

The SUBFILE Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.24 


1 

PPL: 

Introduction 

P-STAT accepts information in many different forms. Information may be numeric, as is average yearly rainfall 

or total automobile production, or it may be text or character, as is a name or an address. P-STAT accepts information 

from a variety of sources, including disk, tape, or the user's terminal, and holds this information in a 

compressed rectangular format called a “P-STAT system file”. This file is composed of rows (cases or records) 

which contain one or more variables (fields). The first step in using P-STAT is to convert your data into P-STAT 

system file format. The commands which create a P-STAT system file are described in “P-STAT Introductory 

Manual” and “P-STAT: Utility Commands”. They include: 

1. MAKE when the data are in ASCII format on an external disk or tape, or when full 

screen capabilities are not available on the terminal. 

2. TEXTFILE.IN when the data are in ASCII (text) format delimited by tabs, commas or blanks. 

The first row of data may contain variable labels. 

3. FILE.IN primarily used when the data in an external file are in a binary format. 

4. SPSS.IN when the data are in SPSS export format. 

The P-STAT Programming Language, called PPL for short, is a language within the P-STAT program. Both 

simple and complex manipulations may be done using PPL instructions, operators, functions and system variables. 

PPL permits logical testing and selection of cases and variables, modification of existing variables, and generation 

of new variables. 

PPL may be used to modify any P-STAT system file as that file is read by any command. Modifications are 

temporary unless a new output file is produced. Both numeric and character variables may be tested, selected and 

modified. Most of the basic PPL instructions and operators are applicable to either numeric or character variables. 

However, there is a class of functions such as SQRT, the square root function, which only applies to numeric data, 

and there is another class of functions such as MATCHES, the string matching function, which only applies to 

character data. 

An important concept in understanding how PPL works is that an input file is not changed in any way by the 

programming language statements. When you request a given P-STAT command such as LIST or MODIFY: 

1. the P-STAT executive routines determine which command is required and pass control to that 

command; 

2. the command prepares to do its job and then, when it is ready, asks the executive routines for a row 

of data from the input file; 

3. the executive routines determine if there is any PPL. It is these routines which create new variables, 

recode existing variables, and process any logical selections; 

4. if, during the processing of the PPL, the executive routines determine that the current case has failed 

in some way and is not needed by the command, they stop processing that case and read the next case. 

Thus the command that is currently executing has no knowledge of the original input case. It knows 

only about those cases which survive the PPL, and it knows about those cases only in their post-PPL 

form.
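The four steps above can be pictured with a short Python sketch. This is not P-STAT code and makes no claim about how the executive routines are written; the function name read_through_ppl, the two step functions and the sample rows are all hypothetical. It only illustrates the idea that a command sees post-PPL cases while the input rows stay untouched.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Case = Dict[str, object]
# A step returns the (possibly changed) case, or None when the case is rejected.
PplStep = Callable[[Case], Optional[Case]]


def read_through_ppl(input_rows: Iterable[Case], steps: List[PplStep]) -> Iterator[Case]:
    """Yield the cases a command would see; the input rows are never altered."""
    for row in input_rows:
        case: Optional[Case] = dict(row)  # work on a copy of the input row
        for step in steps:
            case = step(case)
            if case is None:  # the case failed: stop and read the next one
                break
        if case is not None:
            yield case


def retain_sex_2(case: Case) -> Optional[Case]:
    # Hypothetical logical selection: keep only cases where Sex equals 2.
    return case if case["Sex"] == 2 else None


def height_in_feet(case: Case) -> Optional[Case]:
    # Hypothetical recode: convert Ht from inches to feet.
    case["Ht"] = case["Ht"] / 12
    return case


input_file = [
    {"Name": "Max", "Sex": 1, "Ht": 12},
    {"Name": "Spot", "Sex": 2, "Ht": 18},
]
seen_by_command = list(read_through_ppl(input_file, [retain_sex_2, height_in_feet]))
```

In this sketch only the second row reaches the command, and it arrives in its modified form; the list standing in for the input file still holds the original values.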


When a command like MODIFY produces an output file, any PPL that is done to the input file is permanent in 

the sense that the output file reflects that PPL. 

While PPL is most frequently used to modify a case of data in an existing P-STAT system file, there are provisions 

for passing data between cases within a file, between files, and even between P-STAT commands. This 

makes it possible to get summary information, to do conditional execution of the PPL within a command, and also 

to change the direction of a job stream depending on the data that are found or the results of a previous 

computation. 

1.1 THE VARIABLES 

There are three types of variables. The first type is a variable in a P-STAT system file. The variables in a P-STAT 

file may be numbers or character strings. Every case (row) of a file contains 1 or more such variables. Each variable 

has a name. The name of a variable can contain letters, digits, dots, underscores and, if starting with a tag, 

two colons. It has at most 64 characters and must start with a letter. If a tag is supplied, it may be 1 to 16 characters 

long and MUST be followed by the double colon (::). 

The variables in a given P-STAT system file can only be modified when the rows of that file are read by a 

P-STAT command. P-STAT system variables and scratch variables, described below, can only be 16 characters 

long and do not have a tag. 
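The naming rules for file variables can be restated as a small Python check. This is a sketch under stated assumptions, not P-STAT's own validation: it assumes the 64-character limit counts the tag and the double colon, that the part after the tag also starts with a letter, and that "letters" means ASCII letters. The function name is_valid_file_variable_name is hypothetical.

```python
import re

# One name part: a letter followed by letters, digits, dots or underscores.
_PART = r"[A-Za-z][A-Za-z0-9._]*"
# Optional tag followed by "::", then the name itself.
_NAME_RE = re.compile(rf"^(?:(?P<tag>{_PART})::)?(?P<base>{_PART})$")


def is_valid_file_variable_name(name: str) -> bool:
    """Check a file variable name against the rules described in the text."""
    if len(name) > 64:  # assumption: the limit covers tag, colons and name
        return False
    match = _NAME_RE.match(name)
    if match is None:
        return False
    tag = match.group("tag")
    # A tag, when present, is 1 to 16 characters long.
    return tag is None or 1 <= len(tag) <= 16


examples = ["Age", "Wave1::Income_2012", "2ndVisit", "Age::"]
results = {name: is_valid_file_variable_name(name) for name in examples}
```

Under these assumptions the first two example names pass, while a name starting with a digit or a tag with nothing after the colons fails.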

The second type of variable is a P-STAT system variable. These variables are not part of a P-STAT file. Instead, 

they reside in memory. They contain values such as the current date and the current page number. These 

variables, some of them numeric and some of them character strings, are created and maintained by the P-STAT 

executive routines and are available for your use. For example, .DATE. is the system variable for the current date. 

Most system variables cannot be changed by a user; .PAGE., the current page number, is an exception. System 

variable names look like regular variable names except that they always begin and end with a decimal point. 

Scratch variables also exist in memory rather than in a P-STAT system file. Scratch variables, which can be 

either numeric or character, are created by you as you need them. Scratch variables come in two flavors which 

are distinguished by the way they are named. A scratch variable with a name that begins with a single pound sign 

(#) only exists for the duration of the current command or macro. This temporary form of scratch variable is usually 

used either to hold intermediate results in a series of computations or to pass information between cases in 

a P-STAT file. 

A scratch variable with a name that begins with two pound signs (##) exists from the time it is created until 

the end of the P-STAT session. This permanent form of scratch variable allows information to be passed between 

files and between commands. Because a permanent scratch variable exists between commands, it can be created 

and changed even when there is no active P-STAT file. 
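The difference between the two kinds of scratch variable can be modelled in a few lines of Python. This is an illustrative sketch only: the class ScratchStore and its end_command method are hypothetical names, and the end of a macro is not modelled separately from the end of a command.

```python
class ScratchStore:
    """Toy model of scratch variables held in memory."""

    def __init__(self):
        self._values = {}

    def set(self, name, value):
        if not name.startswith("#"):
            raise ValueError("scratch variable names begin with # or ##")
        self._values[name] = value

    def get(self, name):
        return self._values[name]

    def __contains__(self, name):
        return name in self._values

    def end_command(self):
        """Drop temporary (#) variables; keep permanent (##) ones."""
        self._values = {
            name: value
            for name, value in self._values.items()
            if name.startswith("##")
        }


store = ScratchStore()
store.set("#Total", 49)              # temporary: lives for one command
store.set("##Constant", 43.265 ** 0.5)  # permanent: lives for the session
store.end_command()
```

After end_command the temporary variable is gone, while the permanent one can still be read by whatever runs next.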

1.2 MATCHING NAMES 

With the longer names there is an increasing need to be able to refer to them in some abbreviated manner. Wildcards 

are one way to do this. They can be used in Version 3 anywhere that a variable name is referenced. A 

wildcard reference contains at least one question mark (?). Wildcards can be used in both commands and subcommands. 

They are discussed in detail in the next chapter. 

1.3 VECTORS AND ARRAYS 

A vector is a one dimensional array of values. The variables that are represented in a case of data can be thought 

of as an array. This array is referenced as the V vector. Using the V vector, the variables in a case can be addressed 

with array notation. The variable V(1) is the first variable in the case. The variable V(23) is the twenty-third variable 

in the case. The V vector has a dimension that corresponds to the number of variables in the current P-STAT 

system file. The V vector can only be referenced as the P-STAT system file is being read into a command. 

There is a second vector that is also available for your use. This vector is known as the P vector. It contains 

as many double precision numeric elements as the maximum number of variables in a file. In most versions of


P-STAT the P vector has 6000 elements. The elements of the P vector are initialized to missing when the P-STAT 

run begins and remain missing until you change them. The P vector provides an easy way to pass a large number 

of values across cases or between commands. Since the P vector exists in memory rather than in a P-STAT system 

file it can be referenced even when there is no active P-STAT system file. 
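As a rough picture of the P vector, the Python sketch below keeps a fixed block of numeric cells that start out missing and stay in memory between commands. It is not P-STAT code: the class name PVector is hypothetical, None stands in for a missing value, and subscripts are assumed to start at 1 in the same way that V(1) is the first variable.

```python
P_SIZE = 6000  # the size the text gives for most versions of P-STAT


class PVector:
    """Toy model of the P vector: numeric cells that begin as missing."""

    def __init__(self, size=P_SIZE):
        self._cells = [None] * size  # None stands in for "missing"

    def _index(self, pos):
        if not 1 <= pos <= len(self._cells):
            raise IndexError(f"P({pos}) is outside 1..{len(self._cells)}")
        return pos - 1  # assumption: subscripts are 1-based

    def __getitem__(self, pos):
        return self._cells[self._index(pos)]

    def __setitem__(self, pos, value):
        self._cells[self._index(pos)] = float(value)


p = PVector()

# One pass over some cases accumulates a total across cases in P(1).
for weight in (15, 24, 10):
    p[1] = weight if p[1] is None else p[1] + weight

# A later step can still read the value, because the vector lives in memory.
total_weight = p[1]
```

Cells that were never assigned remain missing, which matches the description of how the vector is initialized.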

A third type of vector, which uses the variables in your P-STAT system file, is also available. If you wish a 

group of variables to be addressed with vector notation, you must name them in such a way that all the variables 

to be included in the vector and only those variables have the same prefix or suffix. This prefix or suffix, combined 

with the wildcard character “?”, is used to denote the members of a vector that can be addressed with a 

subscript. This feature is usually used either to simplify the instructions when selecting variables with the KEEP 

or DROP instruction, or in conjunction with DO loops which provide a powerful mechanism for creating 

subscripts. 
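The way a shared prefix plus the wildcard character picks out a vector of variables can be sketched in Python. This is an illustration, not the P-STAT matching routine. It assumes that "?" stands for any run of characters (possibly empty), that matching ignores letter case, that vector members keep their file order, and that subscripts start at 1. The function names and the variable names in the list are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard reference into a regular expression."""
    pieces = [re.escape(piece) for piece in pattern.split("?")]
    # Each "?" becomes "match any run of characters, possibly none".
    return re.compile("^" + ".*".join(pieces) + "$", re.IGNORECASE)


def dynamic_vector(variable_names, pattern):
    """Return the variables, in file order, that the wildcard selects."""
    if "?" not in pattern:
        raise ValueError("a wildcard reference contains at least one '?'")
    regex = wildcard_to_regex(pattern)
    return [name for name in variable_names if regex.match(name)]


file_variables = ["Id", "Q1a", "Q1b", "Q2a", "Q1c", "Age"]
q1 = dynamic_vector(file_variables, "Q1?")
second_member = q1[2 - 1]  # subscript 2 of the vector (1-based)
```

Because only the names decide membership, a variable such as Q2a is left out even though it sits between two members in the file.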

Similar to the dynamic vectors are multi-dimensional user-defined arrays which can hold either numbers or 

characters. These are discussed in full in Chapter 8 “PPL: Across-Case Modifications”. 

1.4 THE COMMANDS 

PPL can be used any time that a P-STAT system file is read by any P-STAT command. The input file remains 

unchanged, but the cases that are processed by the command reflect the modifications. There are five commands 

which do not have a statistical or display function but which are specifically associated with PPL. These commands 

are covered in detail in the following chapters. 

The MODIFY command is used to read an existing P-STAT system file and produce a new file which contains 

the cases after the PPL is applied. The MODIFY command is especially useful when you are preparing a 

new study for analysis and need to clean the data. 

The COMPARE command takes two files and compares the contents. A major use of COMPARE is to compare 

the input and output from a MODIFY command as a check that the resulting output file is as expected. 

The CHECK command examines an existing P-STAT file for problems and stores the results in system variables 

that can then be tested or printed. The CHECK command should always be used when there has been a 

power failure or system crash while a P-STAT file was being processed. It is also useful when you need to know 

if a file has any remaining cases after a MODIFY. 

The TEXTWRITER command is a vehicle for PPL, with additional controls to format the printed page. 

The PROCESS command has a P-STAT system file as input but has no output file and does no computation. 

PROCESS is used to store the information in the P vector, arrays, or in permanent scratch variables which can then 

be accessed by subsequent commands. 

In addition, a number of PPL operators can be used as standalone commands. They can be used with system 

variables, scratch variables, the P vector and the user-defined arrays. These standalone PPL commands are: IF, 

SET, INCREASE, DECREASE, GENERATE, PUT, PUTL, IF-THEN-ELSE, DIALOG, BRANCH and DO 

loops. 

PUT .DATE. $ 

GEN ##Project:C40 = 'ABC, Inc. January 2008 Report' $ 

GEN ##Constant = SQRT ( 43.265 ) $ 

1.5 P-STAT SYSTEM FILE: CURRENT OR PREVIOUS 

P-STAT keeps track of the previous and current versions of each P-STAT system file that you create. You supply 

a file name of sixteen or fewer characters, and P-STAT adds the extension (suffix) “.PS1” or “.PS2”. As that file 

is modified, the extension name alternates. However, at all times, P-STAT knows which file is the current one 

and which is the previous one. You use only the name you gave the file: 

PLOT Cells;


for example, and P-STAT inputs the current version to PLOT. However, if you want the prior version for some 

reason, use the PPL instruction PREVIOUS:

PLOT Cells [ PREVIOUS ] ; 

The PPL instruction PREVIOUS is enclosed in square brackets and follows directly after the file name. It must 

be the first PPL clause. Additional PPL clauses may follow. The comparable instruction CURRENT is also available. 

When neither is used, it is assumed that the current file is the desired one. 
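The alternation between the two extensions can be pictured with a small Python model. It is a sketch under assumptions the text does not confirm: that the first version of a file receives ".PS1", and that the stored name is simply the user's name followed by the extension. The class SystemFileVersions and its methods are hypothetical.

```python
class SystemFileVersions:
    """Toy tracker for the current and previous versions of one file."""

    EXTENSIONS = (".PS1", ".PS2")

    def __init__(self, name):
        if not 1 <= len(name) <= 16:
            raise ValueError("file names have sixteen or fewer characters")
        self.name = name
        self._writes = 0  # how many versions have been written so far

    def write_new_version(self):
        """Record a new version and return the name it is stored under."""
        extension = self.EXTENSIONS[self._writes % 2]
        self._writes += 1
        return self.name + extension

    def current(self):
        if self._writes == 0:
            return None
        return self.name + self.EXTENSIONS[(self._writes - 1) % 2]

    def previous(self):
        if self._writes < 2:
            return None
        return self.name + self.EXTENSIONS[self._writes % 2]


cells = SystemFileVersions("Cells")
cells.write_new_version()  # first version
cells.write_new_version()  # the file is modified once
after_two = (cells.current(), cells.previous())
cells.write_new_version()  # and modified again
after_three = (cells.current(), cells.previous())
```

The user only ever supplies the plain name; in this model, asking for the current or previous version is what resolves it to one of the two stored names.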

1.6 ORGANIZATION 

This manual contains chapters describing the details of the programming language and the commands specifically 

associated with PPL. 

• “PPL: Basics of the Programming Language” covers PPL punctuation, case selection, variable selection 

and simple logical selection with “IF”. 

• “PPL: The Commands” explains more about temporary and permanent modifications and covers the 

MODIFY, PPL, and PROCESS commands in detail. 

• “PPL: NCOT and RECODE” covers the NCOT and RECODE functions, including multi-variable 

recodes. 

• “PPL: DO LOOPS and IF-THEN-ELSE Blocks” covers those functions in detail. This chapter includes 

the use of DO loops to generate and rename lists of variables. 

• “PPL: Functions and System Variables” covers the numeric functions and many of the P-STAT system 

variables. 

• Random Number and Distribution Functions also covers the “Fuzzy equals problem” and the functions 

to protect against this problem. 

• “PPL: Across Case Modification” covers the use of the SPLIT and COLLECT functions as well as 

uses of the P vector, permanent scratch variables and user-defined arrays. 

• “PPL: Modification of Character Variables” covers the character functions including the MATCH 

function. MATCH provides string matching capabilities similar to those found in the Unix commands 

lex and yacc. 

• PPL: Date and Time Commands and Functions. 

• TEXTWRITER: A Vehicle for PPL 

• MACROS


SUMMARY 

VARIABLES 

There are three different types of variables available in P-STAT. 

Fields in a P-STAT system file 

may be numeric or character strings. Variable names have 1-64 characters composed only of letters, 

numbers, underscores and decimal points. The first character must be a letter. Variable names may begin 

with a tag of 1-16 characters followed by 2 colons (::). These variables are only available when a P-STAT 

system file is read by a P-STAT command. 

System variables 

may be numeric or character. These variables are created and maintained by the P-STAT system itself 

to contain information such as the current date, current file name, or the results of the most recent command. 

These variables, which always have names that both begin and end with a period, for example 

.DATE., are stored in memory and can be used (printed, interrogated, etc.) but not changed by the user. 

Scratch variables 

may be numeric or character. These variables, which are created by the user as needed, reside in memory. 

Temporary scratch variables, which only exist for the duration of a command or macro, have names that 

begin with a single pound (#) sign. Permanent scratch variables exist for the remainder of the P-STAT 

session and have names that begin with two pound (##) signs. Scratch variables are limited to 16 characters 

starting with a letter and containing letters, numbers, underscores and decimal points. 

VECTORS AND ARRAYS 

Groups of related variables may be considered a vector of values. These are typically used in DO loops. 

V vector 

The V vector references the current row (case) of data in a P-STAT system file. Variables may be referred 

to by their names (Age, Q1, Density, etc.) or by their position in the file, for example: v(3) or V(#j). 

The subscript may be a constant or an expression (such as a scratch variable) that evaluates to a position. 

This vector is only available when a file is being read by a P-STAT command. 

P vector 

The P vector is a numeric vector whose size depends on the maximum number of variables allowed in 

the version of P-STAT that is being used. The values in the P vector are set to missing when the P-STAT 

session begins. They are available for use in PPL and allow values to be passed between cases in a file 

and between commands. 

Dynamic vector 

Dynamic vectors depend on the naming of the variables in the P-STAT system file. Any group of variables 

with the same prefix or suffix can be referenced as a vector by combining the prefix or suffix with 

the wildcard character, the question mark (?). Thus Q1? refers to all variables in the file beginning with 

the characters “Q1”.


User-defined arrays 

Arrays, one-dimensional and multi-dimensional, for numeric and character data can be defined and used. 

They are described in full in Chapter 8 “PPL: Across-Case Modifications”. 

COMMANDS 

The P-STAT Programming Language can be used with any P-STAT command. However, there are 5 commands 

of particular importance when PPL is considered. 

MODIFY 

takes an input P-STAT system file and applies PPL to produce an output P-STAT system file that is 

changed in some way. 

MODIFY Myfile [ here goes PPL ], OUT Newfile $ 

CHECK 

examines a P-STAT system file and reports on its status. It is very useful when a system crash has occurred. 

It is also useful for obtaining information such as the number of cases in the file. The information 

from CHECK is stored in system variables which may be tested in subsequent PPL. 

COMPARE 

takes two P-STAT system files and compares their contents. The resulting differences are stored in a new 

P-STAT system file. 

PROCESS 

uses a P-STAT system file but produces neither an output file nor a printed report. It is used to accumulate 

information about the file and store it in the P vector or permanent scratch variables for use in 

subsequent commands. 

TEXTWRITER 

is a vehicle for PPL and the PUT function. It has additional features for formatting the output such as 

justification of the text, indenting, paragraph controls, and font changes for PostScript output. 

STANDALONE PPL COMMANDS 

Many PPL operators can be used as standalone commands. These standalone PPL commands are: IF, 

SET, INCREASE, DECREASE, GENERATE, PUT, PUTL, IF-THEN-ELSE, DIALOG, BRANCH and 

DO loops. 

PPL INSTRUCTIONS 

PPL instructions are enclosed in brackets and immediately follow the filename. 

CURRENT 

COMPARE Myfile [ CURRENT ] Myfile [ PREVIOUS ], OUT mydiffs $ 

CURRENT selects the more recently created version of the P-STAT system file. CURRENT may be 

used with other PPL.


PREVIOUS 

PREVIOUS selects the previous version of the P-STAT system file. PREVIOUS may be used with other 

PPL.

2 


Basics of the Programming Language 

This chapter explains the syntax and punctuation of the P-STAT Programming Language. Case and variable selection 

is covered in detail. In addition, generating new variables, recoding existing variables and logical selection 

using a simple “IF” are explained. 

2.1 CASE AND VARIABLE SELECTION 

PPL begins with a left bracket “[” and ends with a right bracket “]”. Individual clauses are terminated with either 

a semicolon “;” or a right bracket “]”. Clauses within brackets are separated by semicolons as in the first example 

below. If the right bracket is used, the next clause (if any) must begin with another left bracket. The following 

two command phrases are functionally equivalent: 

SURVEY Patients [ CASES 1 TO 10 ; 

KEEP Age Sex Race ], 

SURVEY Patients [ CASES 1 to 10 ] 

[ KEEP Age Sex Race ], 

Each is a single phrase that contains two modification clauses. The command name is SURVEY. Its argument is 

the filename Patients and both of the modification clauses which are to be applied to that file. In this example, the 

modifications are a case selection, indicated by the word CASES (ROWS is a synonym) in the first modification 

clause, and a variable selection, indicated by the word KEEP in the second modification clause. 

The first word in each clause tells P-STAT what kind of modification is involved. P-STAT recognizes CASES 

or CASE as the keyword for case selection and either KEEP or DROP as keywords for variable selection. IF 

is the keyword for logical selection. SET is the keyword for recoding or setting an existing variable to a new value. 

GENERATE is the keyword for generating or creating a new variable. Figure 2.1 contains examples of the basic 

types of modifications — case selection, variable selection, logical selection, recoding of existing variables, and 

creation of new variables. File Dogs contains five variables and three cases. The results of each modification 

clause are shown on the right. 

Many modification clauses may be used within the single command phrase which describes an input file. 

Each clause is used in turn to modify the cases of the file as it is read. The command itself is executed after the 

modifications have taken place. A comma following a right bracket means that the PPL for that file is finished, 

and some totally different command clause is about to begin. Therefore, you should NOT put commas between 

sets of PPL brackets. 

LIST Dogs [ KEEP Name Sex; IF Sex EQ 2, RETAIN ] $ is correct 

LIST Dogs [ KEEP Name Sex ] [ IF Sex EQ 2, RETAIN ] $ is correct 

LIST DOGS [ KEEP Name Sex ], [ IF Sex EQ 2, RETAIN ] $ is an ERROR 

The comma in a command is a signal that the next word is an identifier, a keyword, recognized by the command. 

The string “[ IF ...” is part of the PPL and not an identifier for the LIST command. It is easy to avoid this 

error if you use brackets only for major pieces of PPL and use the semicolon as the terminator for individual 

clauses.
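The punctuation rules can be checked mechanically. The Python sketch below is only an illustration of those rules, not a P-STAT parser: the function names are hypothetical, and quoted strings that themselves contain brackets or semicolons are deliberately not handled.

```python
import re


def ppl_clauses(phrase):
    """Return the modification clauses found in one command phrase."""
    clauses = []
    # Each bracketed group holds one or more clauses separated by semicolons.
    for group in re.findall(r"\[([^\[\]]*)\]", phrase):
        for clause in group.split(";"):
            clause = " ".join(clause.split())  # tidy the spacing
            if clause:
                clauses.append(clause)
    return clauses


def comma_between_brackets(phrase):
    """Detect the error case: a comma separating two sets of PPL brackets."""
    return re.search(r"\]\s*,\s*\[", phrase) is not None


one_bracket = "LIST Dogs [ KEEP Name Sex; IF Sex EQ 2, RETAIN ] $"
two_brackets = "LIST Dogs [ KEEP Name Sex ] [ IF Sex EQ 2, RETAIN ] $"
with_comma = "LIST DOGS [ KEEP Name Sex ], [ IF Sex EQ 2, RETAIN ] $"

same_clauses = ppl_clauses(one_bracket) == ppl_clauses(two_brackets)
flagged = comma_between_brackets(with_comma)
```

The comma inside "IF Sex EQ 2, RETAIN" is left alone because it sits within a clause; only a comma that follows a right bracket and precedes a left bracket is flagged.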


__________________________________________________________________________

Figure 2.1 Basic Types of Modifications

File Dogs: Before Modifications         File Dogs: After Modifications

Name  Sex  Age  Wt  Ht  Diet
Max     1    2  15  12     1
Spot    2    7  24  18     1
Rags    1    4  10   -     2

CASES to select cases:                  Name  Sex  Age  Wt  Ht  Diet
LIST Dogs                               Max     1    2  15  12     1
  [ CASES 1 3 ] $                       Rags    1    4  10   -     2

KEEP to select variables:               Name  Diet
LIST Dogs                               Max      1
  [ KEEP Name Diet ] $                  Spot     1
                                        Rags     2

DROP to omit variables:                 Name  Sex  Age  Ht  Diet
LIST Dogs                               Max     1    2  12     1
  [ CASE 1 ; DROP Wt ] $

IF for logical selection:               Name  Sex  Age  Wt  Ht  Diet
LIST Dogs                               Spot    2    7  24  18     1
  [ IF Sex EQ 2, RETAIN ] $

SET to modify existing variables:       Name  Sex  Age  Wt   Ht  Diet
LIST Dogs                               Max     1    2  15  1.0     1
  [ SET Ht = Ht / 12 ] $                Spot    2    7  24  1.5     1
                                        Rags    1    4  10    -     2

GENERATE to create new variables:       Name  Ratio
LIST Dogs                               Max     .80
  [ GEN Ratio = Ht / Wt ;               Spot    .75
    KEEP Name Ratio ] $                 Rags      -

__________________________________________________________________________
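For readers who want to confirm the numbers in Figure 2.1, the same modifications can be mimicked in Python. This is a cross-check of the arithmetic rather than PPL: None stands in for the missing value shown as "-", and the helper divide is a hypothetical function that passes a missing value through, as the figure shows for Rags.

```python
dogs = [
    {"Name": "Max",  "Sex": 1, "Age": 2, "Wt": 15, "Ht": 12,   "Diet": 1},
    {"Name": "Spot", "Sex": 2, "Age": 7, "Wt": 24, "Ht": 18,   "Diet": 1},
    {"Name": "Rags", "Sex": 1, "Age": 4, "Wt": 10, "Ht": None, "Diet": 2},
]


def divide(numerator, denominator):
    """Division in which a missing operand gives a missing result."""
    if numerator is None or denominator is None:
        return None
    return numerator / denominator


# [ CASES 1 3 ]: select by position in the file (positions start at 1).
cases_1_3 = [dogs[position - 1] for position in (1, 3)]

# [ IF Sex EQ 2, RETAIN ]: logical selection.
sex_2 = [dog for dog in dogs if dog["Sex"] == 2]

# [ SET Ht = Ht / 12 ]: recode an existing variable on a copy of each case.
in_feet = [{**dog, "Ht": divide(dog["Ht"], 12)} for dog in dogs]

# [ GEN Ratio = Ht / Wt ; KEEP Name Ratio ]: new variable, then selection.
ratios = [
    {"Name": dog["Name"], "Ratio": divide(dog["Ht"], dog["Wt"])}
    for dog in dogs
]
```

Each result is built from copies, so the list standing in for file Dogs is the same before and after, which mirrors the point that the input file itself is not changed.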


Modification clauses are part of the P-STAT command structure. As such, they are free-format and may be 

continued on successive lines. However, each individual word or label must fit entirely on one line; it must not 

be broken across lines. 

There is a limit to the number of modifications which can be done at one time. This limit varies with the size 

of P-STAT that is being used. The size of the PPL workspace, measured in 4-byte words, is: 

Whopper II = 250,000 Whopper IV = 1,500,000 

An error message is printed when the limit is exceeded. The data modification area is adequate for most uses. 

However, if the space should prove too small to do a particular series of modifications in a single pass of the data 

file, the modifications may be done using the MODIFY command several times, creating temporary intermediate 

files. 

2.2 Case Selection 

Cases in a P-STAT system file are synonymous with rows in a file, despite the fact that the data for each case may 

have originally been collected on multiple records or may list on a terminal or printer over several lines. Case 

selection takes the following form: 

[ CASES 125 TO 199 345 ] 

It is indicated by the word CASES or CASE immediately following the left bracket. (Either ROWS or ROW may 

also be used.) 

Case selection uses the position of the case in the file to determine which cases are selected. Case references 

must be in ascending order whenever P-STAT files are accessed sequentially. Each of the following is a legal 

case selection clause: 

[ CASES 33 49 105 TO 200 223 300 TO 305 700 .ON. ] 

[ CASE 1 ] 

[ CASE 3 .ON. ] 

The use of the system variable .ON. in the first and third examples means “continue selecting cases from the current 

case onward until all the cases have been read”. You can tell that “.ON.” is a system variable because of the 

name. System variables have names that look like legal P-STAT names except that they always begin and end 

with a decimal point. 

A case may not be repeated in a case selection clause. (However, there are other ways to include a case more 

than once. See the REPEAT instruction later in this manual.) Case selection acts as a filter on the file and is done 

before any other modifications take place, regardless of the position of the CASE clause among the other modifications. 

If ten cases are selected from a file with 2000 cases, the tenth of the selected cases is processed as if it 

were the last case in the file. A file may be modified by no more than one case selection clause. 
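The case selection rules can be sketched as a small Python function that turns the text after the CASES keyword into a list of case positions. This is an illustration, not P-STAT's implementation. The function name select_case_positions is hypothetical, and the sketch assumes that .ON. appears only as the final item and that positions beyond the end of the file are silently ignored.

```python
def select_case_positions(spec, n_cases):
    """Expand a case selection such as '33 49 105 TO 200 700 .ON.'."""
    tokens = spec.upper().split()
    selected = []
    last = 0  # highest position selected so far
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == ".ON.":
            if not selected or i != len(tokens) - 1:
                raise ValueError(".ON. must follow a case and come last")
            # Continue from the current case onward to the end of the file.
            selected.extend(range(last + 1, n_cases + 1))
            break
        start = end = int(token)
        if i + 2 < len(tokens) and tokens[i + 1] == "TO":
            end = int(tokens[i + 2])
            i += 2
        if start <= last or end < start:
            raise ValueError("cases must be ascending and not repeated")
        selected.extend(range(start, end + 1))
        last = end
        i += 1
    # Assumption: positions past the end of the file select nothing.
    return [position for position in selected if position <= n_cases]


first_example = select_case_positions("33 49 105 TO 107 700 .ON.", 702)
third_example = select_case_positions("3 .ON.", 5)
```

A descending or repeated reference raises an error in this sketch, reflecting the requirement that case references be in ascending order and that a case not be repeated.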

A major reason for using case selection is for test runs. If you have a large file and are doing transformations, 

it is prudent to do a trial run, selecting a few cases and printing the results so they can be examined before the final 

run is made. When a trial run looks correct, the case selection is removed and the final 

run is done. 

2.3 Variable Selection 

There are two keywords which indicate variable selection: 1) KEEP, which is followed by a list of variables to be 

used, and 2) DROP, which is followed by a list of variables to be omitted. Variables may be selected by referencing 

either their names or their positions in the file. 

These are selections of variables by their names (variable labels): 

LIST Myfile [ KEEP Sex Age Education ] $


SORT Myfile [ DROP Income Rent ] , 

BY Education, OUT SortEduc $ 

LIST File3 [ KEEP Id Sex Age .ON. ] $ 

Each selection clause begins with either KEEP or DROP. These keywords identify a P-STAT variable selection. 

Each continues with a list of variable names, which are separated from each other by blanks. “.ON.” used for variable 

selection has the same meaning as it does for case selection. When used for variable selection it means 

“starting with the current variable do the KEEP or DROP to all the remaining variables”. 

It is often convenient to refer to a variable by position rather than by name, particularly when the variable 

name is long. There are some situations in which, by definition, a number can only refer to a position. There are 

other situations where a number could represent either a constant or a variable position. To distinguish between 

these two situations, the convention in P-STAT is that a constant is a number by itself, and a variable position is 

referenced with the notation V(n), where “n” is the position. V(1) is the variable in position 1 of the file. V(33) is 

the variable in position 33 of the file. With reference to the example in Figure 2.1, 

LIST Dogs [ KEEP V(1) V(2) V(6) ] $ is the same as 

LIST Dogs [ KEEP Name Sex Diet ] $ 

Variable names and variable positions can be used in the same variable selection clause. The position of a variable 

is always the “current” position of that variable in the file. After variable selection or reordering, the initial positions 

of the variables may change. For example, this command: 

PLOT Tree [ KEEP V(10) V(3) TO V(6) ] ; 

inputs cases with five variables to the PLOT command. The variables are ordered as specified. A subsequent 

subcommand to plot variable 10 by variable 3: 

P V(10) * V(3) ; 

yields an error message, because there are only five variables in the file given to the PLOT command. The variable 

that previously was in position 10 is now in position 1; the variable that was in position 3 is now in position 2, and 

so on. 
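The renumbering described above can be modeled outside P-STAT. The following Python sketch is only an illustration and is not PPL: the variable names x1 through x12 and the helpers keep_positions and v are hypothetical, and the file is represented as a simple list of names.

```python
def keep_positions(variables, positions):
    """Return the variables at the given 1-based positions, in the order listed."""
    return [variables[p - 1] for p in positions]


def v(variables, n):
    """Model of V(n): the variable currently in 1-based position n."""
    if not 1 <= n <= len(variables):
        raise IndexError(f"V({n}) is outside the {len(variables)} variables available")
    return variables[n - 1]


# A hypothetical file with twelve variables named x1 .. x12.
tree = [f"x{i}" for i in range(1, 13)]

# Model of [ KEEP V(10) V(3) TO V(6) ]: position 10, then positions 3 through 6.
kept = keep_positions(tree, [10] + list(range(3, 7)))

# After the KEEP, positions refer to the five remaining variables only.
first = v(kept, 1)   # the variable that used to be in position 10
second = v(kept, 2)  # the variable that used to be in position 3
```

In this model, asking for v(kept, 10) raises an error, which mirrors the error message produced by the P subcommand above.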

2.4 Variable Selection With Wildcards 

Consider a variable with the following name: 

age.oldest.surviving.child 

A wildcard like 

age?sur?ch? 

might be the most efficient way to refer to it. When compared to the above name, 'age' matches, the '?sur' says 

accept anything until 'sur' is found, the '?ch' says from there accept anything through a 'ch', and the final '?' 

says accept anything at all after that, if indeed there is anything else. Thus, 

age.oldest.surviving.child 

is matched by age?sur?ch? 

A wildcard usage can be thought of as a template for name matching. Differences in case do not matter. A 

wildcard template can be used to specify which variable (or, in some situations, variables) are to be used. The 

template is matched against the name of each variable in the file. The template uses single (?) or double (??) question 

marks to indicate how the matching should be done. 

A wildcard template contains at least one single (?) or double (??) question mark, and at least one string. The 

question marks serve as 'move until' operators. A string consists of one or more ordinary characters that can be 

found in names. String matching ignores case. A template successfully matches a name when each template element 

progressively matches a part of the name, with the entire name being matched when the template is done.


If a template starts with a string, the name being matched must also begin with that string. If the template 

ends with a single or double question mark, and the match has been successful so far, the rest of the name is accepted, 

and a match has occurred. 

A single question mark, followed by a string, matches the name through the NEXT remaining occurrence of that 

string. If no such string is found, the match fails. A double question mark, followed by a string, matches the name 

through the LAST remaining occurrence of that string. If no such string is found, the match fails. 

___________________________________________________________________________ 

Figure 2.2 Examples of Wildcard Matching 

qq? will match any name that starts with 'qq'. 

?qq? will match any name that contains 'qq' anywhere. 

??qq will match any name that ends with 'qq' . 

ab?cde will match abxxcde, and also abcde. 

ab?cde will NOT match abcdecde, whereas 

ab??cde will. 

a?o?c will NOT match age.oldest.child . 

a?o?c? will because the final ? moved to the name's end 

a?ld will NOT, because it ends on the ld in old. 

a??ld will, because the ?? moved to the last ld. 

___________________________________________________________________________ 
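The matching rules above, and the cases in Figure 2.2, can be sketched in Python. This is an illustrative model of the rules as stated here, not P-STAT's actual implementation. The function name wildcard_match is hypothetical, and templates with three or more consecutive question marks are not handled.

```python
import re


def wildcard_match(template, name):
    """Return True if the template matches the whole name, ignoring case."""
    template, name = template.lower(), name.lower()
    # Split the template into '??', '?', and runs of ordinary characters.
    tokens = re.findall(r"\?\?|\?|[^?]+", template)
    pos = 0  # how much of the name has been matched so far
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("?", "??"):
            if i + 1 == len(tokens):
                return True  # a trailing wildcard accepts the rest of the name
            target = tokens[i + 1]
            if tok == "?":
                found = name.find(target, pos)   # NEXT remaining occurrence
            else:
                found = name.rfind(target, pos)  # LAST remaining occurrence
            if found < 0:
                return False
            pos = found + len(target)
            i += 2
        else:
            # A string with no wildcard before it must match right here.
            if not name.startswith(tok, pos):
                return False
            pos += len(tok)
            i += 1
    # The whole name must be used up when the template is done.
    return pos == len(name)
```

Under these assumptions, wildcard_match("a?ld", "age.oldest.child") is false because the single question mark stops at the 'ld' in 'oldest', while wildcard_match("a??ld", "age.oldest.child") is true because the double question mark moves to the last 'ld'.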

A wildcard can be used anywhere that the name of a variable could be used. A single match is usually what 

is expected. However: 

1. In KEEP or DROP phrases, and in LIST functions like SUM, a wildcard usage can have multiple 

matches, in which case all will be used. For example: [ KEEP ??income ] 

2. There can be, in PPL expressions, multiple matches to a wildcard usage if a subscript follows, in parentheses, 

to show which of the matches should be accessed at that point in the execution of the PPL. 

This permits looping through the matches. 

SORT xxx, BY pulse??pre pulse??post, OUT zzz $ 

In the BY phrase each template should match one name, and the sort will be 

done on those two BY variables. 

[ SET tot? TO ?11?inc? + ?11?div? ] 

In the above, the actual variable names could be something like total_income_all_sources, year_2011.income 

and year_2011.dividends . 

[ KEEP ??income ] 

Wildcards can be used in KEEP or DROP phrases. The phrase shown above keeps all of the variables whose 

names end with INCOME. There can be one or more matches. 

[ SET total TO SUM( ??income) ] 

Wildcards may be used as input to the various LIST functions, which include sum, mean, max, first.good and 

such. The phrase shown above sets TOTAL to the sum of the variables whose names end with INCOME. There 

can be one or more matches. SET, INCREASE and DECREASE can be followed by a subscripted wildcard, as 

can the various operands in the rest of the expression.


[ DO #j = 1,5; SET ??income(#j) = 0; ENDDO ] 

Here, ??income remembers the positions of the variables whose names end with 'income'. In this example, there 

are presumably five of them. Using ??income(#j) when #j=2 accesses the second of them, wherever in the file that 

variable may actually be. The example would set the 5 variables whose names end with 'income' to zero. 

2.5 Using Ranges in Selection Clauses 

Lists may contain single variables, many variables, and/or ranges of variables: 

[ KEEP Siblings TO Children Occup.Mother Race Age ] 

The cases received by the individual P-STAT commands contain all the variables from the variable named Siblings 

through the variable named Children, plus the three variables, Occup.Mother, Race and Age. In this case, 

there will be an error message if the variable named Children has a position in the file before that of the variable 

named Siblings, or if Occup.Mother, Race or Age have positions in the file between Siblings and Children. Other 

than these situations, the order of the individual variables or ranges does not matter. 

Other valid selections are: 

[ CASES 1 10 TO 50 56 ] 

[ DROP V(13) TO V(16) Occupation ] 

[ KEEP V(1) Education TO V(23) Region V(3) ] 

The system variable .ON. may be used to make the referencing and typing of variable selections easier. The 

clause: 

[ KEEP V(6) Children .ON. ] 

instructs P-STAT to use the sixth variable in the file and all the variables from the one named Children through 

the last variable in the file. .ON. means “from here on through the end.” This is particularly useful if you have 

added a number of new variables to the file and are not certain just how many you currently have. 

The use of: 1) TO to indicate a range, and 2) .ON. to indicate “from the current item on through the last item,” 

are valid in both variable and case selection clauses. 

2.6 Multiple Variable Selections 

DROP and KEEP, unlike CASES, may be used in more than one modification clause. Variable selections take 

place in a sequential and cumulative manner. An initial variable selection often winnows out the unnecessary variables. 

A second selection occurs after all the modifications are done and selects only those variables actually 

needed as input for a given command: 

LIST Dept 

[ KEEP Name TO Race Test1 Test2 ; 

GENERATE Pass = 1; 

GENERATE Test.Average = ( Test1 + Test2 ) / 2 ; 

IF Test.Average LT 65, SET Pass = 0 ; 

DROP Test1 Test2 ] $ 

Variables should not be selected out of the file before they are used. The following command causes an error 

because the variable named Year is not available when the IF clause is processed: 

LIST Produce [ DROP Year ; 

IF Year EQ 2002, RETAIN ] $ 

The correct order of the variable selection clauses is:


LIST Produce [ IF Year EQ 2002, RETAIN ; 

DROP Year ] $ 

The value of the variable Year is tested and the case retained when it is 2002, and then variable Year is dropped 

from each case of the file as it is passed to the LIST command. 

DROP and KEEP require a great deal of overhead because both the variable names and the data are rearranged 

for each DROP or KEEP clause. Your run will be more efficient if you limit the number of DROP and KEEP 

clauses in any one command. For example: 

[ DROP V(1) ] [ DROP V(1) ] [ DROP V(1) ] 

is less efficient than 

[ DROP V(1) TO V(3) ] 

even though they do the same thing. 

RETAIN keeps the case. It is then passed to the next PPL clause or, if there are no more clauses, to the current 

command. DELETE does not pass a case to any subsequent PPL clauses or to the current command. PPL does 

not change the contents of the input file. PPL only affects what is sent to the current command. If the command 

creates an output file, the changes are “permanent”. If the command does not create an output file, the changes 

are temporary. 

2.7 Reordering Variables 

The KEEP instruction may be used to reorder variables. For example, any of the following clauses reorders the 

variables Rent, Sex and Age in listings of File1: 

LIST File1 [ KEEP Age Sex Rent ] $ 

LIST File1 [ KEEP Sex Age Rent ] $ 

LIST File1 [ KEEP Rent Age Sex ] $ 

Often the rearrangement is done to place one or two variables at the left of the file. These two clauses are 

equivalent: 

[ KEEP V(16) V(23) V(1) TO V(15) 

V(17) TO V(22) V(24) .ON. ] 

[ KEEP V(16) V(23) .OTHERS. ] 

.OTHERS. is a system variable meaning all the variables which are not mentioned elsewhere in the KEEP 

clause. System variables are set by P-STAT. Most of the system variables cannot be changed by the user but are 

available for use and testing in PPL statements. System variable names always begin and end with a decimal point. 

Since variable names must begin with a letter, system variable names will never conflict with legal variable names. 
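The equivalence of the two clauses above can be checked with a small model. The Python sketch below is an illustration only: the helper expand_keep is hypothetical, the file is assumed to hold 25 variables, and the labels V1 through V25 simply stand for the variables in those positions.

```python
def expand_keep(file_vars, items):
    """Expand a KEEP list in which '.OTHERS.' stands for every variable
    that is not mentioned elsewhere in the list, in file order."""
    named = [item for item in items if item != ".OTHERS."]
    others = [name for name in file_vars if name not in named]
    result = []
    for item in items:
        if item == ".OTHERS.":
            result.extend(others)
        else:
            result.append(item)
    return result


# A hypothetical file of 25 variables, labeled by position.
file_vars = [f"V{i}" for i in range(1, 26)]

# Model of [ KEEP V(16) V(23) .OTHERS. ]
short_form = expand_keep(file_vars, ["V16", "V23", ".OTHERS."])

# Model of [ KEEP V(16) V(23) V(1) TO V(15) V(17) TO V(22) V(24) .ON. ]
long_form = ["V16", "V23"] + file_vars[0:15] + file_vars[16:22] + file_vars[23:]
```

Both forms produce the same ordering in this model, but the short form does not depend on knowing how many variables the file holds.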

.NEW., .CHARACTER. and .NUMERIC. are system variables which can be used after KEEP to select or reorder 

variables. .NEW. refers to any new variables which have been created in the previous PPL clauses. 

.CHARACTER. refers to all the character variables and .NUMERIC. refers to all the numeric variables. 

LIST Dept 

[ KEEP Name Test.1 Test.2 ; 

GENERATE Pass = 1 ; 

GENERATE Test.Average = ( Test.1 + Test.2) / 2 ; 

IF Test.Average LT 65, SET Pass = 0 ; 

KEEP Name .NEW. ] $ 

In this example, Name, the original variable, and the new variables created in this command, are included in the 

list. .NEW. and .OTHERS. can be used both to rearrange and to select variables:


[ KEEP .NEW. .OTHERS. ] 

[ KEEP .OTHERS. Age .NEW. Race ] 

2.8 Masks and Wildcards 

Masks and wildcards are shortcuts that make it easier to refer to variables that have either a pattern to their order 

in the input file or a common prefix or suffix in their names. Masks are strings of characters that mean “yes” and 

“no”. In this example, the inner parenthesis contains a mask (MASK 100) with a length of three: 

[ KEEP Test1 .ON. (MASK 100) ] 

A variable or case selection mask is a string of digits that are either zeros or ones. Variable selection starts 

with the variable named Test1 and continues through the last variable in the file (.ON.), applying the mask to successive 

groups of three variables. Variables corresponding to mask values of 1 are kept (“yes”) and variables 

corresponding to mask values of 0 are dropped (“no”). In this example, if the variable named Test1 is in position 

6 in the file, variables 6, 9, 12, and so on, are selected. Variables 1 to 5, 7, 8, 10, 11, 13, 14, and so on, are not 

selected. 

Masks are particularly useful when the file contains repeating groups of variables and only some are needed 

for a particular analysis. Given these variables: 

Date1, Grade1, Date2, Grade2, .... Date9, Grade9 

this variable selection clause selects the Grade variables: 

[ KEEP Grade1 TO Grade9 (MASK 10) ] 

The following variable selection clause could be used to reorder the variables so that all those whose names begin 

with “Date” are followed by those whose names begin with “Grade”: 

[ KEEP Date1 TO Grade9 (MASK 10) 

Date1 TO Grade9 (MASK 01) ] 
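The way a mask repeats over a range can be sketched in a few lines of Python. This is a model of the behavior described above, not P-STAT code, and the helper name apply_mask is hypothetical.

```python
from itertools import cycle


def apply_mask(items, mask):
    """Keep the items that line up with a '1' as the mask repeats over the list."""
    return [item for item, bit in zip(items, cycle(mask)) if bit == "1"]


# The repeating groups Date1, Grade1, Date2, Grade2, ... Date9, Grade9.
names = []
for i in range(1, 10):
    names += [f"Date{i}", f"Grade{i}"]

# Model of [ KEEP Grade1 TO Grade9 (MASK 10) ]: the range starts at Grade1.
grades = apply_mask(names[names.index("Grade1"):], "10")

# Model of the reordering clause: Date1 TO Grade9 with MASK 10, then MASK 01.
reordered = apply_mask(names, "10") + apply_mask(names, "01")

# Model of (MASK 100) applied from position 6 onward in a 15-variable file.
positions = apply_mask(list(range(6, 16)), "100")
```

In this model, positions holds 6, 9, 12 and 15, matching the description of (MASK 100) given earlier.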

Masks may also be used in case selection clauses: 

[ CASES 5 .ON. ( MASK 1000 ) ] 

The question mark “?” is used as a wildcard, that is, to refer to any variables with a common prefix or suffix 

in their names. (The question mark replaces the asterisk, used in earlier versions of P-STAT, as the wildcard character. 

This avoids any possible confusion of the wildcard with the symbol for multiplication, which is the 

asterisk.) This selection clause: 

[ KEEP Grade? ] 

keeps all variables beginning with the character string “Grade”. This clause: 

[ KEEP ?Batch ] 

keeps all variables ending with the character string “Batch”. Wildcard notation can be used to reorder the variables 

so that all the variables beginning with “Date” are followed by all the variables beginning with “Grade” and then 

by any other variables in the same order that they occur in the input file. 

[ KEEP Date? Grade? .OTHERS. ] 

The prefix or suffix used with the wildcard ? must be unique to the desired variables. This KEEP instruction: 

[ KEEP Family.ID Income.Male.HH Income.Fem.HH Income.Total ] 

may be shortened to: 

[ KEEP Family.ID Income.? ] 

However, if the file also contains the variables Income.Last.Yr and Income.Child, they will also be kept. Sometimes, 

an error situation results because the wildcard reference is not unique:


[ KEEP Family.ID Income.? Income.Total ] 

This KEEP instruction includes the variable Income.Total twice, once in the middle of the file and a second time 

as the right-most variable. This is an error because each variable in a P-STAT system file must have a unique 

name. 

Case is ignored in wildcard selection. Variables ax and bx are both selected by ?x, or for that matter, by ?X. 

2.9 MODIFYING AND GENERATING VARIABLES 

Values are changed using the SET instruction, which “sets” an existing variable to new values. New variables are 

created using the GENERATE instruction, which “generates” a new variable with the specified values. If SET is 

used with a name that is not the name of a variable in the file, an error message is printed. If GENERATE is used 

with a name that already belongs to a variable in the file, an error message is printed. In general it is a good practice 

to generate all the variables that will be needed before any recodes or logical selections. 

2.10 Modifying Variables with SET 

The keyword SET indicates to P-STAT that modification is to be done to an existing variable. There are four elements 

to a SET clause: 

1. The keyword SET; 

2. The name or position of the variable that is to be modified; 

3. An equal-sign (=); 

4. The value or expression to be used as the new value of that variable. 

The format of the SET instruction is illustrated in Figure 2.3. 

__________________________________________________________________________ 

Figure 2.3 Format of the SET Instruction 

SET Var Operator Expression 

[ SET Score = Test ] 

[ SET Score = Test1 + Test2 ] 

[ SET Score = SQRT ( Score ) ] 

[ SET Notes = 'Late 2 days' ] 

[ SET V(1) = V(1) + Test ] 

[ SET V(1) = 1 + V(1) ] 

__________________________________________________________________________ 

A variable may be referred to by its name or by its position. Note that in a SET clause, constants are often 

used. Character constants must be enclosed in quotes. There is often no way to infer from the context whether a 

number is a constant or the position of a variable. Therefore, the PPL syntax rule is that a number by itself is a 

constant, and a number indicated with the V(n) notation refers to a variable position. 

In addition to distinguishing between constants and variable positions, the “V” notation references the vector 

containing the values for the current case. The subscript (the contents of the parenthesis) pointing into the V vector 

may be a number or an expression. V(17) points to the value of the variable in the 17th position of a given case, 

in other words to its 17th variable. V(Region) points to the value of the first variable if Region is equal to 1 and 

to the value of variable 33 if Region is equal to 33. Calculation of variable positions is discussed in detail later in 

this manual. 

If the variable that follows the SET instruction is not found in the file, an error occurs:


ERROR... Variable Bad.Label, used in a PPL phrase, 

is not found in the file. 

In an interactive session, control is given to the internal P-STAT editor so that the error may be corrected and execution 

can continue. 

A series of SET instructions can be separated into individual clauses or grouped together in the same modification 

clause: 

LIST File [ SET Score = SQRT (Score), 

SET Test = Test + 1, 

SET Inches = Feet * 12 ] $ 

2.11 Using INCREASE and DECREASE Instead of SET 

These usages of SET increase or decrease the value of an existing variable either by a constant or by an expression: 

[ SET Count = Count + 1 ] 

[ SET Total = Total + Score ] 

[ SET Used = Used - 3 ] 

SET clauses like these may be expressed more simply using the instructions INCREASE and DECREASE: 

[ INCREASE Count ] 

[ INCREASE Total BY Score ] 

[ DECREASE Used BY 3 ] 

When BY is omitted, BY 1 is assumed. INCREASE may be abbreviated to INC and DECREASE may be abbreviated 

to DEC. Wearing new pants in the rain is an example of DECREASE. 

2.12 Creating New Variables with GENERATE 

The GENERATE instruction indicates that a new variable is to be created. It may be abbreviated to GEN. The 

format is like that of the SET instruction. GENERATE is immediately followed by the name of the variable to be 

created. This name must be one that does not already exist within the file. If a question mark (?) is used instead 

of a name, P-STAT generates a variable name. This name is the position of the variable in the file with the prefix 

VAR: 

LIST File [ GENERATE Total = Score1 + Score2 ] $ 

LIST File [ GENERATE ID:C = Last.Name ] $ 

LIST File [ GENERATE ? = MEAN ( XA TO XE ) ] $ 

Character variables need “:C” or “:Cnn”, where nn is the maximum number of characters, directly after their 

names. When the number is not supplied, 16 is assumed. The following creates a new variable which can contain 

up to 30 characters: 

[ GENERATE ?:C30 = 'generated character variable' ] 

Once generated, the variables are referenced by just their names. The expression following the “=” in GENERATE 

is exactly like that following the “=” in SET. The MEAN function in the example above computes the mean 

of the variables in the list following the function name. 

The difference between SET and GENERATE is that the variable referenced by SET must already exist while 

the variable referenced by GENERATE must not yet exist. If the variable referenced by GENERATE does exist, 

an error occurs: 

Error... Attempting to GENERATE a new variable named var4, 

but the name already exists in position 4. 

The variable name (label) and the position it currently occupies (n) are both supplied in the error message.


The expression which follows the “=” in SET and GENERATE instructions can be another variable, a constant, 

or a complicated expression involving variables, constants and functions. These are all valid expressions 

after an equal-sign: 

Age 

3.33 

'Sarah Wilson' 

SQRT ( V(3) + Age / 12 ) 

RECODE ( Age, 80 TO 99 = 80 ) 

The “+” and “/” are numeric operators, whereas SQRT and RECODE are functions. SQRT is the square root function. 

The RECODE function allows individual values of a variable to be changed; it is discussed in detail later in 

this manual. 

If the GENERATED variable is not set to anything as in: 

[ GENERATE abc ] 

it is set to Missing 1. 
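The contrast between SET and GENERATE can be summarized in a short model. The Python sketch below is an illustration only: a case is represented as an ordered dictionary of names and values, the helpers set_var and generate are hypothetical, MISSING1 is a stand-in for the system missing value, and name comparison is simplified to exact spelling.

```python
MISSING1 = "missing-1"  # stand-in for the Missing 1 system value


def set_var(case, name, value):
    """Model of SET: the variable must already exist."""
    if name not in case:
        raise KeyError(f"Variable {name}, used in a PPL phrase, is not found in the file.")
    case[name] = value


def generate(case, name, value=MISSING1):
    """Model of GENERATE: the variable must not exist yet.
    A name of '?' is replaced by VAR plus the new variable's position."""
    if name == "?":
        name = f"VAR{len(case) + 1}"
    if name in case:
        position = list(case).index(name) + 1
        raise KeyError(
            f"Attempting to GENERATE a new variable named {name}, "
            f"but the name already exists in position {position}."
        )
    case[name] = value
    return name


case = {"Score1": 80, "Score2": 90}
generate(case, "Total", case["Score1"] + case["Score2"])  # new variable in position 3
set_var(case, "Total", case["Total"] + 1)                 # existing variable, so SET is legal
auto_name = generate(case, "?", 0)                        # name is built from position 4
generate(case, "abc")                                     # no value given
```

The design point is that the two instructions fail in opposite situations, so a misspelled name in SET or an accidental reuse of a name in GENERATE is reported rather than silently accepted.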

2.13 Numeric Operators and their Order 

In the example above, the expression “Age / 12” is a numeric expression which requests the value of the variable 

Age, divided by the constant 12. The slash (/) is the symbol for division. The numeric or arithmetic operators are: 

+ for addition 

- for subtraction 

* for multiplication 

/ for division 

** for exponentiation 

A series of unparenthesized numeric operations may not necessarily be performed from left to right. All exponentiation 

at a given parenthesis level is done first, followed by all multiplication and division, followed by all 

addition and subtraction. If there is a series of additions and subtractions, they are performed from left to right. 

If there is a series of multiplications and divisions, they are also performed from left to right. A series of exponentiations, 

however, is done from right to left. Therefore: 

A - B + C is done as ( A - B ) + C 

A / B * C is done as ( A / B ) * C 

A ** B ** C is done as A ** ( B ** C ) 

A + B * C is done as A + ( B * C ) 

A * B ** C is done as A * ( B ** C ) 

If this order of execution is not the desired order, parentheses may be used to enclose portions of a numeric 

expression. Operations within a pair of parentheses are performed before operations outside, regardless of the order 

defined above. Thus, 

( A + B + C ) / 3 

would take the sum of A, B, and C and divide the result by 3. Without the parentheses, the result would be C 

divided by 3 plus A and B, that is, A + B + ( C / 3 ). 
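Python's arithmetic operators happen to follow the same ordering rules, so the groupings listed above can be checked numerically. The sketch below says nothing about how P-STAT evaluates expressions internally; the values chosen for A, B and C are arbitrary.

```python
A, B, C = 2.0, 3.0, 2.0

# Addition and subtraction: left to right.
add_sub = A - B + C            # grouped as (A - B) + C

# Multiplication and division: left to right.
mul_div = A / B * C            # grouped as (A / B) * C

# Exponentiation: right to left.
power = A ** B ** C            # grouped as A ** (B ** C)

# Multiplication before addition, exponentiation before multiplication.
mixed_add = A + B * C          # grouped as A + (B * C)
mixed_pow = A * B ** C         # grouped as A * (B ** C)

# Parentheses override the default order.
mean_of_three = (A + B + C) / 3
no_parentheses = A + B + C / 3  # only C is divided by 3
```

With these values, power is 512 rather than 64, which is the clearest sign that exponentiation groups from the right.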

2.14 Functions 

P-STAT functions are special expressions which transform variables according to particular rules. For instance, 

the SQRT function calculates the square root of a variable or an expression. A variable or expression used by a 

function is called the “argument” of that function. 

Functions require at least one argument. Arguments follow the function name and are enclosed in parentheses. 

Some of the functions, like the square-root function SQRT, require only a single argument, an expression


which is to be used by the function. This expression can be a variable name or position, a constant, another function 

with its arguments, or a combination of such elements: 

[ SET Score = SQRT ( Score ) ] 

[ SET Score = SQRT ( 55 ) ] 

[ SET Score = SQRT ( Score + 33 ) ] 

[ SET Score = SQRT ( V(1) * .5 ) ] 

Functions like the square root function SQRT are called “numeric functions”. The argument for a numeric 

function is a single expression. If any of the elements in the expression is a missing value, the result is a missing 

value. If the expression yields a good value which is legal for the function, the function will produce an appropriate 

result. If the argument is invalid, like SQRT(-3), the result is set to missing. 
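The way missing values flow through a numeric function can be modeled as follows. This Python sketch is an illustration under stated assumptions: None stands in for a missing value (the distinction between missing types is ignored), and the helpers add and sqrt_ppl are hypothetical.

```python
import math


def add(x, y):
    """Model of '+': the result is missing if either element is missing."""
    if x is None or y is None:
        return None
    return x + y


def sqrt_ppl(x):
    """Model of SQRT: missing in gives missing out, and an invalid
    argument such as a negative number also gives missing."""
    if x is None or x < 0:
        return None
    return math.sqrt(x)


good = sqrt_ppl(add(16, 9))            # both elements are good values
from_missing = sqrt_ppl(add(None, 9))  # one element is missing
invalid = sqrt_ppl(-3)                 # argument is not legal for the function
```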

A number of functions, such as the MEAN function, operate on a list of arguments: 

[ SET AVERAGE = MEAN ( V(2) Test1 Test2 Test3 ) ] 

[ SET AVERAGE = MEAN ( V(2) Test1 TO Test3 ) ] 

Each argument is a numeric variable name or position. The function takes the list of arguments and yields a single 

value. For instance, the MEAN function illustrated above calculates the mean of the variables in the list. 

The functions, which are covered in detail later in this manual, can be broadly classified as: 

1. numeric functions such as SQRT 

2. list functions which operate on a list of variables such as MEAN 

3. character functions such as UPPER, LOWER and CAPS 

4. special functions. For example, RECODE and NCOT are used to recode the values of one or more 

variables. SPLIT and COLLECT are used for cross case data manipulation. Date and time functions 

have a chapter of their own later in this manual. 

2.15 LOGICAL SELECTION OF CASES 

Cases in a P-STAT system file may be selected or deleted from processing by logical testing. This is sometimes 

referred to as “filtering.” IF is the keyword that precedes all logical selections and modifications. The following 

is a discussion of the simple logical IF. Full IF-THEN-ELSE blocks are discussed later in this manual. 

__________________________________________________________________________ 

Figure 2.4 Format of the IF Clause 

IF Exp 1 Logical Operator Exp 2 Action 

[ IF Test1 EQ Test2, DELETE ] 

[ IF Test1 LT 3, RETAIN ] 

[ IF Test1 - 3 GE Test4 * .5, SET .... ] 

[ IF Test1 - V(3) GT SQRT (Test3), SET .... ] 

[ IF SQRT (Test1) GT .2, SET .... ] 

[ IF School EQ 'Longwood', SET .... ] 

__________________________________________________________________________ 

The IF itself is usually composed of five parts: 

1. the keyword IF 

2. an expression


3. an operator indicating the relationship between the expressions 

4. a second expression 

5. one or more action instructions to be taken. 

The format of the IF clause is illustrated in Figure 2.4. Expressions may be as simple as a variable name or a 

constant, or they may be complex numeric or character expressions combining variables, constants and functions. 

Character constants need to be enclosed in single or double quotes. 

The V(n) notation provides a consistent means for differentiating between a constant and a variable position. 

Test1 - 3 means the value of the variable named Test1 minus the constant 3; Test1 - V(3) means the value of the 

variable named Test1 minus the value of the variable located in position 3. 

Any of the expressions in the IF can be complex and refer to variables, constants or functions. The functions 

themselves can call functions: 

[ IF INT ( SQRT (XA ) ) GT SQRT (XC) + 5, SET XF = 1 ] 

Here the square root of XA is computed. Then that result is truncated to an integer using the INT function. Finally, 

the result is compared with the result of the second expression, namely, the sum of the square root of XC and the 

constant 5. If the comparison is evaluated as true, that is, if the value of the first expression is greater than the 

value of the second expression when both expressions are non-missing, the specified action (SET XF = 1) occurs. 

The action taken after the evaluation of an IF clause is typically the modification of a variable’s value (SET), 

the keeping of a case (RETAIN), or the exclusion of a case (DELETE). 

RETAIN keeps a case and passes it to the next PPL clause or, if there are no more clauses, to the current 

command. The case is passed through to the current command, unless it is deleted in a subsequent PPL clause. 

DELETE does not pass a case to any subsequent PPL clauses or to the current command. It deletes the case 

from any further modification or testing, and from the current command. In other words, the processing of PPL 

ceases for that case. The next case is read and PPL is restarted with the new case. 

The action that follows the IF test is usually taken only if the expression is true. 

IF Test GE 65, SET Pass = 'true', is the same as: 

IF Test GE 65, T.SET Pass = 'true', 

Any action can be prefaced by any combination of the letters “T” for true, “F” for false, and “M” for missing to 

control how the results of the IF test are to be evaluated. 

IF Test GE 65, MF.SET Pass = 'false', T.SET Pass = 'true'; 

The action section can contain multiple actions, each one prefaced with appropriate “TMF” combinations. 
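The effect of the T, F and M prefixes can be modeled as a lookup: an action runs when the letter for the outcome of the test appears in its prefix, and an action with no prefix behaves as if it were prefixed with T. The Python sketch below is an illustration only; the helpers evaluate_ge and apply_actions are hypothetical, None stands in for a missing value, and the rule is modeled on the Test GE 65 example.

```python
def evaluate_ge(value, cutoff):
    """Return 'T', 'F' or 'M' for the test 'value GE cutoff'."""
    if value is None:
        return "M"
    return "T" if value >= cutoff else "F"


def apply_actions(case, letter, actions):
    """Run each (prefix, name, new_value) action whose prefix contains the letter.
    An empty prefix is treated as 'T'."""
    for prefix, name, new_value in actions:
        if letter in (prefix or "T"):
            case[name] = new_value
    return case


# Model of: MF.SET Pass = 'false', T.SET Pass = 'true'
actions = [("MF", "Pass", "false"), ("T", "Pass", "true")]

results = {}
for test in (70, 50, None):
    case = {"Test": test, "Pass": ""}
    letter = evaluate_ge(case["Test"], 65)
    results[test] = apply_actions(case, letter, actions)["Pass"]
```

In this model a missing Test is handled by the MF action, so Pass is never left unset.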

2.16 Logical Operators 

The basic logical operators are the following: 

Meaning Symbol 

equal EQ 

not equal NE 

less than LT 

less than or equal LE 

greater than GT 

greater than or equal GE 

Each expression in an IF clause is analyzed and a value is computed. The expressions are then compared according 

to the logical operator that was used. If the logical operator correctly describes the relationship between 

the expressions, the IF statement is evaluated as true. If it is incorrect, the IF statement is false. If either expression 

is missing so that the comparison cannot be made, the IF is evaluated as missing. 

The logical operators may be prefaced with “X” for eXact comparisons of character strings:


[ IF Symptom XEQ 'a', RETAIN ] 

When an exact comparison is specified, the case of the character string must be exactly the same as that of the test 

string for the IF statement to be true. In the example above, the value of Symptom must be a lower case “a” for 

the case to be retained; an upper case “A” would be evaluated as false. When the logical operators are not prefaced 

with “X”, the case of character strings is not relevant. 

In addition to these operators there are 6 logical operators which are used to compare dates and times. They 

are described in Chapter 10, “PPL: Date and Time Commands and Functions”. 

2.17 The Special Operators MISSING and GOOD 

The values .M. and .G. are the system values for missing and good. Missing can be further specified as .M1., .M2. 

and .M3. . Note that names for system values and system variables look much like variable names except that they 

begin and end with a decimal point. This: 

[ IF Age EQ .M., DELETE ] 

may be used to delete any case with a missing value of Age. Note that when an IF statement is used to explicitly 

test for missing or good values, it has only a true or false result. 

The two special operators MISSING and GOOD can also be used to test whether or not missing data are present 

in an expression: 

[ IF Test1 MISSING, is the same as 

[ IF Test1 EQ .M., 

[ IF Test1 GOOD, is the same as 

[ IF Test1 EQ .G., 

The special operators MISSING and GOOD combine the “EQ” (=) operator and the system value .M. or .G. 

into a single keyword. MISSING1, MISSING2, and MISSING3 can be used in the same way as MISSING to test 

specifically for the individual types of missing. 

[ IF Age MISSING3, DELETE ] 

Here, a case is deleted if Age equals the system value for missing type 3. 

2.18 AND and OR Relationships 

An IF may consist of a series of logical relationships linked by AND or OR. For example: 

[ IF Age GE 14 AND Sex EQ 1, SET Membership = 2 ] 

[ IF Age LT 14 AND Sex EQ 1 OR V(1) EQ 77, DELETE ] 

There can be many ANDs and ORs and they can be nested. Parentheses control the order in which the parts of the 

expression are evaluated: 

[ IF 

( Age GT 21 OR ( Voter EQ 2 AND Married EQ 1 ) ) 

AND 

( Education GT 12 OR ( Job EQ 4 AND Income GT 20000 ) ), 

RETAIN ] 

This example illustrates the types of complex expression that are possible. However, a frequent cause of an empty 

file (no cases found) is an IF with expressions so complex that the user cannot follow the logic.


__________________________________________________________________________ 

Figure 2.5 AND and OR: Evaluations of Expressions 

In the following table, the evaluations of the expressions are: 

t for true, f for false and m for missing. 

EXPRESSIONS:        EVALUATIONS: 

Exp1    Exp2        Exp1 AND Exp2    Exp1 OR Exp2 

t       t           t                t 

t       f           f                t 

t       m           m                t 

f       t           f                t 

f       f           f                f 

f       m           f                m 

m       t           m                t 

m       f           f                m 

m       m           m                m 

__________________________________________________________________________ 

Unless parentheses indicate otherwise, ANDs are done before ORs. 

Parentheses determine clusters of logic that get evaluated as a piece. In the previous example, if the value of 

Age is not greater than 21 or if Age is missing, the expression: 

( Voter EQ 2 AND Married EQ 1 ) 

needs to be evaluated. If this expression is also not true, the entire modification clause cannot be true, and the rest 

of the clause does not need to be processed. However, if this expression is true, the next expression: 

( Education GT 12 OR ( .... ) ) 

is evaluated in the same manner. If Education is greater than 12, there is no need to complete the evaluation of 

the expression. A true result is returned and the case continues to the next PPL clause. However, if Education is 

not greater than 12 or is missing, the evaluation proceeds because the expression following the OR might be true. 

Figure 2.5 contains a table which shows the interaction of true, false and missing evaluations with the AND 

and OR operators. The following example illustrates OR with three different evaluations: 

( IF Occupation EQ 40 OR Education EQ 12 ) 

Occupation    Education    Evaluation

43 (f)        12 (t)       true
43 (f)        16 (f)       false
43 (f)        -  (m)       missing

In the third example, the first expression is false (Occupation is not 40), but the second expression is neither 

false nor true because the value for Education is missing. 
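The table in Figure 2.5 can be modeled outside P-STAT as a quick check of one's understanding. The following is a minimal Python sketch, not P-STAT code: `None` stands in for a missing evaluation, and the helper names `and3`, `or3` and `eq3` are hypothetical names chosen for this illustration.

```python
# Three-valued logic sketch: True, False, and None (standing in for "missing").

def and3(a, b):
    """AND as in Figure 2.5: any false gives false, else any missing gives missing."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    """OR as in Figure 2.5: any true gives true, else any missing gives missing."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def eq3(value, target):
    """A comparison evaluates as missing when the tested value is missing."""
    if value is None:
        return None
    return value == target

# The three Occupation/Education cases from the text above.
rows = [(43, 12), (43, 16), (43, None)]
results = [or3(eq3(occ, 40), eq3(edu, 12)) for occ, edu in rows]
print(results)
```

The three results correspond to the true, false and missing evaluations listed for the three cases.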

Because AND has precedence over OR, some of the parentheses in the previous example can be omitted. 

[ IF ( Age GT 21 OR ( Voter EQ 2 AND Married EQ 1 ) ) 

can be written as 

[ IF ( Age GT 21 OR Voter EQ 2 AND Married EQ 1 ) 

The full statement reduces to: 

[ IF ( Age GT 21 OR Voter EQ 2 AND Married EQ 1 ) 

AND 

( Education GT 12 OR Job EQ 4 AND Income GT 20000 ), 

However, the use of the parentheses is recommended whenever the logic is complex with a mixture of AND and 

OR phrases. 

2.19 Common Errors in Complex Expressions 

The most common errors in constructing a complex expression occur because the relationship that follows the IF, 

OR, or AND is not complete. For example: 

[ IF Occupation EQ 3 OR Occupation EQ 4, .... is correct 

[ IF Occupation EQ 3 OR 4, .... is incorrect 

In the second example above, “Occupation EQ 3” is complete. It has the proper three parts with “Occupation” as 

the first expression, “EQ” as the operator, and “3” as the second expression. However, 

[ IF Occupation EQ 3 OR 4 EQ Occupation, 

is also allowed. Since a number on either side of the operator is a valid expression, the 4 following the OR (in the 

earlier example): 

[ IF Occupation EQ 3 OR 4, .... 

is interpreted as the first expression in a clause. The “,” which follows it is not a legal operator. Error messages 

indicate what was expected in the clause and what was found: 

LIST Patients [ IF Occupation EQ 3 or 4, RETAIN ] $ 

ERROR... Expected a logical operator like EQ 

RETAIN ] $ 

A second common source of error is to include the IF for each relationship in the complex statement. The following is correct: 

[ IF Occupation EQ 3 OR Occupation EQ 4, .... 

This is incorrect: 

[ IF Occupation EQ 3 OR IF Occupation EQ 4, .... 

In this example, an error message results because IF is a legal name for a variable: 

LIST KK [ IF Occupation EQ 3 OR IF Occupation EQ 4, 

RETAIN ] $ 

ERROR... Expected a logical operator like EQ 

Occupation EQ 4, 

Since IF is a legal variable name, and a variable is a valid expression, P-STAT is expecting the next character 

string to be an operator such as EQ or LT. The variable name Occupation is not a legal operator and an error condition 

occurs.


2.20 AMONG and NOTAMONG 

Two other logical operators, AMONG and NOTAMONG, simplify the specification of logical relationships. 

They follow an initial expression and require a list of values and variables (not a second expression) as their 

argument. 

The argument list for AMONG and NOTAMONG contains individual values and ranges of values. This logical 

clause: 

[ IF Test.Score AMONG ( 90 TO 100 ), 

SET High = 1 ] 

produces exactly the same result as: 

[ IF Test.Score GE 90 AND 

Test.Score LE 100, SET High = 1 ] 

AMONG is easier to type and to understand. 

NOTAMONG is used similarly: 

[ IF Test.Score NOTAMONG ( 90 TO 100 ), DELETE ] 

Any cases with values on Test.Score that are below 90 or over 100 are deleted. Cases with missing values are 

not deleted. Prefixing the consequence: 

[ IF Test.Score NOTAMONG ( 90 TO 100 ), TM.DELETE ] 

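deletes cases with missing values as well. The system variables for specific missing values (such as .M1. or .M2.) may be included in the argument list for AMONG or NOTAMONG; their effect is described in the section on missing data with AMONG and NOTAMONG. 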

AMONG and NOTAMONG are particularly powerful when multiple values are specified. Thus: 

[ IF Religion EQ 1 OR ( Religion GE 3 

AND Religion LE 5 ) OR Religion EQ 7 

OR Religion EQ 9, SET Protestant = 1 ] 

is exactly the same as: 

[ IF Religion AMONG ( 1, 3 TO 5, 7, 9 ), 

SET Protestant = 1 ] 

The arguments for the operators AMONG and NOTAMONG are lists of values (constants) and variables; 

they cannot be complex expressions but they can be scratch variables. The use of commas separating the AMONG 

values is optional. In this example, the arguments for NOTAMONG are variable names: 

[ SET Low.Score = MIN ( Test1 TO Test10 ); 

SET High.Score = MAX ( Test1 TO Test10 ); 

IF Final.Exam NOTAMONG ( Low.Score TO High.Score ), 

RETAIN ] 

The MIN function yields the minimum value of a list of variables, which can include ranges, wildcards and 

.ON. . The MAX function yields the maximum value. Here these functions are used to find the lowest and highest 

scores on a series of tests. If the value of the variable named Final.Exam is less than the lowest value or above the 

highest value, the case is retained. The retained cases are students who have done either better or worse than expected, 

given their scores on Test1 to Test10. 

AMONG and NOTAMONG may be prefaced with “X” for eXact comparisons of character strings. When 

exact comparisons are specified, the string must be identical and the case (upper, lower or mixed) must also be 

identical: 

[ IF Symptom XAMONG ( 'a' 'A' 'Aa' ), RETAIN ] 

For example, cases with “aa”, “AA” and “aA” as values of Symptom would not be retained.


2.21 MISSING DATA with AMONG and NOTAMONG 

If the value being tested is a missing value, the result of the IF will be missing unless it matches a missing value 

in the argument list. If variable TestScore has a value of MISSING2, the following PPL: 

[ IF TestScore AMONG ( 60 TO 100 ), T.SET Grade = 'Pass', 

F.SET Grade = 'Fail', 

M.SET Grade = 'Incomplete' ] 

as expected, produces the missing result of “Incomplete”. 

[ IF TestScore AMONG ( 60 TO 100, .M1. ) .... 

also produces the missing result when variable TestScore is MISSING2. However, the statement: 

[ IF TestScore AMONG ( 60 TO 100, .M2. ) .... 

produces a result of true and variable Grade has a value of “Pass” for that case. 
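The behavior described in this section can be sketched in Python. This is an illustrative model only, not P-STAT code: the missing types are encoded as the strings "M1", "M2" and "M3", a range is written as a (low, high) tuple, and the function names `among` and `grade` are hypothetical.

```python
# Sketch of AMONG with missing values. Missing types are modeled as the
# strings "M1", "M2", "M3" (an illustrative encoding, not P-STAT's own).
MISSING = {"M1", "M2", "M3"}

def among(value, items):
    """items holds single values, (low, high) ranges, or missing codes.
    Returns True, False, or None (None stands for a missing result)."""
    if value in MISSING:
        # A missing value gives true only if its own type is listed.
        return True if value in items else None
    for item in items:
        if isinstance(item, tuple):
            low, high = item
            if low <= value <= high:
                return True
        elif item not in MISSING and item == value:
            return True
    return False

def grade(score, items):
    """Mimics T.SET / F.SET / M.SET on the three possible results."""
    return {True: "Pass", False: "Fail", None: "Incomplete"}[among(score, items)]

print(grade("M2", [(60, 100)]))
print(grade("M2", [(60, 100), "M1"]))
print(grade("M2", [(60, 100), "M2"]))
```

Under these assumptions the first two calls give "Incomplete" and the third gives "Pass", matching the three statements above.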

2.22 INRANGE and OUTRANGE 

INRANGE and OUTRANGE can be used when the test to be done is for a single range of values. 

[ IF TestScore INRANGE ( 60, 100 ), SET Grade = 'Pass' ] 

[ IF Age OUTRANGE ( 13, 19 ), DELETE ] 

These two examples can also be done using AMONG and NOTAMONG with the keyword “TO”. INRANGE and OUTRANGE are provided because their names are more intuitive in some situations than AMONG and NOTAMONG. 

2.23 ANY and ALL 

There are two other logical operators that may follow an IF: ANY and ALL. They must be followed by a list of variables. ANY is equivalent to a series of ORs. This example: 

[ IF Q11 GT 10 OR Q12 GT 10 OR Q13 GT 10 OR Q14 GT 10, DELETE ] 

is the same as: 

[ IF ANY ( Q11 TO Q14 ) GT 10, DELETE ] 

ALL is equivalent to a series of ANDs. This example: 

[ IF Q11 GOOD AND Q12 GOOD AND Q13 GOOD 

AND Q14 GOOD, RETAIN ] 

is the same as: 

[ IF ALL ( Q11 TO Q14 ) GOOD, RETAIN ] 

The argument list which follows ALL and ANY may contain variable names or variable positions. The variable 

positions are indicated by V(n). A common use of ANY or ALL selects cases with good (non-missing) data 

on all of the variables. Either of these statements does this: 

[ IF ALL ( V(1) .ON. ) GOOD, RETAIN ] 

[ IF ANY ( V(1) .ON. ) MISSING, DELETE ] 
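Because ANY and ALL are described as series of ORs and ANDs, they can be modeled as folds over three-valued results. The Python sketch below is an illustration under that assumption, not P-STAT code; `None` stands for missing, and `any3`, `all3`, `gt3` and `good` are hypothetical helper names.

```python
from functools import reduce

def and3(a, b):
    if a is False or b is False:
        return False
    return None if (a is None or b is None) else True

def or3(a, b):
    if a is True or b is True:
        return True
    return None if (a is None or b is None) else False

def gt3(value, limit):
    """GT evaluates as missing when the value is missing."""
    return None if value is None else value > limit

def good(value):
    """GOOD is true for non-missing values and false for missing ones."""
    return value is not None

def any3(results):
    """ANY: a chain of ORs over the individual results."""
    return reduce(or3, results, False)

def all3(results):
    """ALL: a chain of ANDs over the individual results."""
    return reduce(and3, results, True)

case_a = [8, 12, None, 3]   # one value over 10, one missing
case_b = [8, None, 9, 3]    # nothing over 10, one missing

print(any3(gt3(v, 10) for v in case_a))  # an OR chain containing a true
print(any3(gt3(v, 10) for v in case_b))  # falses and one missing
print(all3(good(v) for v in case_a))     # not every value is good
```

In this model a case like `case_b` evaluates as missing, so a plain DELETE consequence would not be carried out for it.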

2.24 INSTRUCTIONS AFTER IF 

The IF statement is incomplete by itself. It must be followed by an instruction that describes the action to be taken 

as a consequence of the IF test. A comma (,) is the punctuation that separates the IF and the instruction which 

follows. The three most common instructions that follow an IF are: DELETE and RETAIN for conditional case


selection, and SET for variable recoding. Functions such as LAG and DIF which work across cases should seldom 

be used following an IF. See the discussions of LAG and DIF for an example. 

__________________________________________________________________________ 

Figure 2.6 IF and Missing Data 

Given these five cases: 

     Case 
     Num    Age    Race 

      1      29      2 
      2      31      4 
      3       -      2 
      4       -      4 
      5      32      - 

The cases below are the ones given to a P-STAT command as a result of evaluation of the IF clauses on the left: 

                                               Case 
                                               Num    Age    Race 

1. IF Age LE 30, RETAIN                         1      29      2 

2. IF Age GT 30, DELETE                         1      29      2 
                                                3       -      2 
                                                4       -      4 

3. IF Age GOOD AND Race GT 3, RETAIN            2      31      4 

4. IF Age MISSING OR Race LE 3, DELETE          2      31      4 
                                                5      32      - 

__________________________________________________________________________ 

2.25 Conditional Case Selection 

Cases may be retained or deleted as the result of logical evaluations. Figure 2.6 shows four conditional case selections 

which appear straightforward. In the first IF clause, there is no ambiguity. Only the first of the five cases 

in the figure has a value for variable Age which is both non-missing and less than or equal to 30. Since action is 

taken only when the result of an IF is true, only that one case is retained. 

The second IF in Figure 2.6 looks like the first IF. The second and fifth cases, which have non-missing values 

greater than 30, are deleted. However, the third and fourth cases, which were not retained in the first IF because 

of missing values on Age, are not deleted in the second IF for the same reason. When a value is missing, the result 

of an IF is missing rather than true. Unless explicitly specified otherwise, actions that follow an IF are done only 

when the result of the IF is true. If the result is false or missing, the action is not done. 

The fourth IF in Figure 2.6 is similarly affected because there is a missing value of Race in case 5. Since the 

result of the IF will be missing for that case, it is not deleted from the file. 
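The four selections in Figure 2.6 can be traced with a small Python model. This is a sketch under the rules stated above (an action is taken only when the IF is true), not P-STAT code; `None` stands for a missing value and the helper names are hypothetical.

```python
# The five cases of Figure 2.6; None stands for a missing value.
cases = [
    {"num": 1, "age": 29, "race": 2},
    {"num": 2, "age": 31, "race": 4},
    {"num": 3, "age": None, "race": 2},
    {"num": 4, "age": None, "race": 4},
    {"num": 5, "age": 32, "race": None},
]

def cmp3(value, test):
    """A comparison on a missing value evaluates as missing (None)."""
    return None if value is None else test(value)

def and3(a, b):
    if a is False or b is False:
        return False
    return None if (a is None or b is None) else True

def or3(a, b):
    if a is True or b is True:
        return True
    return None if (a is None or b is None) else False

def retain(rows, cond):
    """RETAIN keeps a case only when the IF is true."""
    return [r["num"] for r in rows if cond(r) is True]

def delete(rows, cond):
    """DELETE removes a case only when the IF is true."""
    return [r["num"] for r in rows if cond(r) is not True]

if1 = retain(cases, lambda r: cmp3(r["age"], lambda a: a <= 30))
if2 = delete(cases, lambda r: cmp3(r["age"], lambda a: a > 30))
if3 = retain(cases, lambda r: and3(r["age"] is not None,
                                   cmp3(r["race"], lambda x: x > 3)))
if4 = delete(cases, lambda r: or3(r["age"] is None,
                                  cmp3(r["race"], lambda x: x <= 3)))
print(if1, if2, if3, if4)
```

The case numbers that survive each clause agree with the right-hand side of Figure 2.6: case 1; cases 1, 3 and 4; case 2; and cases 2 and 5.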

2.26 Conditional Modification 

The keyword SET can either begin a modification clause or it can be used as an instruction following an IF. In 

each of these examples, the IF expression is evaluated first: 

-------- THE IF -------- -------- THE SET -------- 

[IF Age EQ 1, SET Age = 99 ]


[IF Income MISSING, SET Income = 0 ] 

[IF Test3 LT Test1, SET Sum = Test2 / 2 ] 

[IF Total EQ V(1) + 3, SET Sum = Count * 2 ] 

[IF Sum - .5 LT 0, F.SET V(1) = V(3) ] 

[IF MEAN (T1 TO T3) LT 65, SET Sum = .M. ] 

Except in the fifth example, the SET is done only if the expression is true. In the fifth example, F.SET causes the 

SET to occur only if the expression is false. 

In some situations, an IF test may have more than one desired consequence. Multiple instructions may directly 

follow the IF, within the same clause. In the following example, if Work.Status equals 3, four instructions follow (a RETAIN and three SETs): 

[ IF Work.Status EQ 3, RETAIN, 

SET Current.Job = 0, 

SET Current.Income = 0, 

SET Total.Hours = .M1. ] 

2.27 Three-Way Logic of IF Statements 

Three-way (true, false and missing) logic in the evaluation of IF statements is powerful and gives precise control 

over data: 

[ IF Age GE 18, T.SET Voter = 1, 

FM.SET Voter = 0 ] 

However, to use this power and obtain the expected results, consideration must be given to the treatment of missing 

data. 

This is especially true with logical selection of cases. DELETE is occasionally useful, but RETAIN is better 

because its treatment of missing data is more natural. The action which follows the IF is normally done only if 

the result of the IF is true. However, it is possible to direct the action explicitly by using the prefixes T, F and M 

before the action instruction. The consequence DELETE actually means T.DELETE, or delete if true. TM.DELETE deletes a case if the result of the IF is either true or missing and yields the expected result. Thus: 

[ IF Age GT 30, TM.DELETE ] is the same as 

[ IF Age LE 30, T.RETAIN ] which is the same as 

[ IF Age LE 30, RETAIN ] 

Similarly, F.DELETE means delete if the result of the IF is false. 

There may be multiple consequences of a given IF. The following are possible combinations of instructions: 

[ IF logical expression, T.SET ..., F.SET ..., M.SET ... ] 

[ IF logical expression, TM.SET ..., F.SET ... ] 

[ IF logical expression, TFM.SET ..., FM.SET ... ] 

All combinations of T, F and M, in any order, are permitted as prefixes to the consequences of an IF. TFM.SET 

causes the action to occur, whatever the result of the IF. 

If a prefix is not given, T is always assumed no matter what prefix was used in the previous consequence: 

[ IF Age GT 18, F.SET Minor.Child = 1, 

SET Voter = 1 ] 

The variable Voter is set to 1 if the expression is true. 

In this example, the consequences are more complex: 

[ IF Sex EQ 1 AND Work.Status GE 2, 

T.SET Occupation = Last.Occup, 

F.SET Occupation = Current.Occup,


M.DELETE ] 

The evaluation still returns a single logical result of true, false or missing, and the various actions are done 

accordingly. 
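The effect of the T, F and M prefixes can be sketched in Python. This is an illustrative model, not P-STAT code: `None` stands for a missing result, and `fires`, `gt3` and `le3` are hypothetical names introduced here.

```python
def fires(prefix, result):
    """Return True if a consequence with this prefix is carried out.
    prefix is any combination of the letters T, F and M; an empty
    prefix means T. result is True, False or None (missing)."""
    letters = prefix or "T"
    key = {True: "T", False: "F", None: "M"}[result]
    return key in letters

def gt3(value, limit):
    return None if value is None else value > limit

def le3(value, limit):
    return None if value is None else value <= limit

ages = [25, 30, 31, 45, None]

# [ IF Age GT 30, TM.DELETE ]: a case survives unless the delete fires.
kept_tm_delete = [a for a in ages if not fires("TM", gt3(a, 30))]

# [ IF Age LE 30, RETAIN ]: a case survives only if the retain fires.
kept_retain = [a for a in ages if fires("", le3(a, 30))]

# [ IF Age GT 30, DELETE ]: the case with missing Age is not deleted.
kept_plain_delete = [a for a in ages if not fires("", gt3(a, 30))]

print(kept_tm_delete, kept_retain, kept_plain_delete)
```

In this model the first two selections keep the same cases, while the plain DELETE also keeps the case whose Age is missing.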

2.28 Renaming Variables 

RENAME is the PPL instruction that is used to rename individual variables. 

[ RENAME Test1 TO Math121; 

RENAME V(2) TO Chem34 ] 

RENAME requires the existing name, TO, and the new name, which must be a unique name in the file. If you 

wish to rename most of the variables in the file with names that have no particular pattern you can use a MODIFY 

with an on-the-fly concatenation of files which is described in the chapter “PPL:MODIFY, PROCESS and PUT”. 

If you wish to rename a group of variables using a pattern such as a prefix, suffix, or sequence number, see the 

chapter “PPL:DO LOOPS and IF-THEN-ELSE Blocks”.


PPL 

SUMMARY 

Programming language modifications may be used in the MODIFY command or in any other P-STAT 

command. PPL statements begin with a left bracket and end with a right bracket. They follow the input 

file directly (with no intervening punctuation): 

LIST Patients [ DROP Hospital ; 

IF Age GE 65, RETAIN ] , 

BY Diagnosis, MEAN Length.of.Stay $ 

Modifications are done in the programming language using: 

• Instructions 

• Operators 

• Functions 

• System variables 

Both character and numeric variables may be modified. Some instructions, operators and functions apply 

to both types of variables, and others apply only to one type. Some system variables take on both character 

and numeric values, and others take on only one type of value. 

Wildcards may be used anywhere that the name of a variable could be used 

[ KEEP ?test Weight?] 

or to request that P-STAT supply a name for a new variable. For example: 

[ GENERATE ? = V(2) / V(3) ] 

[ GEN ?:C = 'No comments' ] 

The “question mark” (?) is the wildcard character. Wildcards may be used in lists — in KEEP, DROP, 

SPLIT, COLLECT and DO loop instructions, after ANY and ALL operators, and following list functions 

such as MEAN, SUM, MAX, MIN and SDEV. 

Comments may be interspersed among PPL clauses: 

[/* Selecting cases with outstanding balances */ ; 

IF Amount.Owed GT 0, RETAIN ] 

The whole comment is a PPL clause following either a left bracket or a semicolon. The comment text 

follows “/*” and is followed by “*/”. PPL comments document modifications within a command. 

The C.TRANSPOSE command may be used to rotate a newly-modified file, 

C.TRANSPOSE File12 [ CASES 1 TO 10 ], OUT File12.Chr $ 

producing an output file containing character representations of the data in the original file. In the transposed 

file, the variables (columns) are Variable, Case.1, Case.2, Case.3 and so on. The cases (rows) are 

the names and values of all of the variables. Thus, the first 10 or so cases in the file may be examined in 

a concise printout — use LIST with FOLD, if necessary. FOLD causes long character variables to be 

broken into pieces and printed on several lines. 

nn=number vnp=variable name/position vn=variable name exp=expression


PPL Instructions 

The following instructions may begin a modification clause. CASE selections are done before other 

modifications. 

CASES nn nn 

specifies a list of the positions (in ascending order) of cases to be selected: 

[ CASES 2 5 11 TO 99 333 .ON. ] 

Either CASE or CASES may be used for case selection. (ROW and ROWS are synonyms.) Case selections 

are done before all other PPL modifications. 

DECREASE vnp 

recodes an existing numeric variable by decreasing its value by 1 or a specified amount: 

[ DECREASE Counter ; 

DEC Days BY 7 ; 

DEC Profit BY Expenses ] 

DEC is an abbreviation for DECREASE. 

DROP vnp vnp 

specifies a list of variables, by name or position, to be dropped: 

[ DROP Income V(4) TO V(10) V(26) .ON. ] 

Unspecified variables in the input file are kept. .ON. means “on through the end of the variables in the file.” Either KEEP or DROP may be used for variable selection. Wildcards, .NUMERIC., .CHARACTER., and .NEW. can be used. 

DELETE 

specifies that the current case not pass to any subsequent PPL clauses or to the command in use. Cases not deleted are retained. DELETE is used as a consequence of an IF test. 

GENERATE vn = exp 

creates a new numeric or character variable: 

[ GENERATE Average = MEAN ( Score1 Score2 ) ; 

GEN Current.Age = Year - Birth.Year ; 

GEN Area.Code:C = '609' ] 

GENERATE requires a new variable name. If the new variable is a character variable, the name must be 

followed by “:C”, “:Cnn”, “:nn” or “:cnn”, where nn is a number indicating the maximum number of 

characters in the variable. When the number (nn) is not supplied, 16 is assumed. GENERATE may be 

abbreviated to GEN. 

IF exp op exp, consequence 

specifies a logical selection. The format of an IF clause is: 

[ IF exp logical operator exp , consequence ] 

[ IF Age LE 65 , RETAIN ] 

[ IF City EQ 'Miami' , DELETE ] 

[ IF (V(4) + 1) EQ V(5) , DEC V(4) ] 

[ IF 'yes' EQ Answer.4 , DELETE ] 



The expressions may be simple or complex numeric or character expressions. Character strings must be 

enclosed in single or double quotes. The logical operators may be any of those in the subsequent section: 

PPL Logical Operators. The consequences may be any of these PPL instructions: DELETE, RETAIN, 

SET, INCREASE or DECREASE. (Additional instructions which may be used as consequences of an 

IF are explained and summarized in the second of the PPL chapters.) 

Consequences may be prefixed with T, F, or M, singly (T.SET ... , F.DELETE ... ) or in combination 

(FM.SET ... , TFM.INCREASE ... ), to direct whether the consequence should be performed when the 

result of the IF is true, false or missing. T is assumed if no prefix is supplied. 

INCREASE vnp 

recodes an existing numeric variable by increasing its value by 1 or a specified amount: 

[ INCREASE Counter ; 

INC Days BY 7 ; 

INC Profit BY Sales ] 

INC is an abbreviation for INCREASE. 

KEEP vnp vnp 

specifies a list of variables, by name or position, to be kept or simply reordered: 

[ KEEP Name V(13) TO V(44) Education V(49) ] 

Unspecified variables in the input file are dropped. The system variables .NEW. and .OTHERS. may be 

used with KEEP to refer to variables newly generated in this command and any variables not explicitly 

mentioned: 

[ KEEP .NEW. ID.Number .OTHERS. ] 

Either KEEP or DROP may be used for variable selection. 

Variables may be referenced with a subscript-type notation: V(2) means the variable in the second position from the left of the file. KEEP may also be followed by a wildcard, to reference variables with a common prefix or suffix, and by the system variables .NUMERIC., .CHARACTER., .NEW., .ON., and .OTHERS. : 

[ KEEP V(3) Score.? .CHARACTER. ] 

[ KEEP .NUMERIC. .OTHERS. ] 

RENAME vn TO vn 

renames an existing variable with a new variable name: 

[ RENAME VAR1 TO Age; RENAME VAR2 TO Income ] 

RETAIN 

specifies that the current case pass to the next PPL clause, or if there are no additional clauses, to the current command. Cases not retained are deleted. RETAIN is generally used as a consequence of an IF test. (CONTINUE is a synonym for RETAIN.) 

SET vnp = exp 

recodes an existing numeric or character variable: 

[ SET Height = Height / 12 ; 

SET City = 'Princeton' ; 

SET V(2) = V(1) ] 

The expression following the equal-sign may be a simple or complex numeric or character expression. 

Character constants must be enclosed in single or double quotes. 



PPL Operators: Logical 

Logical operators are used in logical selection (IF) clauses. They permit comparisons between two expressions. 

The expressions may be either both numeric or both character expressions. The evaluation of 

the comparison is true, false or missing: the evaluation is missing when one of the expressions is missing. 

Either the character representation of the logical operator ( EQ ) or the equivalent symbol ( = ), where it 

exists, may be used. 

The logical operators may be prefaced with “X” for eXact comparisons of character strings — these comparisons 

respect the case (upper, lower or mixed) of the string as well as the literal characters. See 

Chapter 9, “PPL:Date and Time Commands and Functions” for a description of the 6 date/time logical 

operators. 

EQ = equal 

[ IF City EQ 'Tray', SET City = 'Troy' ] 

The result may be true, false or missing. Comparisons of character strings are case independent: “troy” 

equals “Troy”. Leading blanks are characters: “ Troy” does not equal “Troy”. Other operators that are 

supported are: LE, LT, GE, and GT. 

XEQ exactly EQ 

[ IF Initial XEQ 'R', RETAIN ] 

Comparisons of character strings respect case when the logical operator is prefaced with “X”. Other operators 

that are supported are XLE, XLT, XGE, and XGT. 

NE ^= not equal 

[ IF Zip NE 11234, DELETE ] 

Cases with missing values of the variable Zip, in the example above, are not deleted. If the consequence is TM.DELETE rather than DELETE, deletion occurs when the result of the IF is either true or missing. 

XNE exactly NE 

[ IF Accept.Reject XNE 'F', SET Score = 'Pass' ] 

ALL (vnp list) 

tests all the values of the variables in the list: 

[ IF ALL ( Test.1 TO Test.5 ) GOOD, RETAIN ] 

All the relationships must be evaluated as true for the clause to be true. ALL is equivalent to a series of 

ANDs. 

AMONG (list of values and variables) 

tests whether the value of the specified variable is among the values in the list: 

[ IF Area AMONG ( 201 609 908 ), SET State = 'NJ' ] 

[ IF Name AMONG ( 'A' TO 'Mz' ), TM.DELETE ] 

The system variables for missing values (.M., .M1., etc.) may be included in the list of values following 

AMONG. 


XAMONG (list of values and variables) 

respects case in testing character values: 

[ IF Symptom XAMONG ( 'a', 'A', 'Aa' ), RETAIN ] 

AND 

links two logical relationships: 

[ IF Sex EQ 1 AND Age GE 21, RETAIN ] 

Both relationships must evaluate as true, or the entire clause is false or missing. 

ANY (vnp list) 

tests the values of the variables in the list, until one relationship is evaluated as true: 

[ IF ANY ( Test.1 TO Test.5 ) LT 65, RETAIN ] 

ANY is equivalent to a series of ORs. 

GOOD 

tests for good (non-missing) values. GOOD combines “=” with .G. , the system value for good values. The following are equivalent: 

[ IF ID GOOD , RETAIN ] 

[ IF ID EQ .G. , RETAIN ] 

INRANGE ( exp, exp ) 

tests whether an expression is within the range expressed by the first (low) value and the second (high) value: 

[ IF TestScore INRANGE ( 91, 100 ), SET Grade = 'A' ] 

MISSING 

tests for missing (non-good) values. MISSING combines “=” with .M. , the system value for missing or non-good values. The following are equivalent: 

[ IF ID MISSING , DELETE ] 

[ IF ID EQ .M. , DELETE ] 

NOTAMONG (list of values and variables) 

tests whether the values of the specified variable are not among the values in the list: 

[ IF Age NOTAMONG ( 5, 7 TO 10 ), DELETE ] 

[ IF Sex NOTAMONG ('f', 'female'), DELETE ] 

In the above examples, cases with values not among the specified values are deleted. Cases with missing 

values are retained. 

XNOTAMONG (list of values and variables) 

respects case in testing character string values. 

OR 

links two logical relationships: 

[ IF Sex EQ 2 OR Age LT 21, DELETE ] 

Only one of the relationships need be true for the entire clause to be true.


OUTRANGE ( exp, exp ) 

tests whether an expression is outside the range specified by the two expressions: 

[ IF Age OUTRANGE ( 19, 65 ), DELETE ] 

PPL Operators: Numeric 

Numeric (arithmetic) operators are used between numeric values. Parentheses (as well as nested parentheses) 

indicate the desired order of operations. When parentheses are not used, the precedence or order 

of operations is: exponentiation, multiplication and division, addition and subtraction. When there is a 

series of additions and subtractions or multiplications and divisions, they are performed from left to right. 

When there is a series of exponentiations, they are performed from right to left. 

** exponentiation 

[ SET Type = Code ** 2 ] 

* multiplication 

[ GENERATE Circumference = Pi * Diameter ] 

/ division 

[ IF V(4) NE 0, SET V(6) = 56089 / V(4) ] 

+ addition 

[ GENERATE F = ( ( 9/5 ) * C ) + 32 ] 

- subtraction 

[ SET Commission = .25 * Sales - 5 ] 
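Python uses the same precedence and grouping conventions for these five operators, so the rules can be tried out there. The snippet below is only an illustration with made-up numbers; it is not PPL.

```python
# Python applies the same conventions for **, *, /, + and -.

left_to_right = 100 / 10 / 2     # grouped as (100 / 10) / 2
right_to_left = 2 ** 3 ** 2      # grouped as 2 ** (3 ** 2)
mixed = 2 + 3 * 4 ** 2           # exponent first, then multiply, then add
commission = .25 * 200 - 5       # multiply before subtracting

print(left_to_right, right_to_left, mixed, commission)
```

Writing the parentheses explicitly, as in the Fahrenheit example above, gives the same values and makes the intended order visible to the reader.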




3 


MODIFY, PROCESS and PUT 

The previous PPL chapter covered the basics of data modification using the P-STAT Programming Language 

(PPL). This chapter provides information about: 

• using the MODIFY command to save the results 

• “on-the-fly” concatenation and modification of multiple files 

• repeating cases in a file using the REPEAT instruction 

• the instructions GOTO, PUT, PUTL, QUITFILE, QUITCOMMAND, QUITRUN. 

• use of the PROCESS and standalone PPL commands. 

3.1 FILE MODIFICATION 

Files are typically modified to “clean data” — that is, to detect and correct errors, and to select and possibly transform 

the variables needed for analysis. While it is theoretically possible to code and enter data and to make a file 

that is satisfactory for a series of runs, it is unlikely to happen. Sometimes there are variables in the data which 

contain more information than is needed. Other times, the variables desired are logical transformations of one or 

more of the original variables. 

If the data values are modified during the initial stages, the original information is no longer readily available. 

Thus, it is generally best to enter all the original data and then, using data modifications and selections, create a 

second file that contains only the necessary modified variables. It is easier to search the first P-STAT file for any 

information that is needed later, than it is to go back to the original input records or coding sheets. 

A common sequence in readying a file for analysis is to make a P-STAT file, examine it, and then to clean it 

up by modifying it as necessary. Appropriate variables are selected and changed into the desired form. New variables 

are generated. Consistency checks for possible errors are made. 

For example, the variable Age, coded in years, could be collapsed into five-year age groups. At the same time, 

a new variable, Age.Sex.Groups, can be generated with four categories: men under 30, men 30 and over, women 

under 30, and women 30 and over. A consistency check can be made to see that no males have had pregnancies, 

and inconsistent data can be converted to one of the missing values. 

The goal is to obtain a good file that can be saved and used as the basis for the rest of the analyses. After data 

cleaning, the number of transformations and selections needed for any given analysis is minimized. For example, 

you may wish to select only women for some runs and only respondents with good (non-missing) values on particular 

variables for other runs. Or you may want to use the natural log of the variable Income. These selections 

and transformations may be done “on-the-fly”, as the analysis proceeds. 

3.2 How Modifications Are Processed 

When a P-STAT command program reads a case of data, it calls a system routine that reads P-STAT files. This 

routine reads a case of data, applies any specified modifications to the case, and sends the modified case of data 

to the calling program. In this example, each case of data received by the COUNT command contains only two 

variables, Age and Sex, in positions 1 and 2, respectively:


COUNT Survey [ KEEP Age Sex ] $ 

The file Survey, which contains 50 variables including Age in position 45 and Sex in position 46, remains 

unchanged. 

Modification clauses are executed in the order in which they appear, except for case (row) selection which is 

processed first. Variable selection, followed by additional modification clauses, should include all variables needed 

by the clauses which follow. If they are not included, an error message states that the variable was not found. 

Modifications apply to input files, not to output files. The modification clauses (enclosed in brackets) follow 

directly after the input filename: 

MODIFY TestFile [ KEEP ID TO Occupation ], OUT TestNew $ 

LIST TestNew [ CASES 1 TO 10 ] $ 

When an output file is produced, any modifications made to the input cases are reflected in it. File TestNew has 

only the variables ID through Occupation. The listing of TestNew shows only the first ten cases even though all 

the cases from file TestFile are represented in file TestNew. 

3.3 Temporary Modifications 

In P-STAT, temporary modifications may be done to any file when it is read by any P-STAT command. However, 

unless output files are created, these modifications are not saved in new P-STAT files. They are not available for 

use during the remainder of the run or in a subsequent run. For example: 

SURVEY S1099 

[ SET Income = Income / 100 ] ; 

When the cases of data in file S1099 are sent to the SURVEY command, the values of the variable Income are 

divided by 100. SURVEY uses these new values for Income when processing the data. However, the values in 

the input file S1099 remain in their original form. If you wish to do another operation with Income similarly modified, 

the modification must be done again: 

LIST S1099 

[ SET Income = Income / 100 ] $ 

These modifications are sometimes called “on-the-fly” modifications because they are done on the spur of the moment, just as they are needed. This on-the-fly modification: 

SURVEY Families 

[ GEN Family.Income = Fathers.Income + Mothers.Income] ; 

STUB Social.Class, BANNER Children, MEANS Family.Income $ 

creates the new variable, Family.Income, which exists only as each case of the file is passed to the SURVEY command. 

It is not available for use after exiting from SURVEY. 
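The idea of a temporary modification can be pictured with a short Python sketch. It is a conceptual model only, not a description of P-STAT internals: the file is a list of dictionaries, and `read_cases` and `scale_income` are hypothetical names.

```python
def read_cases(file_cases, modify):
    """Yield a modified copy of each case; the stored file is not changed."""
    for case in file_cases:
        yield modify(dict(case))   # dict(case) copies the case first

def scale_income(case):
    """Plays the role of [ SET Income = Income / 100 ]."""
    case["Income"] = case["Income"] / 100
    return case

# Illustrative stand-in for file S1099 (made-up values).
s1099 = [{"Income": 52000}, {"Income": 31000}]

# What a command such as SURVEY or LIST would receive.
seen_by_command = list(read_cases(s1099, scale_income))

print(seen_by_command)
print(s1099)   # the input file still holds the original values
```

Each command that needs the scaled values has to request the modification again, because only the copies passed to the command are changed.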

3.4 Permanent Modifications and the MODIFY Command 

If a file with the same modifications is to be used over and over, it makes sense to do the modifications only once 

and save the results as a new P-STAT file. An output file reflects the modifications done to the input file or files 

used to create it. This is true whether the output file comes from the MODIFY, CONCAT, SORT or LOOKUP 

commands, or any other P-STAT command which produces output files. 

The modification procedure is the same, regardless of which command is used. However, only the MODIFY 

command processes the specified modifications without doing anything else but producing an output file. (The 

SORT command sorts the cases in addition to producing an output file; the CONCAT command joins several files 

in producing the output file, and so on.) 

The MODIFY command produces an output file when the identifier OUT is used. It also produces a description file, which describes the modified (output) file, when the identifier DES is used. MODIFY usually requires the

PPL: MODIFY, PROCESS and PUT 3.3 

name of the input file to be modified, various modification clauses, and the identifier OUT followed by a name 

for the new output file. The modifications are permanent because their results are contained in the output file, 

which can be used throughout the remainder of the run and in subsequent runs. 
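As a minimal sketch of the two identifiers used together, the command below requests both an output file and a description file. The file names Scaled and ScaledDes are hypothetical, and writing DES followed by a file name is an assumption modeled on the way OUT is written:

```text
MODIFY S1099
   [ SET Income = Income / 100 ],
   OUT Scaled, DES ScaledDes $
```

The input file S1099 is left unchanged, as in the on-the-fly examples.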

The MODIFY command reads and writes cases of data. Since it receives each case after any modifications 

have been done, the cases that are written out reflect all changes and selections. The first phrase begins with the 

command name MODIFY and includes the input filename and all the modification clauses which are to be applied 

to that file. It is the comma following the final modification clause which signals that the phrase is completed. 

The format of this first phrase is: 

MODIFY FileName [ clause ; clause ; clause ; clause ] [ clause ; clause ], 

Figure 3.1 illustrates using the MODIFY command to produce a new output file, which is a permanent modification 

of the input file. The MODIFY command has two phrases. The first is the command MODIFY and its 

argument — the name of the input file followed by the modification clauses. The second is “OUT S1099B”, 

which supplies the name to be given to the output file. 

__________________________________________________________________________ 

Figure 3.1 Permanent Modifications 

MODIFY S1099 

[ GENERATE Coded.Age; 

SET Occupation = Occupation / 100 ; 

SET Coded.Age = INT ( Age / 10 ) ; 

KEEP Occupation Sex Coded.Age Race Children Siblings ], 

OUT S1099B $ 

__________________________________________________________________________ 

The modification clauses create the new variable Coded.Age, modify Occupation and Coded.Age, and select 

specific variables. The cases written in the new output file S1099B contain only six variables, including the new 

values for variable Occupation and the newly created variable Coded.Age. At this point, files S1099 and S1099B 

are both available to any P-STAT commands that follow. S1099 contains all the original data. S1099B contains 

the modified data. 

3.5 TEMPLATE Files 

A template file may be given to the MODIFY command to select the desired variables and, perhaps, to specify a 

changed ordering of those variables. It has much the same effect as a KEEP phrase ending the PPL. 

MODIFY Class89 

[ IF ANY ( V(1) .ON. ) MISSING, DELETE ], 

TEMPLATE Class88, 

OUT Classes $ 

Figure 3.2 shows an input file, a template file, a MODIFY command and the resulting output file. The output 

file contains all the variables in the template file, in the order of the template file. Because variable “b” in file Testfile 

is not one of the variables in the template, it is not moved to the output file. Because variable “d” is a template 

file variable that is not present in file Testfile, it is set to missing for all the cases in the output file. 

The IF test deletes any case that is missing on any variable in the input file. Therefore, the second case is not 

written to the output file even though the only missing value is on variable “b” which is not one of the variables 

in the template file.


__________________________________________________________________________ 

Figure 3.2 Template Files 

FILE Testfile 

a b c 

1 1 1 

2 - 2 

3 3 3 

MAKE Temp, VARS a c d; 

$ 

----------------MAKE completed---------------- 

| P-STAT file temp has been created. | 

| It has 0 cases and 3 variables. | 

| | 

| Two delimiters were used: BLANK and COMMA. | 

---------------------------------------------- 

MODIFY Testfile 

[ IF ANY ( V(1) .ON. ) MISSING, DELETE ], 

TEMPLATE Temp, 

OUT Newtest $ 

FILE Newtest 

a c d 

1 1 - 

3 3 - 

__________________________________________________________________________ 

If a file does not already exist with appropriate variable names, a null file, a file with only variable names and 

no cases, may be created to serve as a template. Figure 3.2 shows both the MAKE command that creates the 

template file and the report that MAKE produces. 
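A template may also be used only to change the order of the variables. The following sketch assumes the file Testfile of Figure 3.2; the file names Order and Reordered are hypothetical. Because there are no modification clauses, all three cases should be written, with variable “c” first and variable “a” second:

```text
MAKE Order, VARS c a;
$

MODIFY Testfile, TEMPLATE Order, OUT Reordered $
```

Variable “b” is not in the template, so it would not be moved to the output file.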

Sometimes an additional copy of a file is all that is desired. Here, there are no modification clauses, so file B 

is an exact copy of A: 

MODIFY A, OUT B $ 

The number and names of the variables, as well as the data, are the same in both files. 

3.6 On-the-Fly Concatenation of Files 

Multiple files may be read by any P-STAT command. The “+” operator concatenates the files “on-the-fly” — as 

they are read by a command: 

MODIFY A + B + C + D, OUT ABCD $ 

The data from files A, B, C and D are passed to the MODIFY command, one case after the other.


__________________________________________________________________________ 

Figure 3.3 Renaming All the Variables in a File 

File F1 

VAR1 VAR2 VAR3 VAR4 

22 20100 40 f 

24 18400 36 f 

31 31000 40 m 

33 35000 49 f 

MAKE Names; 

VARS Age Income Hours Sex:c; 

$ 

MODIFY Names + F1, OUT F1 $ 

File F1 

Age Income Hours Sex 

22 20100 40 f 

24 18400 36 f 

31 31000 40 m 

33 35000 49 f 

__________________________________________________________________________ 

On-the-fly concatenation of files requires that the number of variables be the same in all the input files, and 

that corresponding variables be of the same data type (numeric or character). The variables may have different 

names. Therefore, it is up to you to ensure that the contents are the same. The first file may be a template file, 

that is, a file with no cases used only to supply variable names for the output file. Figure 3.3 illustrates the use of 

a file with no data records to rename the variables in an existing P-STAT system file. 

Often the modifications done to one file in on-the-fly concatenation are necessary for every file. The notation 

[ * ] is a shortcut specifying that the modifications done to the previous file be applied to the current file. This 

command: 

MODIFY A [ GENERATE Profit = Gross - Expenses ; 

KEEP Company TO Zip, Expenses ] 

+ B [ GENERATE Profit = Gross - Expenses ; 

KEEP Company TO Zip, Expenses ], OUT C $ 

may be shortened to this equivalent one: 

MODIFY A [ GENERATE Profit = Gross - Expenses ; 

KEEP Company TO Zip, Expenses ] 

+ B [ * ], OUT C $ 

In on-the-fly concatenation, when files are referenced without modification or with the [ * ] indicating exactly 

the same modifications, the files are treated as one single file. Thus, across-case functions, such as FIRST, LAST, 

SPLIT and COLLECT, operate across the files. If the files have different modifications, across-case functions


operate within only one file. For example, FIRST and LAST are reset as each file is processed. Across case modification 

is discussed in detail in the chapter “PPL: Across-Case Modifications”. 

Sometimes it is useful to include the cases in a file two or more times. For example, this permits estimation 

of the time it takes to process a large file without actually creating it. On-the-fly concatenation of a file with itself 

accomplishes this. The file is reread and, in effect, joined to itself: 

MODIFY Y + Y + Y, OUT C $ 

The MODIFY command receives every case of file Y, followed by every case of Y a second time, followed by 

every case of Y a third time. 

3.7 Repeating Cases 

Cases in a P-STAT system file may be read more than once using the REPEAT instruction. REPEAT is followed 

by an integer, a variable that has an integer value, or an expression that reduces to an integer. This instruction 

repeats each case in the file five times: 

[ REPEAT 5 ] 

A case is repeated at the point REPEAT is encountered in a sequence of PPL instructions. Thus, some instructions 

may precede the REPEAT and some may follow it. The system variable .N., which is the case number, 

is not changed by repetition. The system variable .HERE., which is the count of cases processed at a given point, 

is changed by repetition. Thus, it is possible to test both the input case number and the output case number. Figure 

3.4 illustrates this procedure with a LIST command. 

__________________________________________________________________________ 

Figure 3.4 Repeating Cases 

LIST Subjects 

[ CASES 1 TO 3 ; REPEAT 2; 

GEN Input.Case = .N., GEN Output.Case = .HERE. ; 

IF MOD (Output.Case, 2) = 0, 

SET Test.1 = .M., SET Test.2 = .M., SET Test.3 = .M. ] $ 

                  Test  Test  Test  Input  Output 

    ID  Age  Sex    .1    .2    .3   Case    Case 

785001    1    1    94    89    97      1       1 

785001    1    1     -     -     -      1       2 

785002    2    1    78    82    85      2       3 

785002    2    1     -     -     -      2       4 

785006    1    1    71    70    75      3       5 

785006    1    1     -     -     -      3       6 

__________________________________________________________________________ 

In Figure 3.4, the MOD function, which does modulo arithmetic, is used to test for an even number — that 

is, a second case. Test values are set to missing in these cases. Second semester test results could be added to 

these cases as they become available. (Functions and system variables are discussed fully in later chapters of this 

manual.) REPEAT may be used with any other PPL instructions and functions, but it may not follow an IF and 

it may not appear in the same command as SPLIT, COLLECT, FIRST or LAST. 

A file of random data may be generated by building a file with just one case and repeating it as many times 

as desired. Here, the single case in the file Random is repeated 100 times:


MODIFY Random 

[ REPEAT 100 ; 

SET Random.Number = ( RANNORM (0) * 2.8 ) + 24 ], 

OUT RandomX $ 

Then, the single variable Random.Number is set equal to a random number generated by the RANNORM function. 

The random number is multiplied by 2.8 and added to 24 to shift the standard deviation of 1 and the mean 

of 0 to these values. (Random number functions are discussed in more detail later in this manual.) 

Simple integer weighting of cases may be done using REPEAT. These instructions weight younger respondents 

(cases) twice as much as older ones: 

[ GEN #Wt = 1 ; 

IF Age LT 21, SET #Wt = 2 ; 

REPEAT #Wt ] 

The scratch (temporary) variable #Wt is generated equal to 1. It is reset to 2 for younger respondents. A case is 

repeated as many times as the value of #Wt for that case. (Scratch variables are explained in more detail in the 

discussion of across case modification later in this manual.) 

Weighting is often done when different population subgroups have been sampled and they are not representative 

of their real proportion in the population. Using REPEAT is not necessarily a good weighting technique, 

because only integer weighting of each case is possible. Non-integer weighting is done using the WEIGHT identifier 

and a weighting variable in commands such as COUNTS and SURVEY. The WEIGHT identifier permits 

appropriate fractional weights to be applied to each case of data as it is processed by a command. Also, using 

WEIGHT is faster than using REPEAT to weight cases. See the description of the BALANCE command, which 

computes weights for sample balancing, in the manual “P-STAT: The SURVEY, BALANCE and SAMPLE 

Commands”. 

3.8 OTHER INSTRUCTIONS AFTER IF 

Of the instructions that specify the action to take as a consequence of an IF test, RETAIN, DELETE, SET, 

INCREASE and DECREASE are the most useful and common. (These are fully explained in the second of the PPL 

chapters.) However, there are additional instructions that may either follow an IF test as possible consequences 

or, in some situations, be used alone. 

3.9 GOTO To Process Modifications Selectively 

The GOTO instruction (GO TO, with a blank between the two words, is a synonym) permits modification clauses 

to be conditionally omitted or repeated. Clauses are generally omitted when control transfers to clauses after the 

current one (downward), and repeated when control transfers to clauses before the current one (upward). The only 

constraint is that the omitted section cannot contain phrases such as KEEP, DROP, or GENERATE that change 

the number or order of the variables. 

GOTO is generally the consequence of an IF test: 

[ IF Sex EQ 1, GOTO Male; 

GOTO is followed by the label of the clause to which control is to pass. If the value of Sex is 1, control passes to 

the modification clause beginning with the label “Male:”: 

Male: IF Live.Births GOOD, ... ; 

The label must be followed by a colon (:) and it must begin a modification clause. Figure 3.5 illustrates using 

GOTO to execute different modification clauses for males and females. 

Labels may be followed by an instruction or they may be null labels (not followed by an instruction) as in “Next: 

]”, the last clause in Figure 3.5. Null labels often provide a destination. Labels may also be simply informative, as 

in “Female: ... ”, the third clause in Figure 3.5, and not a destination.


__________________________________________________________________________ 

Figure 3.5 Using GOTO and PUT 

MODIFY People 

[ IF Region = 3, RETAIN; 

IF Sex = 1, GOTO Male ] 

[ Female: IF Occupation AMONG ( 43, 56 TO 59, 72, 78 ), 

PUT 'Check Occupation of ' Name '. Occupation: ' Occupation ; 

IF Military.Service = 4, 

PUT 'Check Service Record of ' Name '.' ; 

GOTO Next ] 

[ Male: IF Live.Births GOOD, 

PUT 'Invalid Live.Births of ' Live.Births ' for ' Name '.', 

SET Live.Births = .M3. ; 

SET Military.Service = NCOT ( Military.Service, 2 ) ] 

[ Next: ], OUT People2 $ 

__________________________________________________________________________ 

Using GOTO makes for clearer logic when there is a series of modifications to be performed as the result of 

a specific IF test. In Figure 3.5, left and right brackets have been used within the PPL to emphasize the structure 

of an IF, a section for cases coded as female, a section for cases coded as male, and a final section. 

3.10 Cleaning Data With PUT 

The PUT instruction prints informative messages and the values of cited variables. This is useful when cleaning 

up data prior to an analysis or when constructing a report. There is a complete list of the PUT control words in 

the summary section at the end of this chapter. 

PUT prints text or error messages and the values of the variables specified: 

[ Female: IF Occupation AMONG ( 43, 56 TO 59, 72, 78 ), 

PUT <Check occupation of > Name <. Occupation: > Occupation ] 

If Occupation is equal to 57, for example, the text and variable values are printed. The message strings, with the 

values of the variables Name and Occupation inserted, appear on the current output device. 

The text is supplied as character strings enclosed in angle brackets or in single or double quotes. It is usually 

easier to check the text for a proper beginning and end when angle brackets are used instead of the quotes. You 

can see it better and so can the scanning program. Use of the angle brackets reduces the chances for error and is, 

therefore, highly recommended. 

When variable names are used outside of a text string, the values are substituted at that point. For example: 

Check occupation of Sandy Sweet. Occupation: 57 

Using .ALL. causes all the variables of a case to be written, one after the other. Trailing blanks (on the right end) 

of the text are removed from character values. Note that variable names are not enclosed in quotes or angle brackets. 

When you wish to use an expression that is more complex than a variable name, enclose it in parentheses: 

[ PUT ( GNP / 1000. ); 

Figure 3.5 illustrates using PUT with IF tests and GOTOs to locate a series of cases with miscodings and obtain 

a printed list of the errors. A new output file, containing corrections and recoded values, is also produced. 

PUT is commonly used after an IF test, although it may also be used by itself.


See also the section later in this manual on using .PUT., the system variable which keeps a count of the puts. 

It is possible to create a new file containing only the cases with questionable values on the tested variables. The 

values in this error file may then be corrected and the UPDATE command used to produce a corrected master file. 

It is also possible to print text without creating any output file. The PROCESS command works just like 

MODIFY, but produces no output file. It is used merely to process PPL instructions, such as those that produce text 

describing erroneous variable values. 
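Both approaches can be sketched with the Occupation test from Figure 3.5. The file name Occ.Errors is hypothetical, and the sketch assumes that RETAIN after an IF test keeps only the cases for which the test is true. The first command writes the questionable cases to an error file; the second only prints messages and produces no file:

```text
MODIFY People
   [ IF Occupation AMONG ( 43, 56 TO 59, 72, 78 ), RETAIN ],
   OUT Occ.Errors $

PROCESS People
   [ IF Occupation AMONG ( 43, 56 TO 59, 72, 78 ),
        PUT 'Check Occupation of ' Name '. Occupation: ' Occupation ] $
```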

3.11 Report Writing Using PUT and PUTL 

The PUT instruction writes reports by putting text and variables at specified locations in the output line. PUT is 

used in PPL clauses; thus, a report may be produced any time a file is read by any command. 

Figure 3.6 illustrates the use of PUT for reports. PUT and any other PPL are enclosed in brackets which follow 

the filename. Since no output file is desired, the PROCESS command is used here to process the PPL 

instructions. 

Text to be put in the output line is enclosed in paired angle brackets: 

.... PUT @5 <The claim for > .... 

although single or double quotes can also be used: 

.... PUT @5 'The claim for ' .... 

The column pointer, the “at sign” (@), specifies a column location. “@5” specifies that the next string or 

expression is to be placed starting in column 5. When a column location is not given, the text begins in column 1. 

Expressions must be placed within parentheses. A variable name by itself or a scratch variable name (like 

#Count) is used without parentheses: 

[ .... (First.Name /// Last.Name) .... ; 

[ .... 'A check for $' Amt.Due .... ; 

The value of First.Name is concatenated with the value of Last.Name and placed in the output line. (The concatenated 

names are one expression, and thus need parentheses.) The value of Amt.Due is placed in the output line 

after the appropriate text. 

PUT may be an instruction by itself: 

PUT < > ; 

or it may follow an IF test: 

IF Claim.Num MISSING OR Claim.Amt MISSING, 

PUT @5 <The claim for > (First.Name /// Last.Name ) 

< is awaiting additional information.>, GOTO End; 

The first PUT places a blank line in the output. The second PUT is done only when the IF test is true. If either 

Claim.Num or Claim.Amt is missing, the specified text is put at column 5 in the output line, and control passes to 

the PPL clause with the label “End:”. The GOTO is also a consequence of the IF because the PUT phrase ended 

with a comma rather than a semicolon. 

The text produced by PUT continues onto subsequent lines as needed. The current line is printed when a given 

PUT finishes, unless the PUT ended with an @. In this event, the next PUT continues on the same line. 

Thus, subsequent text: 

IF Deduct.Amt GT 0, 

T.PUT @5 <There is a deductible amount of $> Deduct.Amt @ , 

F.PUT @5 <There is no deductible> @ ; 

PUT < on Policy Number > Policy.Num <, issued to > 

(First.Name /// Last.Name) <. > @ ; 

follows directly after this text with no intervening spaces. Column references may position the column pointer 

both backwards and forwards on the line. Thus, the column reference @40 may be followed by @1, and text will 

be placed in column 40 and then in column 1 of the same line. The reference @NEXT positions text at the beginning 

of the next line. 

__________________________________________________________________________ 

Figure 3.6 Using PUT To Produce a Report 

File Insurance: 

First                  Policy  Deduct  Claim   Claim 

Name    Last Name         Num     Amt    Num     Amt 

Sharon  Wilson        8090564     250   6024  654.25 

Claire  Mc Donald     7035631     500   8122       - 

Neil    Haroldson     7469421       0   1005  490.56 

The Commands 

OUTPUT.WIDTH 70 $ 

PROCESS Insurance 

[ GENERATE Amt.Due = Claim.Amt - Deduct.Amt ; 

PUT < > ; 

IF Claim.Num MISSING OR Claim.Amt MISSING, 

PUT @5 <The claim for > ( First.Name /// Last.Name ) 

< is awaiting additional information.>, GOTO End ; 

IF Deduct.Amt GT 0, 

T.PUT @5 <There is a deductible amount of $> Deduct.Amt @ , 

F.PUT @5 <There is no deductible> @ ; 

PUT < on Policy Number > Policy.Num <, issued to > 

(First.Name /// Last.Name) <. > @ ; 

PUT <A check for $> Amt.Due 

< is required in payment of Claim Number > Claim.Num 

< for $> Claim.Amt <.> ; 

End: ] $ 

The Report 

There is a deductible amount of $250 on Policy Number 8090564, 

issued to Sharon Wilson. A check for $404.25 is required in payment 

of Claim Number 6024 for $654.25. 

The claim for Claire Mc Donald is awaiting additional 

information. 

There is no deductible on Policy Number 7469421, issued to Neil 

Haroldson. A check for $490.56 is required in payment of Claim 

Number 1005 for $490.56. 

__________________________________________________________________________ 

System variables may be referenced with PUT: 

PUT 'Dated: ' .DATE. ; 

System variables do not need to be enclosed in parentheses. The current value of the system variable .DATE. is 

output in the report. Using an IF test for the first case results in the date being output only once, rather than for 

each case. Character values (other than quoted text) automatically have blanks trimmed from the right end. 
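For example, a report heading might be written only once by testing the case number. This is a sketch that assumes the date belongs with the first input case and uses the system variable .N. described earlier; the choice of the Insurance file and its variables is illustrative only:

```text
PROCESS Insurance
   [ IF .N. EQ 1, PUT 'Dated: ' .DATE. ;
     PUTL Policy.Num Last.Name Claim.Num ] $
```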

PUTL puts the variable name, as well as the variable value, in the output line: 

PUTL Policy.Num Last.Name Claim.Num ; 

The text is on one line: 

Policy.Num = 8090564 Last.Name = Wilson Claim.Num = 6024 

unless it extends to subsequent lines. Placement of the variables on separate lines, centered about the equal-sign 

in column 22: 

          Policy.Num = 8090564 

           Last.Name = Wilson 

           Claim.Num = 6024 

may be requested using @EQUAL22: 

PUTL @EQUAL22 Policy.Num Last.Name Claim.Num ; 

The column location of the equal-sign follows directly after the @EQUAL. Use of @EQUAL22:50 places two 

labeled values per line. 
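With the three variables used above, the two-per-line form would be written as follows. Under the description just given, the first two labeled values should share a line, with their equal-signs in columns 22 and 50, and the third should begin the next line:

```text
PUTL @EQUAL22:50 Policy.Num Last.Name Claim.Num ;
```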

The following puts all values of the case, in variable name = value format, three per line, with the equals 

placed in positions 22, 44, and 66: 

PUTL @EQUAL22:44:66 .ALL.; 

__________________________________________________________________________ 

Figure 3.7 Accessing the Variable Name Within a Report 

PROCESS Rawdata [ CASE 1; 

PUT <Values from > .file. 

@SKIP @3 ( VARNAME (1) ) @20 V(1) 

@NEXT @3 ( VARNAME (2) ) @20 V(2) 

@NEXT @3 ( VARNAME (3) ) @20 V(3) 

@NEXT @3 ( VARNAME (4) ) @20 V(4) @SKIP ] $ 

Values from Rawdata 

Age = 13 

Income = 1350 

Hours = - 

Sex = m 

__________________________________________________________________________ 

PUTL refers to variables (including scratch variables) by name. However, the VARNAME function may be 

used to label values, printed by PUT, that are referenced by position. Figure 3.7 illustrates the VARNAME function 

in a PUT using the PROCESS command to access the first case of data. When there are only four variables, the 

brute-force approach in Figure 3.7 is acceptable. However, the chapter “PPL: DO LOOPS and IF-THEN-ELSE 

Blocks” provides a much easier way when there are many repetitions to be done. 

3.12 STANDALONE PPL COMMANDS AND PROCESS 

Standalone PPL commands, which have neither an input nor an output file, are used to work with scratch variables, 

the P vector or user-defined arrays. The PROCESS command is used when you need information from a 

P-STAT system file but do not need an output file. This can also be done by using MODIFY with no output files, 

but MODIFY provides a brief report of its activity and PROCESS is silent. 

3.13 Scratch Variables and Standalone PPL 

The variable #Wt in section 3.7 was created as a “scratch” variable. This is a variable that does not exist 

in a file. Since it is independent of the file, it can be set and then used within a case, across cases or, in the case 

of a permanent scratch variable, across commands. 

                 Scratch Variable     Permanent Scratch Variable 

Rules for name   #scratch             ##scratch 

Exists for       a single command     across commands 

The scratch variable can be either character or numeric: 

GEN #Wt = SQRT ( Age/10 + Sex ); 

GEN ##Study:C23 = "Study 1034: August 1994" 

If the scratch variable is created for use in later commands, it must have the double ## as a prefix. Variables of 

this type are often used to move information between commands. A scratch variable can be moved into a file as 

a regular variable by including it in a KEEP. The initial # or ## is removed to create a legal name which must not 

conflict with the names of other variables in the file. 
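As a sketch of moving a scratch variable into a file, the weighting variable from section 3.7 can be kept in an output file. The input file Subjects is the one listed in Figure 3.4; the output file name Weighted and the choice of variables in the KEEP are hypothetical:

```text
MODIFY Subjects
   [ GEN #Wt = 1 ;
     IF Age LT 21, SET #Wt = 2 ;
     KEEP ID Age Sex #Wt ],
   OUT Weighted $
```

Under the naming rule above, the output file should contain a regular variable named Wt, provided that the input file has no variable of that name.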

The MODIFY command is designed to take an input file, modify it in some way, and produce an output file 

which reflects the modifications. The PROCESS command is designed to take an input file and use the values in 

the cases to create scratch variables for use in subsequent commands or as a vehicle for the PUT and PUTL commands 

to create a report. Standalone PPL commands are used to manipulate elements such as scratch variables 

and system variables that are not associated with a file. In this example, #Num needs only a single # because both instructions belong to one standalone PPL command: 

GEN #Num = 154362; 

PUT #Num > ( SQRT(#Num)) $ 

The PPL keywords that can be used as standalone PPL commands are: 

1. IF 

2. SET, INCREASE and DECREASE 

3. GENERATE 

4. PUT and PUTL 

5. BRANCH 

6. DIALOG 

7. IF-THEN-ELSE-ENDIF 

8. DO LOOPS 

The last three are covered in the next chapters. The following are examples of standalone PPL commands: 

PUT ( SQRT ( 13562 ) ) $ 

PUT .DATE. $ 

GENERATE ##COUNTER = 0 $



3.14 The PROCESS Command and More PUT Information 

The PROCESS command is often used to accumulate summary information about the input file. This information is 

stored in scratch variables or the permanent vector and is then available for subsequent commands. 

Figure 3.8 shows the use of the PROCESS command to count the total number of cases in the file as well as 

the number of cases with non-missing data. Scratch variables ##cases and ##good are first created with GENERATE 

used as a standalone command. Both of these variables must be created as permanent scratch variables with 

the double pound sign (##) so that they will exist across commands. 

__________________________________________________________________________ 

Figure 3.8 PROCESS: Counting Cases 

File Testfile 

a b c 

1 - 1 

2 2 2 

3 3 3 

GEN ##cases = 0, GEN ##good = 0 $ 

PROCESS Testfile 

[ INCREASE ##cases; 

IF ALL ( V(1) .ON. ) GOOD, INCREASE ##good; ] 

$ 

PUT <File Testfile has > ##cases < cases.> 

@NEXT 

<There are > ##good < cases with no missing data.> $ 

File Testfile has 3 cases. 

There are 2 cases with no missing data. 

__________________________________________________________________________ 

The PROCESS command increases ##cases as each row is read. ##good is only increased when the IF test 

is true. When the PROCESS command is complete, PUT is used to write the results. Each time a PUT is executed 

it starts on a new line unless the “@” sign was used to end a previous PUT. Each PUT usually continues across 

lines until it is complete unless the @NEXT instruction is used to cause a line change. @SKIP may be used to 

cause a blank line. @PAGE may be used to cause a page change. Many of the controls that can be used with the 

TEXTWRITER command can also be used with the PUT instruction. See the chapter TEXTWRITER for more 

details.
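The controls just described might be combined in one standalone PUT, sketched here under the assumption that ##cases and ##good still hold the counts from Figure 3.8. The heading text is illustrative only. @PAGE starts a new page, @SKIP leaves a blank line after the heading, and @NEXT moves to the following line:

```text
PUT @PAGE 'Summary for file Testfile' @SKIP
    'Cases read: ' ##cases
    @NEXT 'Complete cases: ' ##good $
```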


3.15 COMMENTS 

Comments may be included either between commands or as phrases within the PPL. A comment begins with /* 

and ends with */ . For example: 

/* Feb. 16, 2010. Clean the data 

and generate new variables. 

*/ 

MODIFY Myfile [ 

/* Age recoded into 3 age groups */ 

GEN Coded.Age = 1; 

IF Age GT 20, SET Coded.Age = 2; 

IF Age GT 30, SET Coded.Age = 3; 

/* Note: This assumes Age is never missing */ 

], OUT Myfile $ 

The first of the three comments in this example occurs between commands. The other two comments are in the 

PPL of the MODIFY command. Once the /* is found, the P-STAT executive routines look for the terminating */ 

and then blank out the entire area including the /* and the */. Comments can extend across lines as in the example 

above or they can be part of a line. For example: 

/* List the output file */ LIST Myfile $ 

MODIFY Myfile [ 

GEN Coded.Age; /* 10 year age groups */ Gen Coded.Income; ] 

are both legal uses of comments. 

Because the comments are blanked out when a command is executed, they must be entered into a command 

stream using an external editor. If they are entered interactively they disappear when the command is executed. 

However, because they can be inserted freely both within the PPL and between commands, they provide an excellent 

way to document a run. Anything except the terminating characters can be entered in the comment: 

/* The following group of commands might better 

be packaged as a macro and executed by using 

RUN Mymacro $ 

with /* style comments to document the macro 

parameters. 

*/ 

3.16 QUITTING A PROCESS 

There are three instructions that cause the processing of data to stop: 

1. QUITFILE requests that processing of the current file stop 

2. QUITCOMMAND requests that processing of the current command stop 

3. QUITRUN requests that the entire P-STAT run stop. 

The QUIT instructions are typically used after an IF test, although they may be used in a DO loop or alone. 

QUITFILE causes processing of a file to stop. Only cases prior to this point are passed to the current command. 

In this example, QUITFILE stops processing if the result of the IF test is true: 

LIST Bonded.Personnel 

[ IF Bonded EQ 'Yes' and Prison.Record GT 0, QUITFILE ] $ 

Only bonded employees with a value of 0 on Prison.Record are listed. If a value greater than 0 is found, only 

employees prior to that case are listed.


QUITCOMMAND causes a command to be aborted. If QUITCOMMAND were used in the prior example, 

no listing would be produced if any employee had a value greater than 0 on Prison.Record. The LIST command 

would stop without receiving any cases. 
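The prior example, rewritten with QUITCOMMAND, would look like this. It is a sketch that changes only the final instruction:

```text
LIST Bonded.Personnel
   [ IF Bonded EQ 'Yes' AND Prison.Record GT 0, QUITCOMMAND ] $
```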

QUITRUN causes an entire P-STAT run to stop. This is most useful when many commands are executed in 

succession, possibly from a transfer file or a macro, or in batch mode. Quitting the entire run, rather than continuing 

processing, may save resources if a grave error is encountered.
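A check of this kind might be sketched as follows. The file name Master and the variable ID are hypothetical. The PUT ends with a comma, so QUITRUN is also a consequence of the IF test, in the same way that GOTO follows PUT in Figure 3.6:

```text
PROCESS Master
   [ IF ID MISSING,
        PUT 'Case ' .N. ' has no ID. The run is stopped.', QUITRUN ] $

LIST Master $
```

If QUITRUN is executed, the LIST command that follows is never reached.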



SUMMARY 

Data selection and modification may be done to any file as it is read by any P-STAT command. They 

may not be done to output files. When any P-STAT command is executed, each case of the input file is 

read and optionally modified before it is passed to the current command. The current command operates 

on the modified data while the input file remains unchanged. Thus, the modifications are temporary “on-the-fly” 

modifications. 

MODIFY 

Required: 

Permanent modifications are usually done using the MODIFY command but may be accomplished with 

any command that produces an output file incorporating the modifications. MODIFY does no particular 

statistical or file maintenance procedures, but it produces an output file of the data after all modifications 

and selections have been completed. 

MODIFY File 

[ KEEP ID Age Score ; 

GENERATE Coded.Age = 

RECODE ( Age, 1 TO 17 = 1, 18 TO 100 = 2 ) ], 

OUT New.File $ 

MODIFY Males [ KEEP Test.ID Time Dexterity ] 

+ Females [ * ], OUT Students $ 

Multiple files may be read by MODIFY, as well as by other commands, using the “+” operator. This produces 

“on-the-fly concatenation” of the files. The files should have the same number of variables with 

corresponding data types the same. If the names of the variables differ, the variable names in the first 

input file are used. Different PPL modification phrases may follow each of the input files. If the same 

modifications are desired, as in the second example above, an asterisk in brackets should follow the additional 

file or files. 

MODIFY fn 

supplies the name of the required input file. MODIFY is described in more detail in the chapter 

“PPL:MODIFY and COMPARE”. 

Optional Identifiers: 

OUT fn 

provides a name for the requested output file. The output file will reflect the input file after all selections 

and modifications are performed. 

fn=file name vn=variable name exp=expression 

TEMPLATE fn 

specifies an input file which indicates the variables to be selected for the output file. Additional variables 

are ignored: 

MODIFY Diet2 [ ROWS 50 .ON. ], 

TEMPLATE Diet1, OUT Diet3 $ 

If a file does not already exist with appropriate variable names, a null file, a file with only variable names 

and no cases, may be created to serve as a template. 

STANDALONE PPL COMMANDS 

PUT @PAGE (CVAL(27)) 'G' 'Bold On' $ 

When PPL instructions do not require any information from a P-STAT file, they can be used as standalone
commands. No input file is required and no output file is produced. These commands are typically used
to pass instructions to a printer or to set values in the permanent vector, scratch variables or user-defined
arrays. These are tasks that often do not require a P-STAT file. In the example above, the decimal value
27 is the ASCII ESCAPE character. On some printers the combination of ESCAPE and the letter “G” is
a signal to use a bold font. CVAL, a function which converts a number to its character equivalent, is
described in the chapter on character functions.

The PPL instructions that can be used as commands are: IF, SET, INCREASE, DECREASE, GENERATE,
PUT, PUTL, IF-THEN-ELSE, DIALOG, BRANCH and DO loops.

PROCESS

PROCESS Hist123a

[ IF Term.Paper MISSING,

PUT Last.Name >

Paper.Due.Date ] $

The PROCESS command processes PPL instructions. No output file is produced. It is typically used
when the objective is printed text giving information about the values of the variables in the input file.

Required:

PROCESS fn

specifies the name of the required input file. 


The PPL instructions DECREASE, DELETE, DROP, GENERATE, IF, INCREASE, KEEP, RETAIN, 

ROWS and SET are explained in the second PPL chapter. The additional instructions GOTO, PUT, 

PUTL, QUITFILE, QUITCOMMAND, QUITRUN and REPEAT are summarized below. The list of instructions 

which may follow after an IF test includes: 

CONTINUE FOR INCREASE QUITFILE SET 

DECREASE GENERATE PUT QUITRUN 

DELETE GOTO PUTL QUITCOMMAND 

GOTO label

directs that the PPL processor go either up or down to wherever the PPL clause with the specified label
is located. The label must be at the beginning of a PPL clause, and it must be followed by a colon (:).
The bypassed phrases cannot change the number, order, or names of the variables (i.e., KEEP, DROP,
GENERATE), nor can GOTO bypass REPEAT, SPLIT or COLLECT. GOTO may be used following an IF
test:

IF Sex EQ 1, GOTO Male;

The label may be followed by an instruction or it may be a “null” label.

[ GENERATE Factor;
  IF Treatment.Group EQ 'placebo', GOTO Not.Drug;
  SET Drug = RECODE ( Drug, 1 TO 3 = 1, G = 2 );
  SET Factor = SUM ( Test1 TO Test2 );
  GOTO Next.Test;
  Not.Drug: SET Drug = 0, SET Factor = 0 ;
  Next.Test: ; ..... ]

QUITFILE

specifies that processing of the current file stop. Only cases prior to this point are passed to the command
processor. QUITFILE is commonly used following an IF test.

QUITCOMMAND

specifies that processing of the current command stop. The command is aborted at that point.
QUITCOMMAND is commonly used following an IF test.

QUITRUN

specifies that the P-STAT run stop. QUITRUN is commonly used following an IF test. The entire run
ends at this point.

REPEAT exp 

requests that each case be repeated the specified number of times. The argument for REPEAT should be an
expression (constant, variable, function or combination of these) that reduces to an integer. A case is
repeated at the point in the PPL at which the REPEAT instruction is encountered. REPEAT is useful in
generating a set of random data:

MOD Random 

[ REPEAT 100; 


OUT RandomX $ 

An initial file with one case (Random) is built and then it is modified to generate an output file with 100 

cases. The variable Random.Number is set equal to random numbers with mean 24 and standard deviation 

2.8. REPEAT may not be used as a consequent of an IF, within a DO loop, or within an IF-THEN- 

ELSE block. It also cannot be used in conjunction with SPLIT, COLLECT, FIRST or LAST. 
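The effect of REPEAT can be pictured outside P-STAT with a short Python sketch. It is only a model of the behaviour described above: one input case is copied 100 times and each copy receives a normally distributed value with mean 24 and standard deviation 2.8. The helper name `repeat_cases` and the use of `random.gauss` are illustrative assumptions, not P-STAT functions.

```python
import random

def repeat_cases(cases, times):
    """Yield a copy of each input case `times` times, as REPEAT is described as doing."""
    for case in cases:
        for _ in range(times):
            yield dict(case)

rng = random.Random(0)                      # fixed seed so the sketch is repeatable
one_case = [{"Random.Number": None}]        # stand-in for the one-case file Random

out = []
for case in repeat_cases(one_case, 100):
    case["Random.Number"] = rng.gauss(24, 2.8)   # mean 24, standard deviation 2.8
    out.append(case)

print(len(out))  # 100
```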

PUT AND PUTL CONTROL ELEMENTS 

The following printing elements can follow a PUT or PUTL: 

1. Age . The name of a variable. Its value will be printed. PUTL will also label it (Age = 22) 

2. #name or ##name. A scratch variable, also labelled by PUTL 

3. V(3). A variable reference with a constant subscript, also labelled by PUTL 

4. .ALL. All the values of a case, also labelled by PUTL 

5. V(#j+2). PUTL does not label this 



6. P(3) or P(#J+2) or (expression) of any complexity 

7. 'string' or “string” or a string in paired angle brackets.

These control elements can follow PUT or PUTL.

@NEXT            move to the next line.
@SKIP=3          write the current line and then three blank lines.
@PARA            write the current line, write a blank line, and indent three positions in the new line.
@20              moves the pointer (which is where the next value will be written) to that position.
@PLUS=(5)        move the pointer forward that far. This can be an expression.
@MINUS=(3)       move the pointer back that far.
@                can be used as the last element in a PUT or PUTL. The line is not flushed, so the
                 next PUT or PUTL statement adds to it instead of starting a new line.
@BEFORE=40       causes the next value to be placed so it ends at position 40. The string or value must
                 be the next PUT element.
@PLACES=3        causes succeeding numeric values to print with 3 decimal places.
@NOPLACES        returns to the default mode, where integers print without places and fractional values
                 get some number of places depending on the actual value.
@COMMAS          inserts commas into the integer part of numbers.
@NOCOMMAS        turns it off. Default is off.
@LABEL           turns PUTL mode on. (@NAME is a synonym.)
@NOLABEL         turns PUTL mode off. PUT default is off. PUTL default is on.
@TRIM            default. Trims blanks from the right end of a character value.
@NOTRIM          prints the full character value, including trailing blanks.
@EQUAL=20        when a labelled value (like Age = 40) is about to be written, place the = at position
                 20. @EQUAL=20:40 prints 2 values per line with equal signs at positions 20 and 40.
@NOEQUAL         turns it back off.
@MISS='string'   use the string instead of -, --, or --- to represent missing values.
@NOMISS          resets to -, --, or ---.

PUT

positions text and variables at specified column locations in the output line. Text strings are enclosed in
quotes and variables are simply cited. Paired angle brackets may also be used as string
delimiters in PUT statements.

PUT @3 'The client is ' Name '.' @ ; 

Locations are specified with @: 

@3 at column 3 

@NEXT at the start of the next line 

@SKIP write a blank line and move to the start of the next line 

@PAGE issue a page change and move to the start of the first line 

A final @, at the end of a PUT, holds the text output location, so that subsequent text may follow directly 

after. 



PUT is often used after an IF test checking for erroneous data values. PUT specifies error messages to
print:

[ IF ID MISSING,

PUT Last.Name

SS.Num ]

PUTL

positions variable names as well as variable values in the output line. If @EQUAL22 is used:

[ PUTL @EQUAL22 Name SS.Num ] 

the variable names and values are listed, one per line, centered on the equal-sign in column 22 (or any 

other specified column location). @EQUAL22:52 positions both variable names and values on one line, 

the first centered on the equal sign in column 22 and the second centered on the equal-sign in column 52. 

COMMENTS 

/* comments can be inserted in the command stream 

wherever a command can be found. The initial characters 

are the /*. The terminating characters are the asterisk 

followed by the slash 

*/ 

LIST Myfile $ 

Comments can also be used in the PPL of a command as long as each comment is positioned as a PPL 

phrase and not inserted in the middle of such a phrase. 

MODIFY Myfile [ /* generate coded variables */ GEN Coded.Age; 

GEN Coded.Income; 

/* Age will be recoded into 10 year groups */ 

SET Age = .... ] 


4 


NCOT and RECODE 

The previous PPL chapters covered the basics of data modification using the P-STAT Programming Language
(PPL). This chapter provides information about the RECODE and NCOT functions. These functions often
provide the easiest way to do complex recodes.

Values may be changed to different values using either the RECODE or NCOT functions. Both numeric and 

character values may be recoded using RECODE; only numeric values may be changed with NCOT. RECODE 

permits any arbitrary changes, including the recoding of individual values, ranges of values, missing values, character 

strings and extra (“left over”) values. XRECODE permits case sensitive recodes of character data. NCOT 

recodes ranges of values, specified with “cutting points”, to consecutive constants. 

The RECODE function is usually used to test a single argument, which may be a variable, a constant or a more 

complex expression. However, it can also be used to test multiple arguments creating a result which is based on 

several different arguments such as setting: 

Group=1 when Age lt 30 and Sex eq male and Income lt 20000 

Group=2 when Age ge 30 and Sex eq male and Income lt 20000 

Group=3 when Age lt 30 and Sex eq female and Income lt 20000, etc. 

This multi-argument use of RECODE often replaces a lengthy series of complex IF’s with a single statement that 

is both easier to read and to understand. 

4.1 The NCOT Function 

NCOT recodes numeric variable values to numeric constants. It does an N-way dichotomization or division of the 

values to be recoded, using cutting points supplied in the NCOT instructions. The cutting points divide the values 

into groups or ranges of values. The ranges are recoded to consecutive integers. 

Because NCOT is a function, it begins with a left parenthesis and ends with a right parenthesis. The first element 

following the left parenthesis is the NCOT argument which must be a variable name or an expression. This 

is followed by additional arguments giving cutting points for the values. Each NCOT argument is separated from 

the next by a comma. NCOT is designed for use when a numeric variable is to be divided into groups based on a 

series of ascending values or cutting points. It does not work with character values and it cannot be used for complex 

recoding. 

The cutting points for NCOT can be fractional values. The one restriction is that the cutting points must go 

in ascending order, from low (which may be negative) to high. Given: 

[ SET Hours = NCOT ( Hours, 20, 25, 30, 35, 40, 45, 50 ) ] 

everything less than or equal to the first value (20) becomes a “1”, everything above the first value, but not above 

the second value (25) becomes a “2”, and so on. The final value includes all the numbers greater than the final 

cutting point. Thus, the number of possible values is always one more than the number of cutting points.
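The cutting-point rule can be modelled in a few lines of Python. This is a sketch of the rule as stated above, not P-STAT code: the function name `ncot` simply mirrors the PPL function, `None` stands in for a missing value, and passing missing values through unchanged is an assumption taken from the missing Hours value in Figure 4.1.

```python
from bisect import bisect_left

def ncot(value, *cuts):
    """Return the 1-based group of `value` given ascending cutting points."""
    if value is None:                       # missing stays missing (assumption)
        return None
    if list(cuts) != sorted(cuts):
        raise ValueError("cutting points must be in ascending order")
    # Values less than or equal to a cutting point belong to that point's group,
    # so count the cutting points strictly below the value and add one.
    return bisect_left(cuts, value) + 1

hours = (18, 20, 22.5, 50, 62, None)
codes = [ncot(h, 20, 25, 30, 35, 40, 45, 50) for h in hours]
print(codes)  # [1, 1, 2, 7, 8, None]
```

With 7 cutting points the largest possible code is 8, one more than the number of cutting points.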

The NCOT function instructions can be abbreviated further by providing a step size: 

[ SET Hours = NCOT ( Hours, 20, 50/5 ) ] 

The 20 is the first cutting point, 50 is the last cutting point and 5 is the step size. Thus, the cutting points are 20, 

25, 30, 35, 40, 45 and 50. The instructions:


[ SET Hours = 

NCOT ( Hours, 20, 50/5, 100/10 ) ] 

create cutting points at 20, 25, 30, 35, 40, 45 and 50 (steps of 5), and also at 60, 70, 80, 90 and 100 (steps of 10). 

A value of 33, which is between the 3rd and 4th cutting point, becomes a “4” and a value of 85, which is between 

the 10th and 11th cutting point becomes an “11”. 
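The step notation can be checked with a small Python sketch that expands the abbreviated form into the full list of cutting points. The helper `expand_cuts` is hypothetical; each `(last, step)` pair stands for the PPL notation `last/step`, and the behaviour when a step does not land exactly on the last cutting point is not covered by the text, so the sketch simply stops before overshooting.

```python
from bisect import bisect_left

def expand_cuts(first, *steps):
    """Expand (last, step) pairs, e.g. (50, 5) for 50/5, into cutting points."""
    cuts = [first]
    for last, step in steps:
        while cuts[-1] + step <= last:
            cuts.append(cuts[-1] + step)
    return cuts

cuts = expand_cuts(20, (50, 5), (100, 10))
print(cuts)  # [20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100]

# Code = number of cutting points strictly below the value, plus one.
print(bisect_left(cuts, 33) + 1, bisect_left(cuts, 85) + 1)  # 4 11
```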

__________________________________________________________________________ 

Figure 4.1 NCOT: Numeric Recodes 

File RawData

Age   Income   Hours   Sex
 13     1350       -   m
 22    20100      40   f
 24    18400      36   f
 31    31000      40   m
 33    35000      49   f
 37    27000      38   m
 42    20000      40   f
 49    45000      40   m
 50    61000      55   m
 55    31000      30   m
 62    24000      24   f
 73    16000      20   m

MODIFY RawData [ GEN Coded.Age, GEN Coded.Income, GEN Coded.Hours;
  SET Coded.Age = NCOT ( Age, 25, 40, 55 );
  SET Coded.Income = NCOT ( Income, 10000, 100000 / 10000 );
  SET Coded.Hours = NCOT ( Hours, 20, 50/5, 100/10 ) ],
  OUT NewData $

File NewData

                              Coded   Coded    Coded
Age   Income   Hours   Sex      Age   Income   Hours
 13     1350       -   m          1        1       -
 22    20100      40   f          1        3       5
 24    18400      36   f          1        2       5
 31    31000      40   m          2        4       5
 33    35000      49   f          2        4       7
 37    27000      38   m          2        3       5
 42    20000      40   f          3        2       5
 49    45000      40   m          3        5       5
 50    61000      55   m          3        7       8
 55    31000      30   m          3        4       3
 62    24000      24   f          4        3       2
 73    16000      20   m          4        2       1

__________________________________________________________________________ 

Figure 4.1 illustrates NCOT with three different patterns. The NCOT of Age provides 3 cutting points and 

results in 4 values. Values on Age less than or equal to 25 are a 1 in Coded.Age. Values greater than 25 and less


than or equal to 40 are a 2 in Coded.Age. Values greater than 40 and less than or equal to 55 are a 3 in Coded.Age. 

And finally any value on Age that is greater than 55 is a 4 in Coded.Age. 

Coded.Income is variable Income in groups of 10,000. Coded.Hours is a more complex pattern with cutting 

points at 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100. With 12 cutting points there is a possibility of codes between 

1 and 13. 

NCOT is a very parsimonious and clear way to recode numeric values when cutting points, either arbitrary or
patterned, are required. When the recodes to be done are not in ascending order, RECODE is the function
to use.

4.2 The RECODE Function: Single Argument Usage 

RECODE changes (recodes) numeric or character variables. XRECODE, for eXact recodes, respects the case of 

characters in recoding them to other characters or to numbers. This section describes simple recodes, ones with a 

single argument. Multiple-argument recodes are described later. 

Because RECODE is a function, it begins with a left parenthesis and ends with a right parenthesis. The first 

element following the left parenthesis is the RECODE argument which must be a variable name or an expression. 

This is followed by a series of recoding tests, separated by commas. 

[ SET Age = 

RECODE ( ROUND (Age), 0 TO 20 = 1, 21 TO 100 = 2 ) ] 

The format of the single argument RECODE function is: 

RECODE ( Argument, test, test, test, ..... ) 

The argument may be a variable name or a complex expression. Each recode test is composed of a list of one or 

more values followed by an “=” sign and the new value that replaces the values in the list. 

value.list = new.value, value.list = new.value, .... 

The list may be a single value such as “2”, a range of values such as “3 TO 5”, or a combination of single 

values and ranges such as “12 TO 15 33 M2”. The list is followed by an equal sign "=" and the new value to be 

used. Each recode test is separated from the next by a comma. After the recoding, the new values must all be of 

one type, either numeric or character, and they must be the same data type as the variable being set or generated. 

Single values may be recoded to new values: 

[ GEN Gender:c; 

SET Sex = RECODE ( Sex, 0 = 1, 1 = 2 ) ; 

SET Gender = RECODE ( Sex, 1 = 'male', 2 = 'female') ; 

SET Gender = 

RECODE ( Gender, 'male' = 'boy', 'female' = 'girl' ) ] 

The first recode changes the values of the numeric variable Sex; “zeros” become “ones” and “ones” become 

“twos”. The second recode provides a value for a new character variable, Gender. Its values are recodes of the 

numeric values of the existing variable Sex. “Ones” become “male” and “twos” become “female”. (Notice that 

character strings in the recode tests are enclosed in quotes.) The third recode changes the values of the character

variable Gender; “male” becomes “boy” and “female” becomes “girl”. 

RECODE tests may recode values in any of these combinations: numeric to numeric, character to character, 

numeric to character, or character to numeric. However, the resultant values must all be one data type, and that 

type must correspond with that of the variable being recoded or generated. 

Ranges of values may be recoded to new values: 

[ SET Income = RECODE ( INT (Income), 

0 TO 50000 = 1, 50001 TO 100000 = 2 ) ]


This example recodes the integer portion of the values of Income; values from zero through 50,000 become 

“ones”, values from 50,001 through 100,000 become “twos”. 

Ranges can also be used with character variables: 

[ GEN Session = RECODE ( Last.Name, 

'A' TO 'MZZZ' = 1, 'N' TO 'ZZZZ' = 2 ) ] 

This example generates the numeric variable Session, whose values are based on those of the character variable 

Last.Name. Cases with last names from “A” through “MZZZ” have “ones” for the value of Session, and cases 

with names from “N” through “ZZZZ” have “twos”. In RECODE, case does not matter. Therefore, 'a' TO 'MZZZ' 

and 'A' TO 'mzzz' are equivalent. 

Any non-missing values left over after the recoding is complete may be recoded using G (for Good). There

may be only one G in a RECODE instruction: 

[ GENERATE Test.Score = 

RECODE ( ROUND ( (Correct / Total) * 100 ), 

65 TO 74 = 1, 75 TO 84 = 2, 85 TO 94 = 3, 

95 TO 100 = 4, G = 0 ) ] 

In this example, the RECODE argument is a complex expression — the number of correct items (Correct) divided 

by the total number of items (Total) and multiplied by 100. That value is rounded to a whole number (ROUND) 

and recoded to the specified values. If a non-missing value is not included in the recoding tests ( G ), Test.Score 

is zero. Thus, a value of 42 yields a Test.Score value of 0. The recode testing is done in a strict left to right order. 

Therefore, you should put any G= recode AFTER all recode tests that expect a good value. 

A common mistake in using RECODE with ranges of values is to specify the recode tests using integers, without 

provision for values which fall between the ranges. For example, the value 50.5 falls between 50 and 51: 

[ GENERATE Score = RECODE ( XA, 1 TO 50 = 1, 51 TO 100 = 2 ) ] 

Any such numbers are not recoded. To solve this problem, the recode tests may be specified as any of the 

following: 

[ GEN Score = 

RECODE ( XA, 1 TO 50 = 1, 51 TO 100 = 2, G = M3 ) ] 

[ GEN Score = 

RECODE ( XA, 1 TO 50 = 1, 50 TO 100 = 2, G = M ) ] 

[ GEN Score = 

RECODE ( XA, .01 TO 50 = 1, 50.0001 TO 100 = 2, G = 3 ) ] 

The first example uses G to detect non-missing values that fall between the ranges in the recode tests: 50.5
yields a value of MISSING3 for Score. The second example uses overlapping ranges to avoid gaps in the
ranges; values of 50 are recoded to 1, the first test in which 50 appears. The third example specifies
all-inclusive ranges.
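The gap problem and the three remedies can be traced with a small Python model. The function `first_match` is a hypothetical helper that tries inclusive ranges from left to right; its `leftover` argument stands in for `G =`, and the strings "M3" and "M" are just labels for the missing results.

```python
def first_match(value, ranges, leftover):
    """ranges: (low, high, code) tried left to right; leftover models G = ..."""
    for low, high, code in ranges:
        if low <= value <= high:
            return code
    return leftover

# Integer ranges with a gap and no G: 50.5 matches nothing and is left as it is.
print(first_match(50.5, [(1, 50, 1), (51, 100, 2)], leftover=50.5))      # 50.5
# Remedy 1: G = M3 flags anything that falls in the gap.
print(first_match(50.5, [(1, 50, 1), (51, 100, 2)], leftover="M3"))      # M3
# Remedy 2: overlapping ranges; 50 goes to the first test that contains it.
print(first_match(50, [(1, 50, 1), (50, 100, 2)], leftover="M"))         # 1
print(first_match(50.5, [(1, 50, 1), (50, 100, 2)], leftover="M"))       # 2
# Remedy 3: ranges with fractional end points.
print(first_match(50.5, [(0.01, 50, 1), (50.0001, 100, 2)], leftover=3)) # 2
```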

It is a good idea to also use G in the recode tests to include any non-missing values that may have been overlooked. 

For example, if G is used to set any overlooked values to MISSING3 (M3), the user need only search for 

MISSING3 in the output to locate any values that have not been recoded. When the recoding transforms character 

to numeric variables or numeric to character variables, either all possible values must be recoded or G must be 

used to avoid an error situation. 

Missing values are not recoded unless the recode instructions specify how they should be recoded. (G refers 

to only non-missing extra values.) The three different types of missing values can be explicitly referenced using 

M1, M2 and M3:


LIST Kittens 

[ SET Sex = RECODE ( Sex, 

'm' = 'male', 'f' = 'female', 

G = M1, M2 = 'neuter' ) ] $ 

In this listing, values of Sex that are MISSING2 are recoded to “neuter”, and extra values are recoded to 

MISSING1. Any values of MISSING1 or MISSING3 remain the same. 

M by itself recodes any of MISSING1, MISSING2 or MISSING3 in the original value to a single new value. 

When M by itself is used as a new value, it is assumed to be MISSING1: 

LIST File1 

[ GENERATE New = RECODE ( Old, M = 0, 99 = M ) ] $ 

Old New 

- 0 

-- 0 

--- 0 

99 - 

77 77 

All types of missing are recoded as “zeros”. However, since a variable can be only one type of missing at a time, 

using 99=M is treated as 99=M1. 

The number “99” is recoded as M1. Since no test is given for 77 and since G= was not used, the 77 is not changed. 

(Note that the RECODE function references the system variables for the different types of missing data using a 

simplified notation. The regular notation for system variables may also be used — .M., .M1., .M2. and .M3.) 
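The File1 listing above can be reproduced with a short Python model of `RECODE ( Old, M = 0, 99 = M )`. The strings "M1", "M2" and "M3" are stand-ins chosen here for the three kinds of missing value; they are not P-STAT notation for data values.

```python
MISSING = {"M1", "M2", "M3"}

def recode_old(value):
    """Model of RECODE ( Old, M = 0, 99 = M )."""
    if value in MISSING:   # M as a test matches any type of missing
        return 0
    if value == 99:        # M as a result is taken to be MISSING1
        return "M1"
    return value           # no test matched and no G=: value is unchanged

print([recode_old(v) for v in ("M1", "M2", "M3", 99, 77)])  # [0, 0, 0, 'M1', 77]
```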

XRECODE is an eXact recode — it works just like RECODE except that the case (upper, lower or mixed) 

of the recoding instructions is respected: 

LIST File2 

[GEN Num = XRECODE ( Char, 'a' = 1, 'A' = 9 ) ] $ 

Char Num 

a 1 

A 9 

In other aspects, XRECODE operates in the same manner that RECODE does. 

In summary, there are RECODE instructions for: 

• Numbers; 

• Character strings; 

• Missing values (M, M1, M2, M3); 

• Any good (non-missing) value left over (G). 

The data type of all the recoded values for a given variable must be the same, and it must agree with that of the 

modified or newly generated variable. 

Note: On an ASCII character set computer like a PC, you cannot use an XRECODE test of 'a' to 'Z' because 

'a' is 97 and 'Z' is 90, and the test is backwards. That would, however, be legal in EBCDIC on an IBM mainframe, 

where 'a' is 129 and 'Z' is 233.
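The difference between the two functions, and the character-code ordering behind the note above, can be illustrated in Python. The helper `recode_char` is hypothetical; the result shown for the case-insensitive variant assumes that RECODE ignores case and stops at the first matching test, as the surrounding text describes. The `cp037` codec is used as one common EBCDIC code page.

```python
def recode_char(value, tests, exact=False):
    """tests: (old, new) pairs tried left to right; exact=True models XRECODE."""
    for old, new in tests:
        same = (value == old) if exact else (value.lower() == old.lower())
        if same:
            return new
    return value

tests = [("a", 1), ("A", 9)]
print([recode_char(c, tests, exact=True) for c in "aA"])    # [1, 9]
print([recode_char(c, tests, exact=False) for c in "aA"])   # [1, 1]

# In ASCII 'a' sorts after 'Z', so a range 'a' TO 'Z' runs backwards.
print(ord("a"), ord("Z"))                                   # 97 90
# In EBCDIC (code page 037) the order is reversed.
print("a".encode("cp037")[0], "Z".encode("cp037")[0])       # 129 233
```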


4.3 Complex Recodes 

The multiple argument usage of RECODE is exactly like the single argument usage in the way that the arguments 

are organized and in the values that can be supplied and tested. The RECODE syntax is: 

1. RECODE ( 

2. the arguments to be used in the recode. If there are multiple arguments they are separated by a vertical 

bar, "|". A comma follows the final argument. 

3. optional definitions for a set of values. The definitions provide a label for a set of values that can 

then be referenced by that label in the recode tests that follow. These definitions are enclosed in parentheses 

and are described below. 

4. one or more recoding tests which are executed in left to right order. If there are multiple arguments 

the sections of the test are separated by a vertical bar. 

5. a right parenthesis, ")" 

4.4 RECODE: The Arguments 

The composition of the arguments in a complex recode is exactly the same as in the single argument situation. An 

argument is often simply the name of a variable, but it can be a complex expression. 

RECODE ( Age, 

all tests are made using the single variable age. When there are two arguments: 

RECODE ( Age | Income, 

test values must be supplied for both variables Age and Income. A three argument example: 

RECODE ( Age | Husband.Income + Wife.Income | Region 

requires 3 values for each test. The first is a value for Age. The second is a value for the sum of variables Husband.Income 

and Wife.Income. The third value is for variable Region. The arguments and the first test for this 

RECODE might look like: 

RECODE ( Age |Husband.Income + Wife.Income | Region, 

le 30 | le 35000 | 'East' = 1, ...] 

The arguments can be numeric or character or a mixture of the two. Each value in a test must be the same 

data type as the corresponding argument. In the previous RECODE “30” and “35000” are appropriate values for 

the two numeric arguments and “East” is an appropriate value for the third argument, the character variable 

Region. 

4.5 The RECODE Tests 

There are two general tests which do not use a test segment for each argument. One is for missing arguments, the 

other is for good arguments. 

M = result 

is successful when ANY of the recode arguments is missing. 

G = result 

is successful when ALL of the recode arguments are good (non-missing). 

M=, when used, is usually placed at either the beginning or the end of the tests. G=, when used, is usually
placed at the end of the tests. Tests are processed in strict left to right order, and processing of a recode
stops as soon as a test succeeds. Any set of arguments is either all good (G=) or has some missing values
(M=). If both M= and G= are placed at the beginning of the tests, one of them is always successful and the
remaining tests are never processed.

[ GEN Group:c = RECODE ( Age | Region, 

M = M3, 

LT 30 | 'east' = 'one', 

GE 30 | NE 'east' = 'two', 

G = 'three' ) ] 

Each test, except for M= and G=, is composed of as many test segments as there are arguments. The vertical
bar is used to separate the test segments within each test. In the example above there are 4 tests. The first
test, “M=”, is not segmented. It returns MISSING3 when either Age or Region is missing. The next two tests
have 2 segments, one for each of the two arguments Age and Region. Finally “G=”, an unsegmented test,
assigns 'three' to Group for any remaining case with non-missing values on both Age and Region.
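A rough Python model of the four tests may help in following the order of evaluation. It is a sketch under stated assumptions: `None` stands for a missing value, each segmented test is a tuple with one predicate per argument, and the markers "M" and "G" stand for the unsegmented tests. The function name `recode` and this data layout are illustrative only.

```python
def recode(args, tests):
    """Try (segments, result) pairs left to right and return the first hit."""
    any_missing = any(a is None for a in args)
    for segments, result in tests:
        if segments == "M":                 # succeeds when ANY argument is missing
            ok = any_missing
        elif segments == "G":               # succeeds when ALL arguments are good
            ok = not any_missing
        else:                               # one predicate per argument
            ok = all(a is not None and p(a) for a, p in zip(args, segments))
        if ok:
            return result
    raise LookupError("no test matched")

tests = [
    ("M", "M3"),
    ((lambda age: age < 30, lambda region: region.lower() == "east"), "one"),
    ((lambda age: age >= 30, lambda region: region.lower() != "east"), "two"),
    ("G", "three"),
]

print(recode((25, "east"), tests))    # one
print(recode((40, "west"), tests))    # two
print(recode((25, "west"), tests))    # three
print(recode((None, "east"), tests))  # M3
```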

A test segment consists of one or more comparisons. A comparison consists of: 

1. An optional logical operator such as

LT (less than), LE (less than or equal),
EQ (equal), NE (not equal),
GE (greater than or equal), GT (greater than).

EQ is assumed when no operator is given. LT, LE, GT and GE can only be used with a single numeric or
character constant.

2. The values to be tested. These are usually one or more constants but can also be a definition. Definitions 

are discussed below. 

The things that can be tested: 

1. numeric constant such as 12.7, 13 or 55 

2. numeric range such as 1 to 8 

3. character constant such as 'east' 

4. character range such as 'a' to 'e' 

5. G - any good value 

6. M - any missing value 

7. M1, M2, or M3 for MISSING1, MISSING2 or MISSING3 

8. (nnn) provides the number, for example “(123)”, of a definition containing the values to be tested. 

Definitions are described below. 

There can be several comparisons in a test segment. The test segment for an argument is successful when any 

one of the comparisons is successful. An example of a multiple comparison: 

LT 20, GT 30 = 11 

The segment is successful for arguments less than 20 or greater than 30. This is the same as: 

NE 20 TO 30 = 11 

In a long series of tests, it is often necessary to repeat a test segment several times. This repetition can be 

minimized by making use of the fact that a null segment automatically repeats the previous test for that segment. 

[ SET Income.Groups = RECODE 

( area code | income, 

609 908 201 215 | LE 30000 = 1, 

| GT 30000 = 2, 

NE 609 908 201 215 | LE 30000 = 3, 

| GT 30000 = 4 ) ]


is the same as 

[ SET Income.Groups = RECODE 

( area code | income, 

609 908 201 215 | LE 30000 = 1, 

609 908 201 215 | GT 30000 = 2, 

NE 609 908 201 215 | LE 30000 = 3, 

NE 609 908 201 215 | GT 30000 = 4 ) ] 

Note: “NE 609 908 201 215” is true when the area code value is not equal to ANY of them. 
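The null-segment shorthand amounts to copying the segment from the test above. A minimal Python sketch of that expansion, treating each segment as plain text and using `None` for a null segment, is shown below; `fill_null_segments` is a hypothetical helper, not part of P-STAT.

```python
def fill_null_segments(tests):
    """tests: list of (segments, result); a None segment repeats the one above it."""
    filled, previous = [], None
    for segments, result in tests:
        if previous is not None:
            segments = tuple(p if s is None else s
                             for s, p in zip(segments, previous))
        filled.append((segments, result))
        previous = segments
    return filled

short = [
    (("609 908 201 215", "LE 30000"), 1),
    ((None, "GT 30000"), 2),
    (("NE 609 908 201 215", "LE 30000"), 3),
    ((None, "GT 30000"), 4),
]

for segments, result in fill_null_segments(short):
    print(" | ".join(segments), "=", result)
```

The four printed lines match the four fully written tests in the second version of the example.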

4.6 Defining a Set of Constants 

When a set of constants is used repeatedly, they can be defined as a group, given an integer label (from 1 to 

999999), and referred to by using that label. 

[ SET Income.Groups = recode ( 

area code | income, 

( DEFINE 101 = 609 908 201 215), 

(101) | LE 30000 = 1, 

| GT 30000 = 2, 

NE (101) | LE 30000 = 3, 

| GT 30000 = 4 ) ] 

There can be many such definitions within a single RECODE. They must follow the list of arguments and 

precede the tests. The format is either “DEFINE” or “DEF” followed by a numeric label, an equal sign (=) and a 

list of values. The entire definition is in parentheses and is followed by a comma. The numeric labels in the definitions 

must be unique. The following definitions cause an error because both use the label “1”: 

( DEFINE 1 = 609 908 201 ), 

( DEFINE 1 = 215 412 610 717 814 ), 

A given definition set can be referenced many times, and can be used for any of the arguments. A given test
segment can reference several definition sets and use additional values as well. Figure 4.2 contains both the
command with PPL for a multiple-variable RECODE and the resulting output file. Because the recode action
progresses from left to right, it is easy to flag as errors the cases which have conflicting postal zip codes and
telephone area codes. Definitions 1 and 101 represent the area codes and zip codes for New Jersey. Definitions
2 and 102 represent the area codes and zip codes for Pennsylvania.

( DEFINE 1 = 201 609 908 ), 

( DEFINE 2 = 215 412 610 717 814 ), 

( DEFINE 101 = '07000' TO '07900' '08001' TO '08990' ), 

( DEFINE 102 = '15201' TO '19980' ), 

Given these definitions, these two tests are the same: 

(1) M1 | (101) = 'New Jersey' 

201 609 908 M1 | '07000' TO '07900' '08001' to '08990' = 'New Jersey' 

Any case with a value on Area.code that is either included in definition 1 or is missing, and with a value on variable 

Zip that is included in definition 101 is given a value of “New Jersey” on variable State: 

(1) M1 | (101) = 'New Jersey', 

A case is also coded as “New Jersey” if it has a value on Area.code that is one of the definition 1 values and

its zip code is either missing or among the values for definition 101: 

(1) | (101) M1 = 'New Jersey',


__________________________________________________________________________ 

Figure 4.2 Multi-Variable RECODE With Definitions 

MODIFY States [ 

GEN State:c = RECODE ( Area.code | Zip, 

( DEFINE 1 = 201 609 908 ), 

( DEFINE 2 = 215 412 610 717 814 ), 

( DEFINE 101 = '07000' TO '07900' '08001' TO '08990' ), 

( DEFINE 102 = '15201' TO '19980' ), 

(1) M | (101) = 'New Jersey', 

(1) | (101) M = 'New Jersey', 

(2) M | (102) = 'Pennsylvania', 

(2) | (102) M = 'Pennsylvania', 

(1) (2) | (101) (102) = 'ERROR', 

M='Undefined', G='Other' ) ], OUT States $ 

File States

Area
code   Zip     State
201    -       New Jersey
313    30225   Other
609    08525   New Jersey
215    08525   ERROR
412    16030   Pennsylvania
-      19340   Pennsylvania
215    -       Pennsylvania

__________________________________________________________________________ 

The same procedure is used for the area codes and zip codes used to set State to “Pennsylvania”. Any case 

that has not passed one of these 4 tests, and has a non-missing value for area code that is among the values in either 

of definitions 1 or 2 with a zip code that is in either of the definitions 101 or 102 has a coding problem: either a 

New Jersey area code and a Pennsylvania zip code or a Pennsylvania area code and a New Jersey zip code. The 

value for State on these cases is set to “ERROR”. 

Any case that has good values on both variables that are not among any of the lists in the definitions is caught
by the “G=” test and is set to “Other”. Any case which has an undefined good value on one of the variables and
a missing value on the other is caught by the “M=” test and is set to “Undefined”, as are cases that are missing
on both variables.
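The seven tests of Figure 4.2 can be traced with a Python sketch that applies them in the same left to right order. It is a model only: `None` stands for a missing value, the definitions are written as Python sets and string-range checks, and the function name `state` is an illustrative choice.

```python
NJ_AREA = {201, 609, 908}                       # definition 1
PA_AREA = {215, 412, 610, 717, 814}             # definition 2

def nj_zip(z):                                  # definition 101
    return "07000" <= z <= "07900" or "08001" <= z <= "08990"

def pa_zip(z):                                  # definition 102
    return "15201" <= z <= "19980"

def state(area, zip_code):
    a_m, z_m = area is None, zip_code is None
    a_nj, a_pa = area in NJ_AREA, area in PA_AREA
    z_nj = not z_m and nj_zip(zip_code)
    z_pa = not z_m and pa_zip(zip_code)
    tests = [
        ((a_nj or a_m) and z_nj, "New Jersey"),            # (1) M | (101)
        (a_nj and (z_nj or z_m), "New Jersey"),            # (1) | (101) M
        ((a_pa or a_m) and z_pa, "Pennsylvania"),          # (2) M | (102)
        (a_pa and (z_pa or z_m), "Pennsylvania"),          # (2) | (102) M
        ((a_nj or a_pa) and (z_nj or z_pa), "ERROR"),      # (1) (2) | (101) (102)
        (a_m or z_m, "Undefined"),                         # M =
        (True, "Other"),                                   # G =
    ]
    return next(result for ok, result in tests if ok)

rows = [(201, None), (313, "30225"), (609, "08525"), (215, "08525"),
        (412, "16030"), (None, "19340"), (215, None)]
for area, zip_code in rows:
    print(area, zip_code, state(area, zip_code))
```

Run on the seven cases of the figure, the sketch gives the same State values as the output file.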

4.7 The Result Values 

The result values can be any of the following:

1. M1 or M    missing 1
2. M2         missing 2
3. M3         missing 3
4. nn         a numeric constant such as 22 or 1.543
5. 'ccccc'    a character constant
6. #tt        a temporary scratch variable
7. ##tt       a permanent scratch variable
8. ARGn       the value of the cited argument. If the recode has 2 arguments, they are referred
              to as ARG1 and ARG2.

All the result values in a given recode must be the same type. In other words, you cannot use a numeric constant 

and a character scratch value as results in the same recode. 

Using scratch variables allows a recode to access other variables in a case. 

[ GENERATE #n = d; 

SET XYX = RECODE ( a|b|c, 1|2|3 = #n, etc. 

When none of the tests is successful and G= and M= are not used, the result depends on the number and type 

of the arguments. If the recode has one argument: 

1. if the argument and result types are the same, the argument is used as the result. This is useful when 

some values are to be changed, but the rest should remain the same. 

2. if the argument and result types differ, and the argument is missing, the result is set to the same kind 

of missing. 

3. if the argument and result types differ, and the argument is not missing, an error occurs. 

If the recode has more than one argument: 

4. if any argument is missing, the result is set to the same kind of missing. 

5. if all arguments are non-missing, an error occurs. 

In all situations except (possibly) the first, it is good practice to use M= and G=, so that the recode is fully 

defined. 
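The five fall-through rules can be written out as a small Python function. This is a sketch of the rules as listed, with several assumptions made for illustration: the `Missing` class and the name `unmatched_result` are hypothetical, `result_type` is `str` for character results or `(int, float)` for numeric results, and when several arguments are missing the first one is returned because the text does not say which kind of missing is used.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Missing:
    kind: int  # 1, 2 or 3 for MISSING1, MISSING2, MISSING3

def unmatched_result(args, result_type):
    """Result when no test succeeds and neither G= nor M= is used."""
    missing = [a for a in args if isinstance(a, Missing)]
    if len(args) == 1:
        arg = args[0]
        if missing or isinstance(arg, result_type):
            return arg                          # rules 1 and 2
        raise ValueError("rule 3: types differ and the argument is not missing")
    if missing:
        return missing[0]                       # rule 4 (first missing: assumption)
    raise ValueError("rule 5: all arguments are non-missing")

print(unmatched_result((77,), (int, float)))         # 77
print(unmatched_result((Missing(2),), str))          # Missing(kind=2)
print(unmatched_result((Missing(3), 5), (int, float)))  # Missing(kind=3)
```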

4.8 RECODE or IF/SET 

Using RECODE is usually clearer and faster than using a series of IFs and SETs to do the same thing. For example, 

file AAA has variables AGE and REGION. We need a new variable named SECTOR to be created from the 

values on age and region. We want SECTOR to be: 

• M1 if either age or region is missing 

• 1 if age LT 30 and region EQ 'east' 

• 2 if age LT 30 and region EQ 'central' 

• 3 if age LT 30 and region EQ 'west' 

• 4 if age GE 30 and region EQ 'east'

• 5 if age GE 30 and region EQ 'central' 

• 6 if age GE 30 and region EQ 'west' 

• M2 if age and region have GOOD values, but have not matched a previous test. 

Figure 4.3 contains the PPL statements to do this recode, first using IF and SET and then using a multi-argument RECODE.

If file AAA had the age and region values shown in the table after the figure, either of the MODIFY commands in Figure 4.3 would produce the sector values listed there.


__________________________________________________________________________ 

Figure 4.3 RECODE or IF/SET 

Using IF/SET:

MODIFY aaa
[ GENERATE sector = .m1. ;
  IF age good and region good, SET sector = .m2.;
  IF age lt 30 and region EQ 'east' SET sector = 1;
  IF age lt 30 and region EQ 'central' SET sector = 2;
  IF age lt 30 and region EQ 'west' SET sector = 3;
  IF age ge 30 and region EQ 'east' SET sector = 4;
  IF age ge 30 and region EQ 'central' SET sector = 5;
  IF age ge 30 and region EQ 'west' SET sector = 6;
], out bbb $

Using RECODE:

MODIFY aaa
[ GENERATE sector = RECODE ( age|region,
      M = m1,
      lt 30| 'east' = 1,
           | 'central' = 2,
           | 'west' = 3,
      ge 30| 'east' = 4,
           | 'central' = 5,
           | 'west' = 6,
      G = m2 ) ], out bbb$

__________________________________________________________________________ 

Age   Region    Sector
 22   --        -
 23   central   2
 44   west      6
 19   south     --
 30   east      4
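The SECTOR logic can be checked against the table above with a short model. This is a Python sketch of the tests in Figure 4.3, not P-STAT code. The function name `sector`, the use of `None` for a missing value and the strings "M1" and "M2" for the two kinds of missing result are assumptions made for illustration.

```python
M1, M2 = "M1", "M2"   # stand-ins for missing types 1 and 2 (assumed)


def sector(age, region):
    """Apply the tests of Figure 4.3 in order; None models a missing value."""
    if age is None or region is None:        # M = m1
        return M1
    base = 0 if age < 30 else 3              # LT 30 gives 1..3, GE 30 gives 4..6
    offsets = {"east": 1, "central": 2, "west": 3}
    key = region.lower()                     # RECODE ignores upper/lower case
    if key in offsets:
        return base + offsets[key]
    return M2                                # G = m2: good values, no test matched


rows = [(22, None), (23, "central"), (44, "west"), (19, "south"), (30, "east")]
for age, region in rows:
    print(age, region, sector(age, region))
```

The five cases yield M1, 2, 6, M2 and 4, in agreement with the Sector column.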

4.9 RECODE Pointers 

If you are doing a very complex recode with many variables and values, there may be some combinations that are 

far more likely than others. You can improve the speed of the command by arranging your recodes so that the 

most common results are among the early tests. Suppose you are recoding 60 country names into integers. One approach would be to organize the tests alphabetically, so that 'albania'=22 precedes 'china'=12. If, however, half of the cases come from just five countries, the recode will be faster if those five tests are placed before the fifty-five others.

In the same manner, putting M=m1 or such at the beginning of the tests will be faster when many of the cases 

have a missing value on the recode argument, but will be slightly slower when no cases have missing values on 

the recode argument. 

When you are dealing with missing values, it should be noted that M= and M|M|M= are different. Consider a three-argument recode:


M= is successful when ANY argument is missing, 

M|M|M= is successful when ALL arguments are missing. 
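The difference between the two tests is the difference between "any" and "all". A minimal Python model follows, with `None` standing for a missing value; the function names `m_any` and `m_all` are hypothetical.

```python
def m_any(args):
    """Model of M= : succeeds when at least one argument is missing."""
    return any(a is None for a in args)


def m_all(args):
    """Model of M|M|M= : succeeds only when every argument is missing."""
    return all(a is None for a in args)


case = (None, 4, 7)          # first of three arguments is missing
print(m_any(case), m_all(case))
```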

When you are using EQ (equal) and NE (not equal) the phrase

EQ 2 5 to 9 11

should be thought of as

EQ 2, OR EQ 5 to 9, OR EQ 11.

The phrase:

NE 2 5 to 9 11

should be thought of as

NE 2, AND NE 5 to 9, AND NE 11.
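The OR reading of an EQ list and the AND reading of an NE list can be modeled for good (non-missing) values as follows. This is a Python sketch, not P-STAT code. The helper names and the use of a `(low, high)` tuple for `low TO high` are assumptions, and the treatment of missing values shown in Figure 4.4 is deliberately left out.

```python
def in_list(value, items):
    """items mixes single constants and (low, high) tuples for 'low TO high'."""
    for item in items:
        if isinstance(item, tuple):
            low, high = item
            if low <= value <= high:     # TO ranges are inclusive
                return True
        elif value == item:
            return True
    return False


def eq_test(value, items):
    """EQ 2 5 to 9 11  reads as  EQ 2, OR EQ 5 to 9, OR EQ 11."""
    return in_list(value, items)


def ne_test(value, items):
    """NE 2 5 to 9 11  reads as  NE 2, AND NE 5 to 9, AND NE 11."""
    return not in_list(value, items)


items = [2, (5, 9), 11]
print(eq_test(7, items), ne_test(10, items))
```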

Figure 4.4 shows successful EQ and NE comparisons for arguments of 1, 2, m1, m2 and m3 when compared to test constants of 1, 2, m1, m2, m3, m and g. S means a successful comparison.

__________________________________________________________________________ 

Figure 4.4 EQ and NE Comparisons 

            ---EQ comparisons with---
argument    1    2    M1   M2   M3   M    G
   1        S    .    .    .    .    .    S
   2        .    S    .    .    .    .    S
   M1       .    .    S    .    .    S    .
   M2       .    .    .    S    .    S    .
   M3       .    .    .    .    S    S    .

            ---NE comparisons with---
argument    1    2    M1   M2   M3   M    G
   1        .    S    S    S    S    S    .
   2        S    .    S    S    S    S    .
   M1       .    .    .    S    S    .    S
   M2       .    .    S    .    S    .    S
   M3       .    .    S    S    .    .    S

__________________________________________________________________________ 

4.10 XRECODE 

The X in XRECODE means eXact comparisons. Consider:

RECODE( 'aBc', 'ABC'=1, 'aBc'=2, etc. 

XRECODE( 'aBc', 'ABC'=1, 'aBc'=2, etc. 

The RECODE returns 1, because RECODE ignores upper/lower case differences. Therefore, the argument value of aBc is matched by ABC. The XRECODE does not match aBc with ABC because the cases differ; it proceeds to the aBc test, which succeeds, and returns 2.
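The two behaviors differ only in how a character argument is compared with each test constant. A minimal Python model is sketched below; the function name `recode` and its `exact` flag are hypothetical and simply switch between the RECODE and XRECODE comparison rules described above.

```python
def recode(value, tests, exact=False):
    """Return the result of the first matching test, or None if none match.

    tests -- list of (constant, result) pairs, tried in order
    exact -- False models RECODE (case ignored); True models XRECODE
    """
    for constant, result in tests:
        if exact:
            same = value == constant
        else:
            same = value.lower() == constant.lower()
        if same:
            return result
    return None


tests = [("ABC", 1), ("aBc", 2)]
print(recode("aBc", tests))              # RECODE-style comparison
print(recode("aBc", tests, exact=True))  # XRECODE-style comparison
```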

Suppose, however, you want to do a recode using two character arguments, REGION with case-independent comparisons (RECODE) and CODE with case-specific comparisons (XRECODE). This can be done using XRECODE by:


1. converting values of REGION to upper case as the recode begins, and then 

2. using uppercase constants in its test segments. 

For example: 

XRECODE( UPPER(region)| code, 'EAST' | 'aBc' = 1, etc.


SUMMARY

PPL Functions:

NCOT (exp, ncot instructions)

does N-way dichotomizations (divisions) of numeric values and recodes those values according to instructions 

given in the second argument. The arguments for NCOT must be enclosed in parentheses. 

The first argument is an expression which may be a simple variable name or a complex expression. This 

is followed by cutting points and possibly a step size. All values less than or equal to the first cutting 

point become a “1”. All values greater than the first cutting point and less than or equal to the second 

cutting point become a “2”. 

MODIFY File1 

[ SET Age = NCOT ( Age, 14 ) ; 

GENERATE NN = NCOT ( FRAC (T1), .3, .6, .9 ) ; 

SET ZZ = NCOT ( ZZ, 20, 50/5, 90/10 ) ], 

OUT File2 $ 

The final value includes all the numbers greater than the final cutting point. Thus, there will always be 

one more possible output value than there are cutting points. The instruction “20, 50/5” defines cutting 

points from 20 through 50 in steps of 5. This is a shorthand way of providing the cutting points 25, 30, 

35, and so on. In the example above, cutting points for the variable ZZ will occur at 20, 25, 30, 35, 40, 

45, 50, 60, 70, 80, and 90. The resulting values will be from 1 to 12. 
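The expansion of the cutting-point shorthand and the assignment of output values can be modeled as follows. This is a Python sketch of the rules described above, not P-STAT code. The names `expand_cuts` and `ncot`, and the use of a `(limit, step)` tuple for the `limit/step` notation, are assumptions; the sketch also assumes the first instruction is a plain cutting point.

```python
from bisect import bisect_left


def expand_cuts(spec):
    """Expand NCOT instructions into an ascending list of cutting points.

    A number is a single cutting point. A (limit, step) tuple models
    'limit/step': continue from the previous point up to limit by step.
    """
    cuts = []
    for item in spec:
        if isinstance(item, tuple):
            limit, step = item
            point = cuts[-1] + step
            while point <= limit:
                cuts.append(point)
                point += step
        else:
            cuts.append(item)
    return cuts


def ncot(value, cuts):
    """1 if value <= first cut, 2 for the next interval, len(cuts)+1 above the last."""
    return bisect_left(cuts, value) + 1


cuts = expand_cuts([20, (50, 5), (90, 10)])   # NCOT ( ZZ, 20, 50/5, 90/10 )
print(cuts)
print(ncot(20, cuts), ncot(21, cuts), ncot(91, cuts))
```

Eleven cutting points give twelve possible output values, matching the count stated in the text.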

RECODE (exp, recode instructions) 

recodes the numeric or character variable specified by the expression according to the instructions given 

in the second argument: 

MODIFY Nursery 

[ GENERATE Coded.Age = 

RECODE ( ROUND (Age), LE 4 = 1, 5 = 2, GE 6 = 3 ) ; 

SET Race = 

RECODE ( Race, 0 3 = 2, M3 = 1 ) ; 

GENERATE Gender:C = 

RECODE ( Sex, 1 = 'Boy', 2 = 'Girl', G = '?' ) ], 

OUT Nursery $ 

RECODE is a function and its arguments must be enclosed within parentheses. The first argument following 

the RECODE may be a variable name or a complicated expression. 

Recoding may be applied to numeric values: 

1 TO 5 = 1        one through five become one

6 = 'F'           sixes become F

7 TO 9 14 = 3     seven, eight, nine, and fourteen become three


Recoding may be applied to character values: 

'male' = 1 values of male become 1 

'male' = 'm' values of male become m 

'A' TO 'DZ' = 3 A through DZ become three 

After the recode, the new values must be all of one data type; that is, they must be either all numeric or 

all character. Recoding may be applied to missing values: 

M = 3 missing values become threes 

M1 = 'DK' missing one becomes DK 

Recoding can be applied to what is left over after the other recodes are completed: 

G = 4 unrecoded good values become 4 

G = '?' unrecoded good values become ? 

When the recoding transforms character to numeric variables or numeric to character variables, either all possible values must be recoded or G must be used to avoid an error situation. There may be only one G= in a RECODE.

RECODE multiple argument 

recode values are based on several variables or expressions. 

[ GEN Group:c = RECODE ( Age | Region, 

M = M3, 

LT 30 | 'east' = 'one', 

GE 30 | NE 'east' = 'two', 

G = 'three' ) ] 

In this example the recode is based on the combined values of Age and Region. There are four possible results. Variable Group = 'one' when Age is less than 30 and Region = 'east'. Variable Group = 'two' when Age is greater than or equal to thirty and Region does not equal 'east'. Any case that is missing on either variable is set to missing type 3 on variable Group. Any case with good values on both Age and Region that was not matched by the previous tests has a value of 'three' on variable Group.

XRECODE(exp, recode instructions) 

recodes character strings eXactly — that is, respecting the specified case (lower, upper or mixed) of the 

string: 

[ GEN Symptom = XRECODE (Note, 'a' = 1, 'b' = 2, 'A' = 0) ; 

XRECODE works like RECODE with regard to other aspects. Character strings may be exactly recoded 

to numbers or other character strings. XRECODE can be used for both simple and complex recodes.

5 


DO LOOPS and 

IF-THEN-ELSE Blocks 

The first section of this chapter documents DO loops. DO loops enable you to do repetitive operations easily. The 

second section covers the use of DO loops to generate or rename groups of variables. The last section covers the 

use of IF-THEN-ELSE blocks to handle complex logic. (The use of a simple IF was covered in the previous 

chapters.) 

5.1 DO LOOPS 

DO loops specify repetitive instructions. They are useful when it is necessary to do the same modification on a 

number of different variables. Repetitive actions can of course be done one at a time, repeating the modification 

clauses and changing the variable names or positions as many times as necessary: 

LIST F1 [ SET V(1) = RECODE ( V(1), 6 TO 9 = 5 ) ; 

SET V(2) = RECODE ( V(2), 6 TO 9 = 5 ) ; 

SET V(3) = RECODE ( V(3), 6 TO 9 = 5 ) ; 

SET V(5) = RECODE ( V(5), 6 TO 9 = 5 ) ] $ 

However, this is tedious and may be done more easily using a DO loop: 

LIST F1 [ 

DO #J USING V(1) TO V(3) V(5); 

SET V(#J) = RECODE ( V(#J), 6 TO 9 = 5 ); 

ENDDO ] $ 

The DO statement above has five components:

1. DO                  which is followed by
2. #J                  a temporary or permanent numeric scratch variable. The value
                       of this changes each time the loop is traversed
3. USING               which indicates that a list of variables follows
4. V(1) TO V(3) V(5)   a list of variables associated with the loop
5. ;                   ends the list of variables and therefore ends the DO statement.

The USING list has four variables; therefore the statements up to the ENDDO will be done four times. The 

scratch variable #J is set to the POSITION of the next variable in the list each time the loop repeats. Thus, in the 

four iterations, it takes on the values 1, 2, 3, and 5. 

The V vector, as always, holds the data of the current case. In the SET statement, variable V(#J) is recoded. 

Therefore we recode variables 1, 2, 3, and 5 in the four iterations. Note that in V(#J) usage, the subscript expression 

(here, just the #J) must result in an integer that is within the range of variables in the file. In other words, 

fractional or negative subscripts like V(#J+.6) and V(-#J) would be errors. 

The DO loop always ends with the ENDDO instruction.


The USING loop is one form of DO loop. The other form is a range loop. In this form:

DO #J = 5, 13, 1;

the DO scratch variable takes its value from the DO range, which begins with the first number (5) and increases through the second number (13) in steps of the third number (1). Here #J begins with 5, becomes 6, then 7, etc. The final time through the loop #J has the value 13 and the loop has been executed 9 times. When the stepsize is 1, it may be omitted.

There is no limit to the number of PPL instructions that may be done between the DO statement and the ENDDO statement. The DO can contain other DOs or IF/THEN/ELSE blocks (described later in this chapter). Figure 5.1 contains the input, the LIST command and the resulting printout for a simple DO loop with a list of variables.

__________________________________________________________________________ 

Figure 5.1 Simple DO Loop with a List of Variables 

FILE F1 

VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 

1 9 2 7 6 8 

6 4 5 2 7 3 

LIST F1 [

DO #J USING V(1) TO V(3) V(5);

SET V(#J) = RECODE ( V(#J), 6 TO 9 = 5 );

ENDDO ] $

VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 

1 5 2 7 5 8 

5 4 5 2 5 3 

__________________________________________________________________________ 
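The effect of the USING loop on the two cases of Figure 5.1 can be reproduced with a short model. This is a Python sketch, not P-STAT code. The function names are hypothetical, and the recode applied inside the loop is the `6 TO 9 = 5` recode used throughout this section.

```python
def recode_6_to_9(value):
    """Model of RECODE ( v, 6 TO 9 = 5 ): other values pass through unchanged."""
    return 5 if 6 <= value <= 9 else value


def do_using(case, positions):
    """Apply the recode at each 1-based position, as V(#J) does in the loop."""
    v = list(case)
    for j in positions:                 # #J takes the values 1, 2, 3, 5
        v[j - 1] = recode_6_to_9(v[j - 1])
    return v


positions = [1, 2, 3, 5]                # V(1) TO V(3) V(5)
for case in ([1, 9, 2, 7, 6, 8], [6, 4, 5, 2, 7, 3]):
    print(do_using(case, positions))
```

VAR4 and VAR6 are not in the USING list, so their values of 7 and 8 in the first case are left unchanged even though they fall in the 6 to 9 range.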

5.2 DO USING a Variable List 

DO USING specifies a list of variables or values to which the subsequent instructions are to be applied. Both 

variable names and positions may be used in the list of variables. A user-supplied scratch variable must be provided. 

This scratch variable is then available for general use within the loop. 

[ DO #J USING V(1) to V(3); 

The scratch variable is “#J” in this example, but it may be any legal temporary or permanent numeric scratch variable 

name. In this example the range of #J is from 1 through 3, the positions of variables V(1), V(2) and V(3). 

The subsequent modification instructions: 

SET V(#J) = RECODE ( V(#J), 6 TO 9 = 5 ) 

are done three times, first when #J has the value 1, 

SET V(1) = RECODE ( V(1), 6 TO 9 = 5 ) 

and then when it has the values 2 and 3:

SET V(2) = RECODE ( V(2), 6 TO 9 = 5 )

SET V(3) = RECODE ( V(3), 6 TO 9 = 5 )

In effect, a “loop” is set up: the first value of #J is used in the instruction, then the next, and so on, until the last value of #J is used. The loop stops when all the instructions have been processed for the last value of #J.


Variable names and wildcards may be used in the DO list. If names are used in a DO instruction — for 

example: 

DO #J USING Math.Test TO English.Test; 

the positions of the variables in the file define the range of the scratch variable. If Math.Test is the first variable

in the file and English.Test is the third, #J has the values 1 through 3. If Math.Test is the sixth variable in the file 

and English.Test is the ninth, #J has the values 6 through 9. In either case, the appropriate variables are referenced 

in the instruction following the DO: 

[ DO #J USING Math.Test TO English.Test ;

SET V(#J) = RECODE ( V(#J), 6 TO 9 = 5 );

ENDDO ]

This DO phrase may be interpreted: for the scratch variable #J, which is initially the position of Math.Test 

and subsequently the positions of the other variables in the list, set each variable in turn to the RECODE function 

of itself, setting any value between 6 and 9 to 5. As the DO loop is processed, the variable position represented 

by V(#J) changes. Initially, the variable position is that of the first variable in the list. With each new loop the scratch

variable takes as its value the position of the next variable in the USING list. 

DO #J USING *; 

requests the re-use of the USING list from the most recent DO USING loop. 

__________________________________________________________________________ 

Figure 5.2 DO With Two Scratch Variables 

File cold

                          Stuffy
Date     Headache  Fever  Nose    Cough

011593      1        0      0       0

012293      0        1      1       1

020993      0        1      0       1

021093      1        0      0       0

MODIFY Cold [ DO #J #N USING headache to Cough; 

IF V(#J) EQ 1, SET V(#J) = #N, 

F.SET V(#J) = .M1.; 

ENDDO ], 

OUT Cold $ 

LIST Cold $ 

                          Stuffy
Date     Headache  Fever  Nose    Cough

011593      1        -      -       -

012293      -        2      3       4

020993      -        2      -       4

021093      1        -      -       -

__________________________________________________________________________ 

Either form of DO loop may have a second scratch variable which has as its value the number of times the 

DO is executed. Figure 5.2 illustrates this usage. The file has a series of dummy (0/1) variables. The purpose of


the MODIFY command is to convert the zeros to missing and the ones to the position of the variable in the DO 

list. This is an easy way to convert a series of multiple response questions which are coded as dummy variables 

into the 1 through n type of code which the SURVEY command expects for multiple response banner (column) 

variables. 

MODIFY Cold [ DO #j #n USING headache to Cough; 

The first scratch variable, #j, takes on the positions (2-5) of the 4 variables in the USING list. The second 

scratch variable takes on the value 1 the first time through the loop, 2 the second time through the loop, 3 the third 

time through, and 4 in the final loop. 
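The roles of the two scratch variables in Figure 5.2 can be modeled as follows. This is a Python sketch, not P-STAT code. The function name `dummies_to_codes` is hypothetical, `j` plays the part of #J (the variable's position in the file), `n` plays the part of #N (the trip count), and `None` stands for missing type 1.

```python
def dummies_to_codes(case, positions):
    """Turn 0/1 dummies into their 1..n place in the DO list, or missing."""
    v = list(case)
    for n, j in enumerate(positions, start=1):   # n models #N, j models #J
        v[j - 1] = n if v[j - 1] == 1 else None  # T: SET to #N, F: SET to .M1.
    return v


positions = [2, 3, 4, 5]        # Headache TO Cough; Date is variable 1
print(dummies_to_codes(["012293", 0, 1, 1, 1], positions))
```

Note that the stored code is the place of the variable within the USING list (#N), not its position in the file (#J), which is why Fever becomes 2 rather than 3.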

5.3 DO Stepping Through a Range 

The second form of the DO uses a range of numeric constants or expressions. 

DO #K = 15, 24; 

The range of the scratch variable #K is 15 through 24. Because there is no third argument, a stepsize of 1 is assumed 

and the values of #K are 15, 16, 17, etc. 

DO #K = 15, 24, 2; 

Here the stepsize is 2 and #K takes on the value 15, 17, 19, etc. The constants and the stepsize can be any numeric 

value or expression that is available at that moment. They can, in other words, use values which change from case 

to case. The values can be real numbers with a fractional part. The only exception to this is when the DO is used 

to generate or rename a list of variables. The values in a GENERATE or RENAME loop must be available at the 

beginning of the command and must have integer values. 

__________________________________________________________________________ 

Figure 5.3 DO: Range and Stepsize 

File Tests 

pre post pre post pre post 

.1 .1 .2 .2 .3 .3 

68 75 92 94 89 88 

73 73 84 93 85 89 

78 79 72 80 73 75 

MODIFY Tests 

[ DO #j = 2, 6, 2; 

SET V(#j) = V(#j) - V(#j-1); 

ENDDO ], 

OUT Test2 $ 

File Test2 

pre post pre post pre post 

.1 .1 .2 .2 .3 .3 

68 7 92 2 89 -1 

73 0 84 9 85 4 

78 1 72 8 73 2 

__________________________________________________________________________


Figure 5.3 contains a small data set with 3 pairs of values, each pair a pre score and a post score. The MODIFY command is used to change each post value to the difference between it and the corresponding pre value (post minus pre).

DO #J = 2, 6, 2; 

has a value for #J that is 2 the first time through the loop. Because the stepsize is 2, the second time through the 

loop the value of #J is 4. The final time through the loop the value of #J is 6. Because the subscripts for the V 

vector can be expressions, the use of V(#j-1) points to each of the pre variables in turn as it takes on the values 1, 

3, and 5. 
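The stepped loop of Figure 5.3 can be reproduced with a short model. This is a Python sketch, not P-STAT code. The function name `post_minus_pre` is hypothetical, and the list index is shifted by one because V positions are 1-based.

```python
def post_minus_pre(case):
    """Model of: DO #j = 2, 6, 2;  SET V(#j) = V(#j) - V(#j-1);  ENDDO"""
    v = list(case)
    for j in range(2, 6 + 1, 2):          # #j takes the values 2, 4, 6
        v[j - 1] = v[j - 1] - v[j - 2]    # V(#j) - V(#j-1)
    return v


for case in ([68, 75, 92, 94, 89, 88],
             [73, 73, 84, 93, 85, 89],
             [78, 79, 72, 80, 73, 75]):
    print(post_minus_pre(case))
```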

The DO numbers can be fractional values. 

DO #J = .5, .8, .1; 

This loop will have 4 iterations with #j as .5, .6, .7, and .8. The range can go backwards with either a supplied negative value or a default -1.

DO #J = 3, -3, -2; 
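The values taken by the scratch variable in a range loop can be listed with a small model. This is a Python sketch, not P-STAT code. The function name `do_range` is hypothetical, decimal arithmetic is used only so that steps such as .1 accumulate exactly, and stopping before the limit when the step does not land on it is an assumption.

```python
from decimal import Decimal


def do_range(start, stop, step=None):
    """List the values of the scratch variable for 'DO #J = start, stop, step'."""
    start, stop = Decimal(str(start)), Decimal(str(stop))
    if step is None:
        step = 1 if stop >= start else -1      # default stepsize, per the text
    step = Decimal(str(step))
    values = []
    value = start
    while (step > 0 and value <= stop) or (step < 0 and value >= stop):
        values.append(value)
        value += step
    # Report whole numbers as int and everything else as float.
    return [float(v) if v % 1 else int(v) for v in values]


print(do_range(5, 13, 1))
print(do_range(.5, .8, .1))
print(do_range(3, -3, -2))
```

The first call lists nine values and the second lists four, matching the iteration counts given in the text.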

The arguments for the DO can be expressions. If you wish to loop with a step argument through a list of variables and you know the variable names but not the locations you can do the following:

DO #J = loc(pre.1) to loc(pre.3), 2;

Figure 5.4 illustrates the difference between the types of DO loops. In the first command #J takes on the positions 

of the variables in the USING list. In the second command #J begins with 2, the value of VAR1 and ends 

with 6, the value of VAR3. 

__________________________________________________________________________ 

Figure 5.4 DO Loops: An Example of Each Type 

File XX 

VAR1 VAR2 VAR3 

2 4 6 

The Commands                                The Output

PROCESS XX [ DO #J USING var1 TO var3;      #J= 1    positions
             PUT #J; ENDDO ] $              #J= 2    of USING
                                            #J= 3    variables

PROCESS XX [ DO #J = var1, var3;            #J= 2    value var1
             PUT #J;                        #J= 3
             ENDDO ] $                      #J= 4
                                            #J= 5
                                            #J= 6    value var3

PROCESS XX [ DO #J =                                 positions
             (loc)var1, (loc)var3, 2;       #J= 1    var1
             PUT #J;                        #J= 3    var3
             ENDDO ] $

__________________________________________________________________________


5.4 DO Loops: Other Features 

Figure 5.5 illustrates the optional features of the DO. A DO can reference a label. For example: 

DO testloop #J .... 

This label is then used as a statement label on the ENDDO statement: 

testloop: ENDDO; 

__________________________________________________________________________ 

Figure 5.5 Labelled DO, EXITDO and NEXTDO 

File Myfile 

QA1 QA2 QA3 QB1 QB2 QB3 

2 3 4 - 1 1 

3 1 8 2 1 1 

7 3 6 1 1 1 

TEXT; 

First RECODE variables QA1 through QA3 into the values 0-4. Then 

compute the average of QA1, QA2 and QA3. However, if the QB variable 

that corresponds to the QA variable is missing, exit the DO (EXITDO) 

and ignore the remaining values. If the QB variable that corresponds 

to the QA variable is a 2, move immediately (NEXTDO) to the next 

element in the loop and do not include the current value. 

$ 

LIST Myfile [ GEN #Total = 0, GEN N = 0, GEN Average = .M.; 

............... 

DO testloop #j USING QA1 TO QA3; 

SET V(#j) = RECODE

( V(#J), 0=M, 1 2=1, 3 TO 5=2, 6 8 9=3, G=4 );

IF V(#J+3) MISSING, EXITDO; 

IF V(#J+3) EQ 2, NEXTDO; 

INCREASE #Total BY V(#J), INCREASE N;

testloop: ENDDO;

SET Average = #Total / N ] $

QA1 QA2 QA3 QB1 QB2 QB3 N Average 

1 3 4 - 1 1 0 - 

2 1 3 2 1 1 2 2 

4 2 3 1 1 1 3 3 

__________________________________________________________________________ 

Any statement that has a label can be used as the target of a GOTO. GOTO, which is discussed later in this chapter, 

provides a way to selectively execute the PPL. The label in a DO is also useful when there are nested DO's.


DO loop1 #J ....; 

PPL here; 

DO Loop2 #K ....; 

More PPL here; 

loop2: ENDDO; 

And yet more PPL; 

loop1: ENDDO; 

You can exit from a DO loop at any time by using the EXITDO PPL statement. EXITDO has the effect of a 

branch to the PPL (if any) after its ENDDO. PPL processing continues there. NEXTDO, on the other hand, is a 

branch to the ENDDO statement where the DO counters are incremented. In Figure 5.5 the last 5 lines of the LIST 

command could have been written as: 

IF V(#J+3) MISSING, GOTO NEXT; 

IF V(#J+3) EQ 2, GOTO testloop; 

INCREASE #Total BY V(#J), INCREASE N;

testloop: ENDDO;

NEXT: SET Average = #Total / N ] $

In Figure 5.5, QA2 and QA3 for the first case are not recoded. This is because QB1 is missing. As soon as 

QA1 is recoded the statement 

IF V(#J+3) MISSING, EXITDO; 

is executed. Since QB1 is missing, the loop is exited without processing the remaining variables for that case. 

Case 2 has its average calculated on just the last two values. This is because QB1 on that case is a 2. The statement:

IF V(#J+3) EQ 2, NEXTDO; 

causes a branch to the ENDDO without including QA1 in the totals. 
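In many general-purpose languages EXITDO and NEXTDO correspond to `break` and `continue`. The sketch below is a Python model of the accumulation in Figure 5.5, not P-STAT code. The function name `average_qa` is hypothetical, the recode step is left out (the QA values passed in are the ones printed in the figure's output), `None` stands for a missing value, and division by a zero count is modeled as missing.

```python
def average_qa(qa, qb):
    """Return (N, Average) for one case.

    qa -- QA1..QA3 as listed in the output of Figure 5.5
    qb -- QB1..QB3, with None for a missing value
    """
    total, n = 0, 0
    for a, b in zip(qa, qb):
        if b is None:       # IF V(#J+3) MISSING, EXITDO;
            break
        if b == 2:          # IF V(#J+3) EQ 2, NEXTDO;
            continue
        total += a          # INCREASE #Total BY V(#J), INCREASE N;
        n += 1
    average = total / n if n else None
    return n, average


print(average_qa([1, 3, 4], [None, 1, 1]))
print(average_qa([2, 1, 3], [2, 1, 1]))
print(average_qa([4, 2, 3], [1, 1, 1]))
```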

The DO scratch variable or variables are still defined when a DO exits. They remain set to whatever values 

they had in the final DO iteration that was done. 

EXITDO and NEXTDO can be used in phrases like: 

IF Age GT 14, T.NEXTDO, F.EXITDO; 

EXITDO and NEXTDO can be followed by a DO statement label. Here, we exit all three loops from the innermost 

loop: 

DO aaa #J = 1, 2; 

DO bbb #K = 3, 4; 

DO ccc #M = 5, 6;

EXITDO aaa; 

ccc: ENDDO; 

bbb: ENDDO; 

aaa: ENDDO; 

Even though we are out of the loops, #J and #K and #M can be used; they have the values 1, 3, and 5. If the above 

EXITDO had no label, it would have exited only the DO ccc loop. 
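The difference between a labelled and an unlabelled EXITDO can be modeled with nested loops. This is a Python sketch, not P-STAT code. Python has no labelled break, so a `return` from a helper function stands in for `EXITDO aaa`; the function name `run` and the body counter are illustrative assumptions.

```python
def run(labelled):
    """Return (times the innermost body ran, final (j, k, m))."""
    body_runs = 0
    j = k = m = None

    def loops():
        nonlocal body_runs, j, k, m
        for j in range(1, 2 + 1):              # DO aaa #J = 1, 2;
            for k in range(3, 4 + 1):          # DO bbb #K = 3, 4;
                for m in range(5, 6 + 1):      # DO ccc #M = 5, 6;
                    body_runs += 1
                    if labelled:
                        return                 # EXITDO aaa; leaves all three loops
                    break                      # EXITDO; leaves only the ccc loop

    loops()
    return body_runs, (j, k, m)


print(run(labelled=True))
print(run(labelled=False)[0])
```

With the label, the innermost body runs once and the scratch values are 1, 3 and 5, as stated above. Without it, the two outer loops keep going and the body runs four times.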

5.5 GENERATE AND RENAME 

GENERATE and RENAME use the same conventions in creating variable names. When a single variable is involved 

there is no need for a complex mask:


[ GENERATE Family.Income; 

RENAME Test1 TO Math121; 

RENAME V(2) TO Chem34 ] 

RENAME requires the existing name, TO, and the new name, which must be a unique name in the file. GENERATE requires only the variable name.

If a list of variables is to be generated or renamed, a DO loop may be used. A DO which contains a GENERATE or RENAME may not contain other PPL statements. Also, if the DO uses a range (as in DO #J = 1, 5), the control values must be integer constants or integer scratch variables whose values are known when the command begins. It is necessary to know the range of values before any cases are read in order to properly set up the renames or generates.

5.6 Using GENERATE in DO Loops 

Typically, when GENERATE is used to create a new variable, a name for the variable is provided by the user or, 

if the “?” has been used, P-STAT generates a name. When GENERATE is used in a DO loop, multiple variables 

are created and unique names need to be provided or generated for them. The format of a GENERATE within a 

DO loop is one of the following: 

GENERATE ? = value; 

GENERATE ? (mask) = value; 

GENERATE V(#J) (mask) = value; 

GENERATE V(##K) (mask) = value; 

If the variables are character, the :C or :C20 or such directly follows the mask or, if there is no mask, the “?”. 

Masks are described below. The “= value” is optional; if not supplied, the variable is set to missing. 

When the “?” is used: 

[ DO #K USING Q3 TO Q5; GENERATE ? = SQRT ( V(#K) ); 

ENDDO ] 

names for the three new variables are generated by P-STAT. If there are ten variables in the file, the new variables 

are VAR11 (the square root of the variable named Q3), VAR12 (the square root of the variable named Q4) and 

VAR13 (the square root of the variable named Q5). The same format is used to generate a list of character 

variables: 

[ DO #K USING Q3 TO Q5; 

GENERATE ?:C = CHARACTER ( V(#K) ); ENDDO ] 

The ? is followed by “:C”. The length may be specified: 

GENERATE ?:C32 

A mask containing a prefix or suffix may be provided for the names being generated. The mask is enclosed 

in parentheses and an ampersand (&) is used to represent the name of the current DO loop variable: 


[ DO #K USING Q3 TO Q5;

GENERATE V(#K) ( 'Sqrt.' & ) = SQRT ( V(#K) ); ENDDO ]

The new variable names are composed of the prefix “Sqrt.” followed by one of the names of the variables in the 

DO list — the variable currently in the DO loop. Since the names of the variables in the DO list are Q3, Q4 and 

Q5, the names for the new variables are “Sqrt.Q3”, “Sqrt.Q4” and “Sqrt.Q5”. A suffix is created by moving the 

“&” in the mask so that it precedes the string. 


[ DO #K USING Q3 TO Q5;

GENERATE V(#K) ( & '.Sqrt' ) = SQRT ( V(#K) );

ENDDO ]

This creates new names “Q3.Sqrt”, “Q4.Sqrt” and “Q5.Sqrt”.


If the new name is longer than 16 characters, the prefix or suffix is left intact and the current variable name 

is truncated. This may cause an error due to a repeated name. 

When GENERATE is used in a DO loop, the loop can have no other statements; i.e., it can have only DO, 

GENERATE and ENDDO. 

5.7 Using RENAME in DO Loops 

In the simplest form of RENAME, like the GENERATE illustrated above, all the renamed variables have the 

specified prefix (or suffix) in their names: 

[ DO #J USING Item1 TO Item10; RENAME V(#J) ( 'Test.' & ); ENDDO] 

Here, variables Item1 through Item10 are renamed by prefixing their names with “Test.”. The variable previously 

named “Item1” is renamed “Test.Item1”, “Item2” is renamed “Test.Item2”, and so on. If the prefix plus the original 

name contains more than 16 characters, the entire prefix is used and characters are removed from the end of 

the original name until a 16-character name results. 

The format for RENAME within a DO loop is the following: 

1. RENAME 

2. a V(#J) usage. This identifies the variable to be renamed. It also provides its current name to the 

mask. 

3. a mask in parentheses which contains strings in quotes to be used exactly as entered. It also contains 

special characters such as the “&” which are used to select or omit letters from the input label and to 

supply numbers using the DO loop scratch variable. 

4. a semicolon, ending the statement. 

This is an example of a simple mask: 

[ DO #j=21,35; RENAME V(#j) (XOOXX); ENDDO ] 

Here, a mask of (XOOXX) is supplied. The initial X says use the first input character, the OO says omit the next 

two characters, and the XX says use the next two (characters 4 and 5). This mask would rename VAR31 into 

V31. 

[ DO #n USING pre? ; 

RENAME V(#n) ( 'test' OOO & ); 

ENDDO] 

In this loop, each name that starts with 'pre' is renamed. Each new name begins with 'test'. The first 3 characters of each old name, which are known to be 'pre', are bypassed (indicated by OOO), and the rest of the old name (indicated by &) is copied into the new name area after 'test'. We are replacing 3 characters with 4. The & operator truncates if needed, so if a name started with 16 characters, it would get 'test' followed by characters 4 to 15. The OOO operator causes the & operator to start with character 4.

When RENAME is used in a DO loop, the loop can have no other statements; i.e., it can have only DO, RENAME and ENDDO.

5.8 Masks for RENAME and GENERATE 

A mask is used to create a name for a variable, either by modifying the ? or V(#J) name preceding it or by 

creating a totally different name. The mask activity begins with a pointer on the initial character of the input name. 

The pointer is moved onto the next character after each usage of X, O, c or C. Further use of X-O-c-C is ignored 

when the pointer is beyond the final input character.


__________________________________________________________________________ 

Figure 5.6 Rename Examples 

FILE myfile

Id  Item1  Item2  Item3  Item4  Item5
 1      1      2      3      4      5

Select the 1st and 5th characters:

LIST Myfile [ DO #j USING Item2 TO Item5;
              RENAME v(#j) ( XOOOX ) ; ENDDO ] $

Id  Item1  I2  I3  I4  I5
 1      1   2   3   4   5

Provide a prefix and use the DO loop number (d):

LIST Myfile [ DO #J USING Item2 TO Item5;
              RENAME V(#j) ( 'Q.' d ); ENDDO; ] $

Id  Item1  Q.3  Q.4  Q.5  Q.6
 1      1    2    3    4    5

Provide a prefix and use the DO loop counter (n):

LIST Myfile [ DO #J USING Item2 TO Item5;
              RENAME V(#j) ( 'Question.' n ); ENDDO; ] $

           Question  Question  Question  Question
Id  Item1        .1        .2        .3        .4
 1      1         2         3         4         5

Use the original name and the DO loop number:

LIST Myfile [ DO #J USING Item2 TO Item5;
              RENAME V(#j) ( & '.' d ); ENDDO; ] $

           Item2  Item3  Item4  Item5
Id  Item1     .3     .4     .5     .6
 1      1      2      3      4      5

Use 'Q.', the original name, '.' and the DO loop counter:

LIST Myfile [ DO #J USING Item2 TO Item5;
              RENAME V(#j) ( 'Q.' & '.' n ); ENDDO; ] $

              Q.     Q.     Q.     Q.
           Item2  Item3  Item4  Item5
Id  Item1     .1     .2     .3     .4
 1      1      2      3      4      5

__________________________________________________________________________


1. x or X takes the current input character, if usable. 

2. o or O omits the current input character. NOTE: the digit '0' is also usable. 

3. c takes the current input character, if usable, and, if it is a letter, puts it into lower case. C takes the 

current input character, if usable, and, if it is a letter, puts it into upper case. 

4. & takes all remaining usable input characters that can fit, starting at the current location of the 

pointer. 

5. @4 places the pointer onto the 4th input character. (@4 xxx) and (ooo xxx) are identical. 

6. @-5 places the pointer on the 5th character from the right hand end. 

A character that has been the subject of any of the X-O-c-C-& operators is no longer usable by any subsequent 

operator. As a result, ( @-4 OOOO @1 & 'post' ) could be used to inactivate the rightmost 4 characters, take the 

rest, and add 'post' to it. In other words, the mask has replaced an existing 4-character suffix with a new one. 

Blanks are ignored in masks, which can markedly improve readability. For example: 

(XXXXXXXXX) and

(XXX XXX XXX) are identical.

Other features of the mask are: 

1. 'ab.cde' moves the string contents into the new name. 

2. D or d inserts the V subscript. This is based on the current value of the DO scratch variable. If 

V(#j) is used, 17 is inserted when #j=17. If V(#j+10) is used, 27 is inserted when #j=17. DD is like 

D, but forces 2 characters; 07 is used instead of 7. If DDD is used, three numbers are inserted in the 

new name and a 7 bonds to the new label as 007. 

3. N or n inserts the current iteration count of the DO loop. If this is the third trip through the loop, 3 

is inserted. NN provides 2 digits, NNN 3 digits. You do not have to use a counter scratch variable 

in the DO statement in order to use 'N'. 

The following are some examples of a name, a mask and the resulting labels:

current name     mask                   result

abcdefg          (xx @-4 xoxx )         abdfg

abcdeF           (Cccc @-4 cccc )       Abcdef

abc12345def      ('Var' @4 xxxxx)       Var12345

abc12345def      ('Var' @4 & )          Var12345def

abcd             (@-6 xxxxxx )          abcd

Suppose we have:

[ DO #j=11,13; RENAME v(#j) (a mask); ENDDO ]

When #j is 12, meaning it is the second iteration, and the name of v(12) is “abcdef”, the masks behave as follows:

abcdef           ('item.' NN )          item.02

abcdef           ('PreTest.' DDD)       PreTest.012

abcdef           ( xxx '.' D )          abc.12

Figure 5.6 contains 5 examples of RENAME masks and illustrates the use of 'X' and 'O', text strings, the original variable name, and both the DO loop number (d) and the DO loop counter (n).
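The mask rules can be collected into a small interpreter and checked against the examples in this section. This is a Python model of the behavior as described, not P-STAT's implementation. The function name `apply_mask` and its parameters are hypothetical, and two details are assumptions: an @ position before the start of the name is moved to the first character, and when the result exceeds 16 characters only the part supplied by & is shortened.

```python
import re

MAX_NAME = 16  # name-length limit quoted in the text


def apply_mask(name, mask, subscript=None, count=None):
    """Apply a RENAME/GENERATE mask to one input name.

    subscript -- value inserted by D (the V subscript)
    count     -- value inserted by N (the DO iteration count)
    """
    tokens = re.findall(r"'[^']*'|@-?\d+|[Dd]+|[Nn]+|[XxOo0cC&]", mask)
    used = [False] * len(name)   # characters already touched by X, O, c, C or &
    pieces, amp_slots = [], []   # output fragments; which ones came from &
    ptr = 0
    for tok in tokens:
        if tok[0] == "'":                       # quoted string, copied as is
            pieces.append(tok[1:-1])
        elif tok[0] == "@":                     # reposition the pointer
            n = int(tok[1:])
            ptr = max(n - 1 if n > 0 else len(name) + n, 0)
        elif tok[0] in "Dd":                    # D, DD, DDD: zero-padded subscript
            pieces.append(str(subscript).zfill(len(tok)))
        elif tok[0] in "Nn":                    # N, NN, NNN: zero-padded count
            pieces.append(str(count).zfill(len(tok)))
        elif tok == "&":                        # all remaining usable characters
            rest = [i for i in range(ptr, len(name)) if not used[i]]
            pieces.append("".join(name[i] for i in rest))
            amp_slots.append(len(pieces) - 1)
            for i in rest:
                used[i] = True
            ptr = len(name)
        elif ptr < len(name):                   # X, O, c, C at a valid position
            if not used[ptr] and tok not in "Oo0":
                ch = name[ptr]
                if tok == "c":
                    ch = ch.lower()
                elif tok == "C":
                    ch = ch.upper()
                pieces.append(ch)
            used[ptr] = True
            ptr += 1
    excess = sum(len(p) for p in pieces) - MAX_NAME
    for slot in reversed(amp_slots):            # shorten only the & fragments
        if excess <= 0:
            break
        cut = min(excess, len(pieces[slot]))
        pieces[slot] = pieces[slot][:len(pieces[slot]) - cut]
        excess -= cut
    return "".join(pieces)


print(apply_mask("abcdefg", "(xx @-4 xoxx)"))
print(apply_mask("abcdef", "('PreTest.' DDD)", subscript=12))
```

The same function reproduces the suffix replacement described earlier: `apply_mask("mathtest", "(@-4 OOOO @1 & 'post')")` marks the last four characters as used, takes the rest, and appends the new suffix.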

Figure 5.7 illustrates the difference between the use of ? and V(#scratch) in the DO LOOP GENERATE. File 

work has four variables. The ? uses the generated labels as the labels on which to base any changes. In the first 

example in Figure 5.7 these labels are VAR5 through VAR8. When a scratch variable is used, the labels provided 

to the mask are the current DO loop variables. In the second example in Figure 5.7 these variables are V1 through


V4. The only reason for ever using the V vector and a scratch variable in a RENAME or GENERATE is to use 

some of the characters found in the original variable name. 

__________________________________________________________________________ 

Figure 5.7 GENERATE: Generated Versus Original 

LIST work [ DO #j = 1, 4; 

GEN ? ( 'New.' & ); 

ENDDO ] $ 

                           New   New   New   New
    V1    V2    V3    V4  VAR5  VAR6  VAR7  VAR8

     1     2     3     4     -     -     -     -

LIST work [ DO #j = 1, 4; 

GEN v(#j) ( 'New.' & ); 

ENDDO ] $ 

                           New   New   New   New
    V1    V2    V3    V4    V1    V2    V3    V4

     1     2     3     4     -     -     -     -

__________________________________________________________________________ 

DO #J #N USING Pre?; 

SET PRE?(#N) = ..... ; 

ENDDO; 

If there are 5 variables beginning with “pre”, the loop will be exercised 5 times and the scratch variable #N will 

take on the values 1, 2, 3, 4, and 5. Here, using V(#J) is the same as using PRE?(#N). 
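The two scratch variables of a DO ... USING loop over a wildcard can be pictured with a small Python model. This is a sketch under stated assumptions, not P-STAT code: dynamic_vector is a hypothetical helper, the trailing ? is taken to match any remaining characters, and matching is taken to ignore case (as “Pre?” matching “Pre.1” suggests). The first element of each pair plays the role of #J (the position in the file) and the second the role of #N (the position within the dynamic vector).

```python
def dynamic_vector(names, wildcard):
    """Return (file position, count) pairs for names matching a trailing-? wildcard."""
    prefix = wildcard.rstrip("?").lower()
    hits = [pos for pos, name in enumerate(names, start=1)
            if name.lower().startswith(prefix)]
    return [(pos, n) for n, pos in enumerate(hits, start=1)]


names = ["Age", "Sex", "Pre.1", "Post.1", "Pre.2", "Post.2",
         "Pre.3", "Post.3", "Aptitude"]
print(dynamic_vector(names, "Pre?"))          # [(3, 1), (5, 2), (7, 3)]

# First case of the Tests file in Figure 5.8: Diff.n = post?(n) - pre?(n)
cols = ["pre.1", "post.1", "pre.2", "post.2", "pre.3", "post.3"]
case = dict(zip(cols, [68, 75, 92, 94, 89, 88]))
pre = [cols[pos - 1] for pos, _ in dynamic_vector(cols, "pre?")]
post = [cols[pos - 1] for pos, _ in dynamic_vector(cols, "post?")]
diffs = {"Diff.%d" % n: case[post[n - 1]] - case[pre[n - 1]]
         for n in range(1, len(pre) + 1)}
print(diffs)                                  # {'Diff.1': 7, 'Diff.2': 2, 'Diff.3': -1}
```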

__________________________________________________________________________ 

Figure 5.8 Dynamic Array, Wildcard, Prefix and GENERATE 

LIST Tests 

[ DO #P #N USING pre?; 

GEN ? ( 'Diff.' n ) = post?(#N) - pre?(#N) ; 

ENDDO ] $ 

   pre  post   pre  post   pre  post  Diff  Diff  Diff
    .1    .1    .2    .2    .3    .3    .1    .2    .3

    68    75    92    94    89    88     7     2    -1
    73    73    84    93    85    89     0     9     4
    78    79    72    80    73    75     1     8     2

__________________________________________________________________________ 

When variables have a common prefix, the combination of the DO and a dynamic vector created using a wildcard 

can be a powerful tool. A dynamic vector is created any time that a wildcard is used in a variable name list.


Age Sex Pre.1 Post.1 Pre.2 Post.2 Pre.3 Post.3 Aptitude 

Given these variable names, the use of 

Pre? 

creates a dynamic vector containing the three variables in the list which begin with the characters “pre”. These 

variables can now be referenced in the same way that the variables in the V vector are referenced: 

__________________________________________________________________________ 

Figure 5.9 Complex MASK: Generate Variable Names 

Given a file with variables “Variable01” “Variable02” and “Variable03” 

DO #j = 2, 3; 

GEN v(#j) 

( C D @3c 'mMm' @-4C NN o & 'zz' dd )= v(#j); 

ENDDO; 

The variable name created from input variable “Variable02” is: 

V2rmMmL0102zz02 

The mask

C       produces  V      upper-case letter from input
D       produces  2      the value of #J in the DO loop
@3c     produces  r      3rd character of input in lower case
'mMm'   adds      mMm    strings enclosed in quotes are added
@-4C    produces  L      4th character from the end of the input variable name, upper case
NN      adds      01     iteration count in the DO to 2 places
o       omits            the next letter in input variable name: skip the “e”
&       includes  02     use the rest of the input variable name
'zz'    adds      zz     mask can have multiple strings
dd      adds      02     same as DD: the value of #J to 2 places

Spaces are used in masks to make them easier to follow. They are not 

required: 

(CD@3c'mMm'@-4CNNo&'zz'dd) 

The characters in the new name associated with scratch variable #J are:

  D        NN       dd
V 2 rmMmL  01 02zz  02

The characters in the new name taken from the original name are:

C    @3c      @-4C     o&
V 2  r   mMm  L     01 02  zz02

Character strings added to the new name:

V2r mMm L0102 zz 02

___________________________________________________________________________


Figure 5.8 contains the command and the resulting output file for a DO/GENERATE using dynamic arrays. 

The mask for the variable names ( 'Diff.' n ) asks for names beginning with the string “Diff.” followed by the DO

loop counter “n”.

Figure 5.9 illustrates a complex GENERATE mask which uses all the possible mask codes. The variable named

“Variable02” in the input file is used in the creation of a new variable “V2rmMmL0102zz02”. The mask used

in this example is: “(CD@3c'mMm'@-4CNNo&'zz'dd)”

5.9 IF-THEN-ELSE BLOCKS 

Figure 5.10 contains an example of a series of IF/SET statements contrasted with a complex RECODE. These 

same IF/SET statements could have been written as an IF-THEN-ELSE block. 

The IF-THEN-ELSE block makes the logic easier to follow when there is more than a single condition. The IF- 

THEN-ELSE block has additional advantages because there can be any number of PPL statements and actions in 

the block, including nested IF-THEN-ELSE blocks and DO loops. 

__________________________________________________________________________ 

Figure 5.10 IF or IF-THEN-ELSE 

MODIFY aaa

[

IF age lt 30 and region EQ 'east' SET sector = 1; 

IF age lt 30 and region EQ 'central' SET sector = 2; 

IF age lt 30 and region EQ 'west' SET sector = 3; 

IF age ge 30 and region EQ 'east' SET sector = 4; 

IF age ge 30 and region EQ 'central' SET sector = 5; 

IF age ge 30 and region EQ 'west' SET sector = 6; 

], OUT bbb $ 

MODIFY aaa

[

IF age lt 30 THEN; 

IF region EQ 'east' SET sector = 1; 

IF region EQ 'central' SET sector = 2; 

IF region EQ 'west' SET sector = 3; 

F.ELSE; 

IF region EQ 'east' SET sector = 4; 

IF region EQ 'central' SET sector = 5; 

IF region EQ 'west' SET sector = 6; 

ENDIF ], out bbb $ 

__________________________________________________________________________ 
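The equivalence of the two forms in Figure 5.10 can be checked with a small Python model. This is a sketch, not P-STAT code: None stands for a missing value, and it is assumed that a test on a missing Age is neither true nor false, so that no SET is executed in either form.

```python
def sector_flat(age, region):
    """Six independent IF ... SET statements."""
    sector = None
    known = age is not None            # a missing age makes every test fail
    if known and age < 30 and region == "east":
        sector = 1
    if known and age < 30 and region == "central":
        sector = 2
    if known and age < 30 and region == "west":
        sector = 3
    if known and age >= 30 and region == "east":
        sector = 4
    if known and age >= 30 and region == "central":
        sector = 5
    if known and age >= 30 and region == "west":
        sector = 6
    return sector


def sector_block(age, region):
    """IF age lt 30 THEN; ... F.ELSE; ... ENDIF"""
    sector = None
    if age is None:
        return sector                  # neither THEN nor F.ELSE applies
    if age < 30:                       # THEN section
        if region == "east":
            sector = 1
        if region == "central":
            sector = 2
        if region == "west":
            sector = 3
    else:                              # F.ELSE section
        if region == "east":
            sector = 4
        if region == "central":
            sector = 5
        if region == "west":
            sector = 6
    return sector
```

In the block form the age test is made once per case instead of six times, which is the readability gain the text describes.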

IF-THEN-ELSE-ENDIF blocks can be nested 9 deep. They can occur within a DO loop, as long as the block 

is ENTIRELY within the DO loop. The block begins with an IF statement. The IF statement begins with IF. It 

can also have OR and AND. A THEN ends the statement. The THEN, just like a consequence in a simple IF

statement, can be preceded with FMT qualification, like M.THEN.

For example:

LIST X [ GEN Newvar = 99; 

IF Age GE 21 OR age EQ 19, 

THEN; 

SET oldvar = 22 ; 

SET abcdef = 33 ; 

ELSE; 

SET xyz = 44; 

ENDIF ] $ 

The ELSE section is executed whenever the section before the ELSE is not executed. I.e., given THEN, the 

statements before ELSE are executed when the IF is true, and the statements after the ELSE are executed when 

the IF is false or missing. Using FM.THEN would reverse this and the statements following the FM.THEN will 

be executed when the result of the IF was either false or missing while the statements following the ELSE will be 

executed whenever the result of the IF is true. In other words the previous example is exactly the same as:

LIST X [ GEN Newvar = 99; 

IF Age GE 21 OR age EQ 19, 

FM.THEN; 

SET xyz = 44; 

ELSE; 

SET oldvar = 22 ; 

SET abcdef = 33 ; 

ENDIF ] $ 

5.10 IF-THEN-ELSE: Other Features. 

F.ELSE and M.ELSE sections can be used to provide greater control. These allow a true 3-way logic in the 

IF-THEN blocks. T.ELSE is also allowed; however it is only useful when M.THEN or F.THEN begins the block. 

Alternate names TELSE, FELSE and MELSE are recognized. 
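The three-way logic can be summarized as a small dispatch table. The following Python sketch is only a model of the rules stated in this chapter, not P-STAT code: choose_section is a hypothetical helper, None stands for a missing result, and it is assumed that the first section whose qualifier matches the result is the one executed.

```python
def choose_section(result, sections):
    """Return the label of the section that runs for an IF result.

    result is True, False or None (missing); sections lists the labels
    in the order they appear in the block.
    """
    handles = {
        "THEN": {True}, "T.ELSE": {True},
        "F.THEN": {False}, "F.ELSE": {False},
        "M.THEN": {None}, "M.ELSE": {None},
        "FM.THEN": {False, None},
    }
    for label in sections:
        # a plain ELSE takes whatever no earlier section handled
        if label == "ELSE" or result in handles[label]:
            return label
    return None                       # no section applies to this result


three_way = ["THEN", "F.ELSE", "M.ELSE"]
print([choose_section(r, three_way) for r in (True, False, None)])
# ['THEN', 'F.ELSE', 'M.ELSE']
```

With ["THEN", "ELSE"] both a false and a missing result reach ELSE, while with ["FM.THEN", "ELSE"] only a true result does, which is the reversal described for FM.THEN.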

There are some restrictions. GENERATE cannot be used within an IF block. The ELSE section, if used, must 

be the last section. 

Figure 5.11 illustrates the use of an IF-THEN block with F.ELSE and M.ELSE. The example illustrates a 

way of estimating a missing value from the mean of previous values. This is sometimes referred to as a hot deck 

approach. The results change with the data. For purposes of this example, it was decided to use the average of 

the previous 10 non-missing values as the substitute value. These values are stored in the first 10 locations of the 

P vector. 

Because we have chosen to use previous values, there is a problem if any of the first ten cases has a missing 

value. Once ten non-missing cases have been read, the problem disappears. In this example we have decided to 

use whatever information is available. Given the following data values for variable Age: 

33 9 15 20 73 - 44 23 18 54 62 29 - 50 82 19 - 29 39 

we use the 4 valid values preceding the first missing value (33, 15, 20 and 73; the 9 is rejected as an error) to

produce a result value of 35. If there were no good values available, the result would

be set to missing type 3.

When the first case in the file is processed we set the 10 locations in the permanent vector that we are going 

to use to -1. This permits us to test for positive values when we calculate the substitute value. 

MODIFY Ages [ IF FIRST ( .FILE. ) 

THEN; 

DO #j = 1, 10 ;


SET P(#j) = -1; 

ENDDO; 

ENDIF; 

The 3-way logic is determined in the Age LT 10 test: 

1. True if Age is non-missing and less than 10. This is considered an error and a simple 

report with the case number is written. 

IF Age LT 10 

THEN; 

PUT .N. >; 

GO TO NEXT; 

2. False if Age is non-missing and greater than or equal to 10. The next location in the P vector is calculated

and that value is replaced by the current good value on Age. Thus the contents of the P vector continually 

change as good values are processed. When 10 good values have been found, there are no 

longer any negative numbers (-1) remaining. 

F.ELSE; 

IF ##Ploc EQ 10, SET ##Ploc = 0; 

__________________________________________________________________________ 

Figure 5.11 IF-THEN with F.ELSE and M.ELSE in a Simple Hot Deck Example 

GEN ##Ploc = 10, GEN ##Total = 0, GEN ##N=0 $ 

MODIFY Ages [ IF FIRST ( .FILE. ) 

THEN; 

DO #j = 1, 10 ; 

SET P(#j) = -1; 

ENDDO; 

ENDIF; 

IF Age LT 10 

THEN; 

PUT .N. >; 

GO TO NEXT; 

F.ELSE; 

IF ##Ploc EQ 10, SET ##Ploc = 0; 

INCREASE ##Ploc; 

SET P(##Ploc) = Age; 

GO TO NEXT; 

M.ELSE; 

SET ##Total = 0, SET ##N = 0; 

DO #J = 1, 10; 

IF ( P(#j) LT 0 ) EXITDO; 

/* increase count of good P values */ 

INCREASE ##N;


/* increase totals of good P values */ 

INCREASE ##Total BY P(#j); 

ENDDO; 

IF ##N GT 0 THEN; 

SET Age = INT ( ##Total / ##N ) ; 

PUT .N. > Age 

>; 

ELSE; 

SET Age = .M3.; 

ENDIF; 

ENDIF; 

NEXT: ], OUT Ages $ 

__________________________________________________________________________ 

INCREASE ##Ploc; 

SET P(##Ploc) = Age; 

GO TO NEXT; 

3. Missing if Age is unknown. The current contents of the P vector are totaled and the average is calculated. 

This average is based on the number of good values currently available in the P vector. This 

average is substituted for Age in the file and a report is printed giving the case number and the new 

value. 

M.ELSE; 

SET ##Total = 0, SET ##N = 0; 

A DO loop is used to examine the 10 values currently in the P vector. If a negative number is found 

the P vector is not yet stocked with the full complement of 10 values and we can exit from the DO 

loop with ##N set to the current number of good values and ##Total to the sum of those values.

DO #J = 1, 10; 

IF ( P(#j) LT 0 ) EXITDO; 

INCREASE ##N; 

INCREASE ##Total BY P(#j); 

ENDDO; 

If there is at least 1 good P value we can now calculate an average, set Age to that value and write 

the appropriate information in the report. 

IF ##N GT 0 THEN; 

SET Age = INT ( ##Total / ##N ) ; 

PUT .N. > Age 

>; 

If there have been no good values as this case is processed, it is set to missing 3. 

ELSE; 

SET Age = .M3.; 

Given the following 19 values for variable Age: 

33 9 15 20 73 - 44 23 18 54 62 29 - 50 82 19 - 29 39 

the output file contains: 

33 9 15 20 73 35 44 23 18 54 62 29 37 50 82 19 45 29 39


And the report is: 

Case 2 is too young. 

Case 6 given 35 as the current hot deck value of Age. 
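The logic of Figure 5.11 can be traced outside P-STAT with a short Python model. This is a sketch for following the arithmetic, not P-STAT code: hot_deck is a hypothetical function, None stands for a missing value, and a list stands in for the first 10 locations of the P vector.

```python
def hot_deck(ages, window=10, too_young=10):
    """Replace missing ages (None) by the truncated mean of recent good values."""
    p = [-1] * window          # stands in for P(1)..P(10); -1 marks an empty slot
    ploc = window              # like ##Ploc, a 1-based slot number
    out = []
    for age in ages:
        if age is None:                        # M.ELSE: estimate the value
            good = []
            for value in p:
                if value < 0:                  # EXITDO at the first empty slot
                    break
                good.append(value)
            if good:
                age = int(sum(good) / len(good))
            # otherwise the value stays missing (the figure uses .M3.)
        elif age < too_young:                  # THEN: reported as an error, not stored
            pass
        else:                                  # F.ELSE: store the good value
            if ploc == window:
                ploc = 0
            ploc += 1
            p[ploc - 1] = age
        out.append(age)
    return out


ages = [33, 9, 15, 20, 73, None, 44, 23, 18, 54, 62, 29, None,
        50, 82, 19, None, 29, 39]
print(hot_deck(ages))
# [33, 9, 15, 20, 73, 35, 44, 23, 18, 54, 62, 29, 37, 50, 82, 19, 45, 29, 39]
```

The substituted values themselves are never stored in the list of good values, so an estimate is always based on observed ages only.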



In both Figures 5.11 and 5.13, the scratch variables that are needed are generated before the command 

in which they are used. If all the scratch variables are predefined, there is no need to worry about the restrictions 

on generating variables in either an IF-THEN-ELSE block or a DO. 

5.11 IF-THEN-ELSE: Another Example 

Figure 5.12 contains a small data set and the resulting report. Figure 5.13 contains the commands which produced 

the report. The commands contain IF-THEN-ELSE blocks within IF-THEN-ELSE blocks as well as a DO loop 

inside the blocks. The data set mimics a survey in which the respondents were asked about their computer hardware 

and software. The software questions Appl.1 through Appl.3 are coded 1 for an editor or report writer, 2 for a

database and 3 for an analysis or statistics program. The variables Wappl.1, Wappl.2 and Wappl.3 are

character variables containing the name of the program associated with the usage in the Appl? questions.

__________________________________________________________________________ 

Figure 5.12 IF-THEN-ELSE: The Data and the Report 

File Compute 

               Appl  Wappl          Appl  Wappl       Appl  Wappl
OS     Chip      .1  .1               .2  .2            .3  .3

DOS    386        1  Word Perfect      2  Dbase III      -
MVS    386        2  Excel             3  P-STAT         1  Kedit
Unix   Spark      1  P-STAT            2  Informix       3  P-STAT

The Report 

error on case 2 

PC users 1 

Unix users 1 

P-STAT users 1 

P-STAT usages 2 

__________________________________________________________________________ 

The purpose of this possibly daunting example is to show the generality of use of IF-THEN-ELSE blocks and 

DO loops. When we say 'if #Puse EQ 1' within the DO loop we have: 

1. a simple IF 

2. within an IF-THEN block (it has no ELSE) 

3. within a DO loop 

4. within an IF-THEN-ELSE block 

5. within an IF-THEN-ELSE block. 

The report contains a counter of the number of people using a PC operating system or a Unix operating system, 

using P-STAT for any single purpose and a count of the total times that P-STAT was cited. Before any totals


are accumulated, the data are checked for validity and if the answers seem inappropriate the information in the case

is not used in the report. 

The first step in generating the report is to set up a series of scratch variables. This is done as stand-alone PPL. 

GEN ##Error = 0, GEN ##Numuse = 0, GEN ##PCuse = 0, 

GEN ##Users = 0, GEN ##Aps = 0 $ 

It could instead be included at the beginning of the PROCESS command: 

PROCESS Compute [ IF FIRST ( .FILE. ),GEN ##Error = 0,

GEN ##Numuse = 0, GEN ##PCuse = 0, GEN ##Users = 0, GEN ##Aps = 0; 

After generating a single temporary scratch variable, the first step in the PROCESS command in Figure 5.13 

is to set up the major IF-THEN-ELSE block. 

IF OS MATCHES ' ( DOS | Windows | NT | OS/2 ) * ' 

THEN; 

The MATCHES function is described in detail in the chapter “PPL: Modification of Character Variables”. Here 

it is used to see if the operating system is any of the common operating systems for Intel Chip machines. If the IF 

is true, a second IF-THEN-ELSE block is used to see if the computer chip is one of the Intel chips: 

__________________________________________________________________________ 

Figure 5.13 IF-THEN-ELSE Block with Nested IF and a DO Loop 

GEN ##Error = 0, GEN ##Numuse = 0, GEN ##PCuse = 0, 

GEN ##Users = 0, GEN ##Aps = 0 $ 

PROCESS Compute 

[ GEN #Puse = 0; IF OS MISSING OR Chip MISSING GOTO Err; 

IF OS MATCHES ' ( DOS | Windows | NT | OS/2 ) * ' 

THEN; 

if Chip AMONG ( '286' '386' '486' 'Pentium' ) 

then; 

INCREASE ##PCuse;

else;

PUT Chip > OS

> .n. ;

SET ##Error = 1;

GO TO Err;

endif;

ELSE;

if OS NE 'UNIX' THEN;


GO TO Err; 

else; 

INCREASE ##Users; 

DO #AP #N USING Appl?; 

IF V(#AP) MISSING, NEXTDO;


IF Wappl?(#N) AMONG ( 'P-STAT' 'PSTAT' ) THEN; 

INCREASE ##Aps, INCREASE #Puse; 

IF #Puse EQ 1, INCREASE ##Numuse; 

ENDIF; 

ENDDO; 

endif; 

ENDIF; 

GO TO Report; 

Err: PUT @SKIP2 .n. @NEXT ; GO TO Next; 

Report: IF LAST ( .FILE. ) 

PUT @20 ##PCuse @next 

> @20 ##Users @next 

> @20 ##Numuse @next 

> @20 ##Aps;

Next: ] $ 

__________________________________________________________________________ 

if Chip AMONG ( '286' '386' '486' 'Pentium' ) 

then; 

INCREASE ##PCuse; 

If the replies to the questions about the operating system and the chip agree, the scratch variable ##PCuse is increased. 

If it is false there is a possible error indicated by the PUT statement and a branch to the statement labelled 

“Err”. 

else; 

PUT Chip > OS 

> .n. ; 


GO TO Err; 

endif; 

The endif completes the nested IF-THEN-ELSE block and also the THEN portion of the major block. 

If the first IF-THEN is false and we have a computer that appears to be running an operating system other 

than the standard PC operating systems we will now process the “ELSE”. 

ELSE; 

if OS NE 'UNIX' THEN; 


GO TO Err; 

This starts a nested IF-THEN-ELSE block to eliminate and print an error report for any respondents who, like 

the second case in the data in Figure 5.12, are not using the UNIX operating system. The final section of the command 

is used to examine the applications for all cases like the third case in Figure 5.12 who are running UNIX. 

INCREASE ##Users;

DO #AP #N USING Appl?;

IF V(#AP) MISSING, NEXTDO;

IF Wappl?(#N) AMONG ( 'P-STAT' 'PSTAT' ) THEN;

INCREASE ##Aps, INCREASE #Puse;

IF #Puse EQ 1, INCREASE ##Numuse;

ENDIF;

ENDDO;


The scratch variable ##Users, our counter of UNIX users, is immediately incremented. Next a DO is used to examine

the list of applications and increment the remaining counters that are needed for the report.

DO #AP #N USING Appl?;

Two scratch variables are created by the DO. #AP takes on the values of the positions of any variable beginning 

with the characters “Appl”. Thus the first time through the loop #AP = 3. The second time #AP = 5. The final 

time #AP = 7. 

IF V(#AP) MISSING, NEXTDO; 

If the value of the application variable is missing, the remaining steps in the DO are bypassed. If there are no more 

loop iterations to be done control moves past the ENDDO statement. If this is not missing, the next test is done 

to determine if P-STAT is the name given for the application:

IF Wappl?(#N) AMONG ( 'P-STAT' 'PSTAT' ) THEN;

#N in the DO loop takes on the values 1, 2, and 3 as the loop progresses. The use of the wildcard to set up a dynamic

vector means that Wappl?(#N) tests variable Wappl.1 when #N is 1, Wappl.2 when #N is 2 and

Wappl.3 when #N is 3. The third case in the file contains:

Unix Spark 1 P-STAT 2 Informix 3 P-STAT 

The first time through the loop Wappl.1 has the value “P-STAT”; therefore, the IF is true and the rest of the four

line IF-ENDIF is executed. 

We increase the permanent scratch variable ##Aps, which is used for a total of all P-STAT applications, and also 

increase #Puse, a temporary scratch variable that is reset to 0 as each case starts. Thus when “P-STAT” is found

again in the third loop #Puse becomes 2 and we do not increase ##Numuse a second time for the same case. 

The work is now all done. It is only necessary to end the open IF blocks and write out the error messages and 

the reports: 

endif; 

ENDIF; 

GO TO Report; 

Err: PUT @SKIP2 .n. @NEXT ; GO TO Next; 

Report: IF LAST ( .FILE. ) 

PUT @20 ##PCuse @next 

> @20 ##Users @next 

> @20 ##Numuse @next 

> @20 ##Aps;

Next: ] $
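The counting logic of Figure 5.13 can be followed in a short Python model that reproduces the totals of the report in Figure 5.12. This is a sketch, not P-STAT code: tally and the dictionary keys are hypothetical, None stands for a missing value, the MATCHES pattern is approximated by a test on the start of the string, and the comparison with 'UNIX' is assumed to ignore case (the data value is “Unix” and the report counts one Unix user).

```python
PC_SYSTEMS = ("DOS", "Windows", "NT", "OS/2")
INTEL_CHIPS = ("286", "386", "486", "Pentium")

# The three cases of Figure 5.12; each application is (Appl.n, Wappl.n).
cases = [
    {"OS": "DOS", "Chip": "386",
     "apps": [(1, "Word Perfect"), (2, "Dbase III"), (None, None)]},
    {"OS": "MVS", "Chip": "386",
     "apps": [(2, "Excel"), (3, "P-STAT"), (1, "Kedit")]},
    {"OS": "Unix", "Chip": "Spark",
     "apps": [(1, "P-STAT"), (2, "Informix"), (3, "P-STAT")]},
]


def tally(cases):
    """Return the permanent counters and the list of case numbers in error."""
    totals = {"PCuse": 0, "Users": 0, "Numuse": 0, "Aps": 0}
    errors = []
    for n, case in enumerate(cases, start=1):
        puse = 0                                  # temporary: reset for each case
        os_name, chip = case["OS"], case["Chip"]
        if os_name is None or chip is None:
            errors.append(n)
        elif os_name.startswith(PC_SYSTEMS):      # major THEN section
            if chip in INTEL_CHIPS:
                totals["PCuse"] += 1
            else:
                errors.append(n)
        elif os_name.upper() != "UNIX":           # major ELSE section
            errors.append(n)
        else:
            totals["Users"] += 1
            for appl, wappl in case["apps"]:
                if appl is None:
                    continue                      # NEXTDO
                if wappl in ("P-STAT", "PSTAT"):
                    totals["Aps"] += 1
                    puse += 1
                    if puse == 1:                 # count each user only once
                        totals["Numuse"] += 1
    return totals, errors


print(tally(cases))
# ({'PCuse': 1, 'Users': 1, 'Numuse': 1, 'Aps': 2}, [2])
```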


SUMMARY

DO LOOPS and IF-THEN-ELSE BLOCKS

DO #J USING vn TO vn;

specifies a scratch variable and then, after the USING, a list of variable names or positions. The list can 

also use TO, .ON. and wildcards like PRE? . The remaining PPL in the loop is executed once for each 

variable in the list. The scratch variable is set to the LOCATION of the current variable; therefore, the 

scratch variable is different in each iteration. 

Thus, #L, in the loop below, is set to the location (not the value) of Test1 in the first iteration, and to the 

location of Test10 in the last iteration. 

[ DO #L USING Test1 TO Test10;

SET V(#L) = SQRT ( V(#L) ); ENDDO ] 

[ DO #J USING v(1) Height v(5) TO v(7);

INCREASE V(#J); ENDDO ] 

The variables in the DO list can be tested to ensure that numeric operations are not performed on character

variables or vice versa. The operators CHARACTER, NUMERIC, MISSING and GOOD can be

used. 

[ DO #QQ USING SS.Number to ZIP; 

IF V(#QQ) NUMERIC, NEXTDO; 

SET V(#QQ) = LEFT ( V(#QQ) ); ENDDO ]

[ DO #I USING V(1) TO V(25) V(28); 

IF V(#I) CHARACTER OR V(#I) GOOD, NEXTDO;

SET V(#I) = .M3.; ENDDO ] 

There is no limit to the number of PPL instructions that may be included in a DO loop. DO's may include 

other DO loops and IF-THEN-ELSE blocks. 

DO #J = nn, nn, nn; 

specifies a scratch variable and then, after the '=', a start expression, an end expression and an optional

stepsize expression. The scratch variable takes the values of the numbers from the start value through

the end value as incremented by the stepsize. If the stepsize is not supplied, 1 is assumed. The scratch

variable is usable in the PPL within the loop.

[ DO #Vars = 1, 3 ;

SET V(#Vars) = V(#Vars) / 12 ; ENDDO ]

In this example, “#Vars” is the user-supplied scratch variable. It is used in the SET instruction as the

subscript of V, the vector of variables in the file. Each of the first 3 variables in the file has its value

divided by 12.

EXITDO

causes the DO loop to be exited immediately even if all the loop instructions have not been completed.

vn=variable name nn=number exp=expression


NEXTDO

causes a jump to the ENDDO statement where the DO loop counter is evaluated. If there are no more

iterations, the loop terminates.

ENDDO

defines the end of the DO loop domain. When ENDDO is processed, the current DO value is evaluated.

If the loop is not complete, the counters are incremented and the commands in the DO domain are executed

with the new values.
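For readers who know a general-purpose language, the DO statements map onto familiar loop constructs. The Python sketch below is an analogy only, not P-STAT code: do_values is a hypothetical helper that assumes a positive stepsize, M3 is a stand-in for missing type 3, and None stands for an ordinary missing value. NEXTDO corresponds to continue and EXITDO to break.

```python
def do_values(start, end, step=1):
    """Values taken by the scratch variable in DO #J = start, end, step."""
    value = start
    while value <= end:            # the end value itself is included
        yield value
        value += step


M3 = ("missing", 3)                # stand-in for .M3.
row = [12, None, "abc", 7, None]   # one case: V(1) through V(5)

# Like: IF V(#I) CHARACTER OR V(#I) GOOD, NEXTDO; SET V(#I) = .M3.;
for i in do_values(1, len(row)):
    value = row[i - 1]             # V is subscripted from 1
    if isinstance(value, str) or value is not None:
        continue                   # NEXTDO: go on to the next iteration
    row[i - 1] = M3

# Like: IF ( P(#j) LT 0 ) EXITDO; INCREASE ##Total BY P(#j);
p = [33, 15, 20, 73, -1, -1]
total = 0
for j in do_values(1, len(p)):
    if p[j - 1] < 0:
        break                      # EXITDO: leave the loop immediately
    total += p[j - 1]

print(list(do_values(1, 10, 3)), row, total)
```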

GENERATE Within a DO Loop 

A new variable can be generated in each iteration of a DO-GENERATE loop. When GENERATE is used 

in a DO loop, the loop can have no other statements; i.e., it can have only DO, GENERATE and ENDDO. 

The format of a GENERATE within a DO loop is one of the following: 

GENERATE ? = value; 

GENERATE ? (mask) = value; 

GENERATE V(#J) (mask) = value; 

GENERATE V(##K) (mask) = value; 

If the variables are character, the :C or :C20 or such directly follows the mask or, if there is no mask, the 

“?”. Masks are described below. The “= value” is optional; if not supplied, the variable is set to missing. 

If the file currently has 20 variables, the ? causes the name of VAR21 to be created. It can then be 

masked. Use of V(#J) must be followed by a mask since that name already exists. 

RENAME Within a DO Loop 

A group of variables can be renamed in a DO loop. Each iteration renames a different variable. When 

RENAME is used in a DO loop, the loop can have no other statements; i.e., it can have only DO, RENAME

and ENDDO.

[ RENAME Social.S.Num TO SS.Number ] 

The format for RENAME within a DO loop is the following: 

1. RENAME 

2. a V(#J) usage. This identifies the variable to be renamed. It also provides its current name to the 

mask. 

3. a mask in parentheses which contains strings in quotes to be used exactly as entered. It also contains 

special characters such as the “&” which are used to select or omit letters from the input 

label and to supply numbers using the DO loop scratch variable. 

4. a semicolon, ending the statement. 

Examples of DO-RENAME loops with masks: 

[ DO #J USING Q1 TO Q23; 

RENAME V(#J) ( 'Survey.' & ); 

ENDDO] 

“Survey.” is a prefix. Variables Q1 through Q23 will be renamed by prefixing their names with “Survey.” 

The new names will be “Survey.Q1”, “Survey.Q2”, and so on. This is an example of a simple mask: 



[ DO #j=21,35; 

RENAME V(#j) (XOOXX); 

ENDDO ] 

Here, a mask of (XOOXX) is supplied. The initial X says use the first input character, the OO says omit 

the next two characters, and the XX says use the next two (characters 4 and 5). This mask would rename 

VAR31 into V31. 

MASKS for RENAME and GENERATE 

A mask is used to create a name for a variable, either by modifying the ? or V(#J) name preceding it or 

by creating a totally different name. The mask activity begins with a pointer on the initial character of 

the input name. The pointer is moved onto the next character after each usage of X, O, c or C. Further 

use of X-O-c-C is ignored when the pointer is beyond the final input character. 

1. x or X takes the current input character, if usable. 

2. o or O omits the current input character. NOTE: the digit '0' is also usable. 

3. c takes the current input character, if usable, and, if it is a letter, puts it into lower case. C takes 

the current input character, if usable, and, if it is a letter, puts it into upper case. 

4. & takes all remaining usable input characters that can fit, starting at the current location of the 

pointer. 

5. @4 places the pointer onto the 4th input character. (@4 xxx) and (ooo xxx) are identical. 

6. @-5 places the pointer on the 5th character from the right hand end. 

A character that has been the subject of any of the X-O-c-C-& operators is no longer usable by any subsequent 

operator. As a result, ( @-4 OOOO @1 & 'post' ) could be used to inactivate the rightmost 4 

characters, take the rest, and add 'post' to it. In other words, the mask has replaced an existing 4-character 

suffix with a new one. 

Blanks are ignored in masks, which can markedly improve readability. For example: 

(XXXXXXXXX) and 

(XXX XXX XXX) are identical.

Other features of the mask are: 

1. 'ab.cde' moves the string contents into the new name. 

2. D or d inserts the V subscript. This is based on the current value of the DO scratch variable. If 

V(#j) is used, 17 is inserted when #j=17. If V(#j+10) is used, 27 is inserted when #j=17. DD is 

like D, but forces 2 characters; 07 is used instead of 7. If DDD is used, three numbers are inserted 

in the new name and a 7 bonds to the new label as 007. 

3. N or n inserts the current iteration count of the DO loop. If this is the third trip through the loop, 

3 is inserted. NN provides 2 digits, NNN 3 digits. You do not have to use a counter scratch 

variable in the DO statement in order to use 'N'. 

IF-THEN-ELSE 

IF-THEN-ELSE blocks may include any PPL statements including other IF-THEN-ELSE blocks and DO 

LOOPS. GOTO may also be used as long as the target label is not in the middle of another block. 



IF Age GE 14, THEN; 

ELSE; 

ENDIF; 

The IF of an IF-THEN can be complex (using OR and AND) but it can only be followed by 'THEN;'. 

The IF-THEN statement is followed by all the PPL statements to be executed when the IF clause is true. 

The directions of this logic can be changed by using the M/F prefixes. M.THEN would be followed by 

PPL statements to be executed if the IF statement result is missing. F.THEN is executed only if the IF 

statements evaluate to FALSE. 

[ IF Age GE 14, THEN; 

PUT 'over 14'; 

ELSE; 

PUT 'failed'; 

ENDIF; ] 

ELSE; is followed by all the PPL statements to be executed when the IF clause is not true. Not true includes 

results that are either false or missing. ELSE is optional. M.ELSE or F.ELSE can also be 

specified. 

ENDIF is required to denote the end of the IF block. 


6 


Functions and System Variables 

This chapter explains P-STAT functions. Functions evaluate or transform one or more arguments and yield a numeric 

or character value. This chapter also contains a complete list and description of the P-STAT system 

variables. System variables are special variables whose values are set by P-STAT, but may be accessed by the 

user. 

Numeric functions and functions that transform either numeric or character arguments are explained in this 

chapter. The final PPL chapter covers character (string) functions. The prior PPL chapters cover the basics of 

PPL modification — case and variable selection, changing existing variables, creating new variables, logical selection, 

positional notation, DO loops, the two recoding functions, NCOT and RECODE, and IF-THEN-ELSE 

blocks. 

Most data modification is done on a single case. The case is retained, deleted, or modified, depending on the 

values of variables found in that case or on the value of some system variable such as .N. , the case number. This 

is within-case modification. The functions and system values described in this chapter are primarily applicable to 

modification of a single case. The next PPL chapter covers across-case modification, that is, modification of multiple 

cases grouped together because of a common relationship. 

6.1 ONE-EXPRESSION FUNCTIONS 

There are four basic types of functions in the P-STAT programming language: 

1. functions that evaluate a single expression; 

2. functions that evaluate a list of expressions; 

3. special functions that evaluate the first expression in the argument list, using the additional arguments 

to define the function more precisely; and 

4. distribution functions that give the probability of obtaining a random deviate less than a specified 

value. 

One-expression functions evaluate a single numeric expression enclosed in parentheses. The expression may 

be a variable name or position, a constant, or a complex expression. Complex expressions are nested expressions, 

expressions containing arithmetic operators, and combinations of both of these. 

The function is used in a PPL clause containing an instruction or a logical test and its consequence: 

SET Age = INT (Age); 

IF Income GOOD, SET Income = ROUND (Income); 

Parentheses enclose the expression that the function evaluates. In the first example just given, the INT (i.e., integer) 

function evaluates Age and yields the integer portion of Age. Age is set to this integer value. In the second 

example, if Income is GOOD (non-missing), the ROUND function evaluates Income and yields a value rounded 

to the nearest whole number. Income is set to this rounded value. 

Functions may be nested within functions. For example: 

SET Root.Income = ROUND ( SQRT ( Income )); 

The square-root of Income is rounded and stored in variable Root.Income.

6.2 PPL: Functions and System Variables 

The functions that evaluate a single numeric expression are: 

ABS ( exp ) absolute value 

COS ( exp ) cosine 

ACOS ( exp ) arc cosine 

EXP ( exp ) exponential (e raised to this exponent) 

FACTORIAL (exp) the factorial value of the argument 

FRAC ( exp ) fractional part 

INT ( exp ) integer part 

LOC ( vn ) location (of a variable) 

LOG ( exp ) natural logarithm (base e) 

LOG10 ( exp ) common logarithm (base 10) 

ROUND ( exp ) rounds to nearest integer 

CEIL (exp) smallest integer greater than or equal to the input value 

FLOOR (exp) largest integer that is less than or equal to the input value 

SIN ( exp ) sine 

ASIN ( exp ) arc sine 

SQRT ( exp ) square root 

TAN ( exp ) tangent 

ATAN ( exp ) arc tangent 

6.2 Rounding Functions 

The FRAC, INT and ROUND functions yield rounded values (of sorts). The original signs of the numbers are preserved. 

The ABS (absolute value) function yields the original value of a number without any sign. Examples of 

these functions, using the same value as the argument for each, highlight the differences among the functions: 

Function Result 

FRAC ( -621.87 ) -0.87 

INT ( -621.87 ) -621 

ROUND ( -621.87 ) -622 

ABS ( -621.87 ) 621.87 

6.3 Floor and Ceiling 

FLOOR is a function that takes a numeric input and produces the largest integer that is less than or equal to the 

input value. Thus: 

FLOOR(-4.1) = -5 

FLOOR( 2 ) = 2 

FLOOR( 2.9) = 2 

CEIL is a function that takes a numeric input and produces the smallest integer that is greater than or equal to the 

input value. Thus: 

CEIL (-4.7) = -4 

CEIL ( 2 ) = 2 

CEIL ( 2.1) = 3
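The difference between INT, ROUND, FLOOR and CEIL shows up only for negative or fractional values. The same results can be reproduced with Python's standard math module, used here purely as a cross-check (these are Python functions, not the PPL ones). Note that Python's round sends exact halves to the nearest even number; how P-STAT's ROUND treats exact halves is not stated in this section.

```python
import math

x = -621.87
frac_part = math.modf(x)[0]     # like FRAC: fractional part, sign kept (about -0.87)
int_part = math.trunc(x)        # like INT: integer part, sign kept (-621)
nearest = round(x)              # like ROUND: nearest whole number (-622)
magnitude = abs(x)              # like ABS: 621.87

floors = [math.floor(v) for v in (-4.1, 2, 2.9)]   # [-5, 2, 2]
ceils = [math.ceil(v) for v in (-4.7, 2, 2.1)]     # [-4, 2, 3]

print(round(frac_part, 2), int_part, nearest, magnitude, floors, ceils)
```

For negative numbers INT moves toward zero while FLOOR moves away from it: INT ( -4.1 ) would be -4, whereas FLOOR(-4.1) is -5.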


6.4 Exponential and Trigonometric Functions 

The SQRT function yields the square root of a number. The square of a number is obtained using the numeric 

operator ** (see the first PPL chapter). The LOG and LOG10 functions yield the natural and common logarithms 

of values, to base e and base 10, respectively. The EXP (exponential) function raises e to the value given as its 

argument (“undoing” the effect of the LOG function). Similarly, raising 10 to the value produced by LOG10 “undoes” 

that function. 


LOG ( 12094.5 ) 9.40051 

EXP ( 9.40051 ) 12094.5 

LOG10 ( 12094.5 ) 4.08259 

10 ** 4.08259 12094.5 
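
The round trips in this table can be verified with a few lines of Python. This is a sketch for checking the arithmetic, using Python's math module in place of the P-STAT functions.

```python
import math

x = 12094.5
natural = math.log(x)    # natural logarithm, base e
common = math.log10(x)   # common logarithm, base 10

back_from_natural = math.exp(natural)  # e raised to the natural log
back_from_common = 10 ** common        # 10 raised to the common log

print(round(natural, 5), round(common, 5))
print(back_from_natural, back_from_common)
```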

The SIN, COS and TAN functions yield the sine, cosine and tangent of their numeric argument. The ASIN, 

ACOS and ATAN functions yield the arc sine, arc cosine and arc tangent. Using these functions in conjunction 

with the numeric operators permits calculation of a variety of trigonometric expressions. 

6.5 The Factorial Function 

The FACTORIAL function yields the factorial value of the argument. This is often shown as N!. The argument 

should be a non-negative integer. If the argument is zero, the result is one. If the argument is an integer from 1 

through 169 or so, the result is the product of integers from one through that argument. 


FACTORIAL (0) 1 

FACTORIAL (5) 120 

FACTORIAL (169) 0.4269068E305 

FACTORIAL (200) Missing 1 (the result would be too large) 

FACTORIAL (-12) Missing 3 (argument is negative) 

FACTORIAL (3.5) Missing 3 (argument not an integer) 
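
The rules in this table can be sketched in Python as follows. The function name and the "M1" and "M3" strings are hypothetical stand-ins for the P-STAT missing types, and the point at which the result becomes too large is set here by Python's floating-point range, which is an assumption and may not match P-STAT exactly.

```python
import math

def factorial_or_missing(x):
    # Negative or non-integer arguments are invalid ("missing 3" in the text).
    if x < 0 or x != int(x):
        return "M3"
    try:
        # Converting to float makes very large results overflow,
        # standing in for "missing 1" (result too large).
        return float(math.factorial(int(x)))
    except OverflowError:
        return "M1"

for arg in (0, 5, 169, 200, -12, 3.5):
    print(arg, factorial_or_missing(arg))
```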

6.6 Creating Dummy Variables with the LOC Function 

The LOC function yields the location of a variable. Thus, it is slightly different from the other simple functions 

because it is not purely a numeric function. The value it returns is numeric, but the variable given as its argument 

may be a character or numeric one. See the explanation for EXPAND later in this chapter for another way to generate 

several variables from one or more input variables. 


LOC ( Name ) 6 (when Name is the 6th variable) 

LOC ( Age ) 10 (when Age is the 10th variable) 

LOC is often used when the location of a variable, referenced by position, is not known: 

SET V ( LOC ( North.East ) + 1 ) = 100 ; 

The location of the variable named “North.East” plus 1 defines a value; if North.East is the fourth variable in the 

file, that value is 5. This value is the subscript or index of V (the vector of variables in the file) and V(5) is set to 

100. 

In Figure 6.1, four variables are created, one for each of the possible values of Region. These variables are 

set to 0 or 1 depending on the value of Region for that case. This is sometimes referred to as creating dummy


variables, a technique often used in setting up data for regression or analysis of variance. With only four variables, 

it may be easier to understand what is being done if you use: 

[ IF Region EQ 1, SET North.East = 1 ; 

IF Region EQ 2, SET North.West = 1 ; 

IF Region EQ 3, SET South.East = 1 ; 

IF Region EQ 4, SET South.West = 1 ] 

rather than: 

SET V ( LOC ( North.East ) + Region - 1 ) = 1; 

However, with many more variables and corresponding IF statements, the use of this calculated expression becomes 

more desirable. 

When a variable is created with GENERATE, it takes the next position at the right end of the file. Thus, it is 

easy to calculate the location of each variable in turn, if the location of the first new one is known. Since there are 

three variables in file Regional, the variable North.East will be in position four. The LOC function returns the 

location of a variable. Therefore LOC ( North.East ) has a value of 4: 

Region LOC (North.East) V (LOC (North.East) + Region-1) 

1 4 4 + 1 - 1 = 4 

2 4 4 + 2 - 1 = 5 

3 4 4 + 3 - 1 = 6 

4 4 4 + 4 - 1 = 7 

When Region is 4, then the variable in position 7, South.West, is set to 1. 

__________________________________________________________________________ 

Figure 6.1 Calculating Variable Positions 

FILE Regional: 

Age Sex Region 

52 1 1 

31 2 2 

65 1 3 

27 2 4 

LIST Regional 

[ GENERATE North.East = 0, GENERATE North.West = 0, 

GENERATE South.East = 0, GENERATE South.West = 0; 

SET V ( LOC ( North.East ) + Region - 1 ) = 1 ] $ 

North North South South 

Age Sex Region East West East West 

52 1 1 1 0 0 0 

31 2 2 0 1 0 0 

65 1 3 0 0 1 0 

27 2 4 0 0 0 1 

__________________________________________________________________________
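
The position arithmetic in Figure 6.1 can be traced with a small Python sketch. It models the file as lists and uses a hypothetical loc helper that returns 1-based positions; it illustrates the calculation and is not a description of how P-STAT stores files.

```python
names = ["Age", "Sex", "Region"]
cases = [[52, 1, 1], [31, 2, 2], [65, 1, 3], [27, 2, 4]]

# GENERATE appends each new variable at the right end of the file.
new_names = ["North.East", "North.West", "South.East", "South.West"]
names += new_names
for case in cases:
    case += [0] * len(new_names)

def loc(name):
    # 1-based position of a variable, like LOC.
    return names.index(name) + 1

for case in cases:
    region = case[loc("Region") - 1]
    position = loc("North.East") + region - 1  # 1-based, as in V(...)
    case[position - 1] = 1                     # Python lists are 0-based

for case in cases:
    print(case)
```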


If the value of Region is outside the expected range, an error condition could occur, or the value of some other 

existing variable could be changed. The use of an AMONG test ensures that the value of Region will be used only 

if it is non-missing and between 1 and 4: 

IF Region AMONG (1 TO 4), 

SET V ( LOC ( North.East ) + Region - 1 ) = 1; 

When a calculation might produce a value other than an integer, the INT or ROUND function may be used: 

IF Region AMONG (1 TO 4), 

SET V ( LOC ( North.East ) + INT ( Region ) - 1 ) = 1; 

The creation of dummy variables may be simplified if the original order of the variables does not need to be 

preserved. KEEP rearranges the new variables after they are created: 

GENERATE North.East = 0, GENERATE North.West = 0, 

GENERATE South.East = 0, GENERATE South.West = 0; 

KEEP .NEW. Age Sex Region; 

SET V (Region) = 1; 

When Region is 3, using V(Region) is equivalent to using V(3). Note the use of .NEW., a system variable which 

refers to all the variables created in the current command. 

6.7 Creating a Single Variable from Dummy Variables 

Sometimes the data are already entered as a series of variables coded 0 and 1, and you would like to, in effect, 

“undummy” them; that is, you would like to create a new variable which has its value based on the location of the 

one variable in the series which has a value of 1. Given a file with cases such as this one: 

North North South South 

Age Sex East West East West 

52 1 1 0 0 0 

These PPL clauses create the new variable Region: 

GENERATE Region = .M1.; 

DO #J USING North.East TO South.West; 

IF V(#J) EQ 1, SET Region = #J + 1 - LOC(North.East); 

ENDDO; 

The DO loop scratch variable is #J. #J takes on the values 3, 4, 5 and 6 as the DO loop is processed. The 

LOC of North.East always has a value of 3. If V(3) is 1 when #J = 3, the new variable Region is set to 1: 

3 (#J) + 1 (a constant) - 3 (location of North.East) = 1 

If V(4) = 1 when #J = 4, Region is set to 2 which is: 

4 (#J) + 1 (a constant) - 3 (location of North.East) = 2 

and so on. 

Again, the calculations of position may be simplified by using a second scratch variable in the DO loop: 

GENERATE Region = .M1.; 

DO #J #N USING North.East TO South.West; 

IF V(#J) EQ 1, SET Region = #N; 

ENDDO; 

#N takes on a value which corresponds to the number of times through the loop. Thus it will be a 1 when the 

current DO is positioned at North.East and a 4 when it is positioned at South.West.
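
Both versions of the loop can be compared in a Python sketch. The names loc and region_from_dummies are hypothetical, and None stands in for missing; the sketch returns the value computed from the position (#J) and the value taken from the pass count (#N) so that the two can be seen to agree.

```python
names = ["Age", "Sex", "North.East", "North.West", "South.East", "South.West"]

def loc(name):
    # 1-based position of a variable, like LOC.
    return names.index(name) + 1

def region_from_dummies(case):
    by_position = None  # None stands in for missing (.M1.)
    by_count = None
    first, last = loc("North.East"), loc("South.West")
    # j plays the role of #J (position); n plays the role of #N (pass number).
    for n, j in enumerate(range(first, last + 1), start=1):
        if case[j - 1] == 1:
            by_position = j + 1 - first
            by_count = n
    return by_position, by_count

print(region_from_dummies([52, 1, 1, 0, 0, 0]))
```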


6.8 LIST FUNCTIONS 

These functions evaluate a list of variables given as their arguments. The variables may be referenced by names 

or positions, or a combination of both. Ranges of variables and wildcards may be included in the list. The numeric 

list functions are: 

MAX ( vnp list ) maximum value of variables 

MAX.GOOD ( vnp list ) 

MEAN ( vnp list ) mean of variables 

MEAN.GOOD ( vnp list ) 

MIN ( vnp list ) minimum value of variables 

MIN.GOOD ( vnp list ) 

SDEV ( vnp list ) standard deviation of variables 

SDEV.GOOD ( vnp list ) 

SUM ( vnp list ) sum of variables 

SUM.GOOD ( vnp list ) 

The list functions that evaluate either numeric or character arguments are: 

COUNT.GOOD ( vnp list ) number of non-missing values 

FIRST.GOOD ( vnp list ) value of first non-missing var 

LAST.GOOD ( vnp list ) value of last non-missing var 

These functions can be used quite generally: 

GENERATE Check = 0; 

IF MIN ( Test1 TO Test8 ) EQ MAX ( Test1 TO Test8 ), 

SET Check = 1; 

GENERATE Average = MEAN.GOOD ( Test1 TO Test8 ); 

6.9 Numeric List Functions 

The arguments for the numeric list functions are enclosed in parentheses. Individual variable names and positions, 

wildcards, and ranges of variables may be specified. 

The numeric list functions may be suffixed with “.GOOD” to specify that they apply only to good (non-missing) 

values. “.GOOD” may be abbreviated to “.G” if desired. The difference between the function MEAN and 

MEAN.GOOD is that MEAN gives the mean of all the variables in the list, whereas MEAN.GOOD gives the 

mean of only the good variables in the list. If MEAN is used and any one of the variables in the list is missing, 

the result is missing. If MEAN.GOOD is used and any of the variables in the list is missing, the mean is computed 

using only whatever good values are available. 

A teacher computing final grades could use the function MEAN and give students who have not completed 

all tests a missing or incomplete grade. Given this file, 

FILE Students: 

MidTerm.1 Final.1 MidTerm.2 Final.2 

2 3 4 4 

3 - 2 1 

these instructions compute both the mean of all values and the mean of non-missing values:


LIST Students [ 

GENERATE Average.Good = 

MEAN.GOOD ( MidTerm.1 TO Final.2 ); 

GENERATE Average.All = 

MEAN ( MidTerm.1 TO Final.2 )] $ 

MidTerm Final MidTerm Final Average Average 

.1 .1 .2 .2 Good All 

2 3 4 4 3.25 3.25 

3 - 2 1 2.00 - 
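
The difference between the two functions can be restated as a Python sketch, with None standing in for a missing value. The function names are hypothetical; only the behavior described above is modeled.

```python
def mean(values):
    # A missing value anywhere in the list makes the result missing.
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)

def mean_good(values):
    # Use only the good (non-missing) values.
    good = [v for v in values if v is not None]
    return sum(good) / len(good) if good else None

students = [[2, 3, 4, 4], [3, None, 2, 1]]
for scores in students:
    print(mean_good(scores), mean(scores))
```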

A doctor looking at average blood pressure readings for his or her patients might use MEAN.GOOD, which uses 

only the available good information. SUM.GOOD, MAX.GOOD, MIN.GOOD, and SDEV.GOOD all use only 

the good data and ignore the missing values: 

GENERATE Low.Score = MIN.GOOD ( Test10 TO Test15, V(33) ) ; 

6.10 Character and Numeric List Functions 

The COUNT.GOOD, FIRST.GOOD and LAST.GOOD functions detect or count non-missing data. The arguments 

for these functions may be character or numeric variable name and position lists. However, numeric and 

character values cannot be combined in one list. 

COUNT.GOOD yields a numeric value: 

IF COUNT.GOOD ( Course.1 TO Course.8 ) 

NOTAMONG ( 4 TO 6 ), SET Special = 1; 

FIRST.GOOD and LAST.GOOD yield the value of either the first or last non-missing variable in the argument 

list. This may be either a character or numeric value: 

GEN Last.Course:C = 

LAST.GOOD ( Course.1 TO Course.8 ); 

Thus, when a variable is being generated or recoded, its data type must agree with that of the value returned by 

LAST.GOOD. 
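
A Python sketch of these three functions follows, again with None standing in for missing. The course values are invented for illustration, and the result for a list with no good values (None here) is an assumption.

```python
def count_good(values):
    # Number of non-missing values.
    return sum(v is not None for v in values)

def first_good(values):
    # Value of the first non-missing entry, or None if there is none.
    return next((v for v in values if v is not None), None)

def last_good(values):
    # Value of the last non-missing entry, or None if there is none.
    return next((v for v in reversed(values) if v is not None), None)

courses = [None, "Math", "Art", None, "Latin", None, None, None]
print(count_good(courses), first_good(courses), last_good(courses))
```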

The FIRST and LAST functions access either the first or last cases in a file, or the first and last cases in subgroups. 

These across-case functions are explained in the next PPL chapter, and they are also described briefly in 

the summary in this chapter. 

6.11 SPECIAL FUNCTIONS 

Most of the special functions require two arguments. The first is the actual argument for the function. This is followed 

by a second argument that provides extra information and controls how the function operates. The special 

functions and their arguments are: 

CHAREX ( expression, mask ) 

COMBINATIONS ( expression, expression ) 

DIF ( expression, constant ) 

LAG ( expression, constant ) 

MOD ( expression, constant ) 

NCOT ( expression, instructions )


NUMEX ( expression, mask ) 

PLACES ( expression, constant ) 

RECODE ( expression, instructions) 

NCOT and RECODE, which are discussed in the second PPL chapter, are typical of these functions: 

SET Age = RECODE ( Age, 91 TO 99 = 90 ); 

GENERATE Coded.Age = NCOT ( Age, 20, 90/5 ); 

The first argument may be a simple or complex expression which, when resolved, is a value. The second argument 

provides additional instructions for evaluating the function. 

6.12 The LAG and DIF Functions 

LAG and DIF access a variable value in a prior case. These functions are used in econometrics, as well as in other 

fields. The LAG function “lags” back a specified number of cases to obtain a value of a given variable to use in 

the current case. The variable name and the number of cases to lag back are necessary: 

GENERATE Gross.Last.Month = LAG (Gross.Profit, 1); 

In this example, the variable Gross.Last.Month is generated from the variable Gross.Profit one case back. Each 

case represents a month’s values here. 

__________________________________________________________________________ 

Figure 6.2 Using LAG and DIF 

TITLE 'Gross Profit (in Thousands of Dollars)' $ 

LIST Acct84 [ 

GENERATE Gross.Last.Month = LAG (Gross.Profit, 1); 

GENERATE Difference.1 = DIF (Gross.Profit, 1); 

GENERATE Difference.2 = DIF (Gross.Profit, 2); 

GENERATE Two.Month.Gross = LAG (Gross.Profit, 1) + Gross.Profit; 

KEEP Month Gross.Profit .NEW. ], MAX.PLACES 1 $ 

Gross Profit (in Thousands of Dollars) 

Gross Gross Last Difference Difference Two Month 

Month Profit Month .1 .2 Gross 

1 4.8 - - - - 

2 5.1 4.8 0.3 - 9.9 

3 4.9 5.1 -0.2 0.1 10.0 

4 5.7 4.9 0.8 0.6 10.6 

5 6.2 5.7 0.5 1.3 11.9 

6 5.6 6.2 -0.6 -0.1 11.8 

__________________________________________________________________________ 

The LAG function’s arguments are: 1) a name of a numeric variable or an expression that provides the location 

of a numeric variable, and 2) a positive integer constant (not exceeding 500) that indicates the number of cases 

to lag back. The new variable’s values in the initial cases in the file are set to missing type one. To do a lag on a 

character variable use the CLAG function described in the chapter “Modification of Character Variables”.


The DIF function finds the difference between a variable’s value in the current case and that variable’s value 

in a prior case. The variable name (or expression) and the number of cases back to find the comparison variable 

value are required: 

GENERATE Difference.1 = DIF (Gross.Profit, 1); 

GENERATE Difference.2 = DIF (Gross.Profit, 2); 

Here, the variables Difference.1 and Difference.2 are generated and set equal to the difference in Gross.Profit this 

month (the current case) and last month and the month before that (one case back and two cases back). 

The DIF function’s arguments are: 1) a variable name or expression, and 2) a positive integer constant (not 

exceeding 500) that indicates the number of cases back in which to find the comparison value. The new variable’s 

values in the initial cases are set to missing type 1. Thus, DIF is very similar in operation to LAG. Figure 6.2 

illustrates the results obtained in various usages of LAG and DIF. Note that it is also easy to get sums, products, 

quotients, and so on, by using LAG and DIF in conjunction with other arithmetic operations. 
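
The columns of Figure 6.2 can be recomputed with a Python sketch. The lag and dif helpers are hypothetical list-based stand-ins for the P-STAT functions, None stands in for missing, and results are rounded to one place to mirror MAX.PLACES 1.

```python
def lag(values, k):
    # Value from k cases back; None (missing) for the first k cases.
    return ([None] * k + values)[:len(values)]

def dif(values, k):
    # Current value minus the value k cases back.
    return [None if prev is None else cur - prev
            for cur, prev in zip(values, lag(values, k))]

def places1(values):
    # Round to one decimal place, leaving missing values alone.
    return [None if v is None else round(v, 1) for v in values]

gross = [4.8, 5.1, 4.9, 5.7, 6.2, 5.6]
last_month = lag(gross, 1)
difference_1 = places1(dif(gross, 1))
difference_2 = places1(dif(gross, 2))
two_month = places1([None if p is None else p + g
                     for p, g in zip(last_month, gross)])
print(last_month, difference_1, difference_2, two_month, sep="\n")
```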

LAG and DIF work with the cases they get from any preceding PPL. This means that you will almost never 

use these functions within an IF. Only the cases which have a true value on the IF in Figure 6.3 will be input to 

the LAG/DIF function. Thus variable If.1 is only set when the IF statement is true. The first time that the IF is 

true there is nothing in the lag buffer so variable If.1 is set to missing and the lag buffer is set to the current value 

of var2, a 3. The second time that the IF is true, variable If.1 is set to 3, the value stored in the lag buffer. The 

lag buffer now contains the value 5 which is used the next time there is a true result for the IF. 

___________________________________________________________________________ 

Figure 6.3 Interaction of LAG and IF 

File Work 

var1 var2 

1 3 

2 4 

1 5 

1 6 

2 7 

MODIFY work [ GEN If.1; GEN No.If; 

IF var1 = 1 SET If.1 = LAG ( var2, 1 ); 

SET No.If = LAG ( var2, 1 ); ], 

OUT work2 $ 

File Work2 

No 

var1 var2 If.1 If 

1 3 - - 

2 4 - 3 

1 5 3 4 

1 6 5 5 

2 7 - 6 

___________________________________________________________________________ 
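
The lag buffer behavior in Figure 6.3 can be modeled in a Python sketch. The LagBuffer class is a hypothetical one-case buffer, assumed here to behave as the text describes: it returns the value it last received and then stores the current one.

```python
class LagBuffer:
    # Holds the most recent value passed in; returns the one before it.
    def __init__(self):
        self.previous = None  # None stands in for missing

    def lag(self, value):
        result, self.previous = self.previous, value
        return result

var1 = [1, 2, 1, 1, 2]
var2 = [3, 4, 5, 6, 7]
inside_if, outside_if = LagBuffer(), LagBuffer()
if_1, no_if = [], []
for a, b in zip(var1, var2):
    # LAG inside the IF only sees the cases where the test is true.
    if_1.append(inside_if.lag(b) if a == 1 else None)
    # LAG outside the IF sees every case.
    no_if.append(outside_if.lag(b))

print(if_1)
print(no_if)
```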

6.13 Modular (Remainder) Arithmetic 

MOD is a function that returns the remainder after a constant has been divided into the value of the first expression. 

(This is often referred to as modular arithmetic.) The first expression usually points to a numeric variable; the


second argument must be numeric. If Age is 25, then MOD (Age, 7) is 4, the remainder after all the possible 7's 

are removed. 

The following examples illustrate the results returned by the MOD function: 


MOD ( .75, .5 ) .25 

MOD ( 1, .3 ) .1 

MOD ( 6, 1 ) .0 

MOD ( 6, 2 ) .0 

MOD ( 6, 4 ) 2.0 

MOD ( 12, 7 ) 5.0 

MOD ( .M., 3 ) - 

MOD may be used to construct patterns for retaining cases: 

IF MOD ( .N., 3 ) EQ 1 OR 

MOD ( .N., 7 ) EQ 1, RETAIN; 

This instruction tests the case number (.N.) and, if it is 1, 4, 7, 8, 10, 13, 15, 16, and so on, retains the case. 
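
The list of retained case numbers can be checked with one line of Python. The % operator is used here for positive case numbers only, where it gives the ordinary remainder.

```python
# Case numbers 1 through 22 that leave remainder 1 when divided by 3 or by 7.
retained = [n for n in range(1, 23) if n % 3 == 1 or n % 7 == 1]
print(retained)
```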

6.14 Setting PLACES in Specific Variables 

The function PLACES requests a specific number of decimal places for specified numeric variables. The function 

sets the selected variable to the desired number of places before the file is passed to any commands, such as the 

LIST command. Thus, the function PLACES may operate on one particular variable, and the subsequent use of 

LIST identifiers, such as MIN.PLACES or MAX.PLACES, may then affect all of the variables in the listing (including 

the one already modified by the PLACES function). The following example, which uses the output file 

from the command T.TEST with the PLACES function and the LIST identifier MAX.PLACES, illustrates this. 

The number of decimal places of the variable named T.Prob is set to 2, and then the file is passed to the LIST 

command: 

LIST TTests 

[ SET T.Prob = PLACES (T.Prob, 2) ], MAX.PLACES 3 $ 

The identifier MAX.PLACES requests that the maximum number of decimal places for all of the variables be limited 

to 3. The listing produced will have three decimal places (if the data has that many places) for all of the 

variables, except for T.Prob, which will have only two places. 

The PLACES function requires two expressions in parentheses: 1) the argument that is the name of the variable 

whose places are to be set, and 2) an integer from 0 to 9 that specifies the number of decimal places in the 

fractional portion of the number, counting from the decimal point. 

Note: the result of a PLACES function is usually less accurate than the input value, because information beyond 

the requested number of places has been dropped. 

6.15 Extracting Digits Using NUMEX 

Specific digits may be extracted from numeric variables to yield a new numeric value. The NUMEX function 

operates only on the integer portion of a numeric value; any sign and fraction portion are ignored. 

NUMEX requires two arguments, a numeric expression and a character string mask composed only of X's 

and 0's and enclosed in quotes: 

GEN Engine.Num = NUMEX ( Serial.Num, 'X0XXX0' );


The selection mask is made up of X and 0 (zero) characters and may be up to nine characters in length. An X 

retains (extracts) a digit and a 0 drops (ignores) a digit. The mask is aligned with the right-most digit of the numeric 

value. 

Lead zeros are not retained in the output numeric value. The following examples illustrate NUMEX: 


NUMEX ( 984601, 'XXX' ) 601 

NUMEX ( 80742 , 'XXXX' ) 742 

NUMEX ( 10065 , 'X00X' ) 5 
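
The mask alignment can be sketched in Python as follows. The numex function is a hypothetical imitation; padding short values with zeros and returning 0 when no digit is kept are assumptions that the text does not confirm.

```python
def numex(value, mask):
    digits = str(abs(int(value)))          # integer portion, sign ignored
    digits = digits.rjust(len(mask), "0")  # pad so the mask can be right-aligned
    tail = digits[-len(mask):]             # digits lined up under the mask
    kept = "".join(d for d, m in zip(tail, mask) if m == "X")
    return int(kept) if kept else 0        # int() drops lead zeros

print(numex(984601, "XXX"))
print(numex(80742, "XXXX"))
print(numex(10065, "X00X"))
```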

The CHAREX function is similar to NUMEX. It extracts specific digits from a numeric value, but CHAREX 

yields a character representation of the digits, in which lead zeros are preserved. CHAREX is explained further 

in the final PPL chapter. 

6.16 COMBINATIONS of N things, K at a time 

COMBINATIONS (n,k) returns the number of different ways that K things can be taken from N things; i.e., N 

things K at a time. For example, combinations (5,2) is 10, namely, 1:2, 1:3, 1:4, 1:5, 2:3, 2:4, 2:5, 3:4, 3:5 and 4:5. 

N should be an integer from 1 to 6,000. K should be an integer from 0 to 60, but not more than N. If the result 

would be too large, missing 1 is returned. If an argument is invalid, missing 3 is returned. 

The function is defined as N! divided by the product of K! and (N-K)!. However, the actual computation is 

done by a series of integer divisions, cancelling out terms, until the denominator is all ones. The result is the product 

of the remaining values in the numerator. 


COMBINATIONS( 6,0 ) 1 




COMBINATIONS( 46,6 ) 9,366,819 (the NJ lottery odds) 

COMBINATIONS( 6000,60) 0.4368755E145 

COMBINATIONS( 20,.7) Missing 3 (invalid argument) 
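
The idea of avoiding large factorials can be illustrated with a Python sketch. It uses a common step-by-step product in which every intermediate value is itself a whole number; this is a related technique chosen for illustration and is not claimed to be the routine P-STAT uses. The range checks and missing results described above are omitted.

```python
import math

def combinations(n, k):
    # After step i the running value equals C(n - k + i, i),
    # so the integer division is always exact.
    result = 1
    for i in range(1, k + 1):
        result = result * (n - k + i) // i
    return result

print(combinations(5, 2))
print(combinations(46, 6))
print(combinations(46, 6) == math.comb(46, 6))  # cross-check with the standard library
```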

6.17 EXPAND ONE OR MORE VARIABLES 

EXPAND is a PPL statement that projects the values of one or more input variables into a set of new variables, 

each associated with a specified value in the input variables. In its simplest usage, EXPAND uses one input variable 

to create a group of new zero/one variables. Each case begins with the new variables set to zero. Then, if the 

value on the input variable is one of the specified values, the associated new variable is set to one. 

The input variable can be numeric or character. The output variables are always numeric. These new variables 

are sometimes called “dummy” variables. Several input variables can be expanded together. The output variables 

can then be set to: 

1. one if ANY of the input variables has the associated value. This is the default. 

2. the NUMBER of input variables that have the associated value. 

3. a one (1) if the first input variable has the associated value, otherwise a two (2) if the second input 

variable has the value, and so on. In other words, the RANK of the value.


Suppose variable BREAD is coded 1 through 4, with 1 meaning rye, 2 meaning wheat, 3 meaning raisin and 4 

meaning white. The PPL statement 

[ expand bread, values 1:4, gen rye wheat raisin white ] 

will generate four new variables named RYE, WHEAT, RAISIN and WHITE. If a given case has a 1 on BREAD, 

the value for RYE for that case will be set to one and the other three to zero. If BREAD is two, the second new 

variable, WHEAT, is set to one and the rest to zero, and so forth. 

The new variables are placed after the last current variable. In the VALUES phrase, either 1:4 or 1 TO 4 could 

have been used; they mean the same thing. 

6.18 Overall Syntax of a PPL EXPAND Statement 

An EXPAND statement consists of phrases, separated by commas. Some phrases are just a single word, others 

are more extensive. Three of these phrases are required. They start with EXPAND, VALUES and GENERATE. 

EXPAND is followed by the names of the variables to be expanded. There is usually just one variable, but there 

can be more. If several, they must be either all numeric or all character. The EXPAND phrase comes first; the 

order of the rest of the phrases doesn’t matter. 

[ EXPAND crust, ... 

[ EXPAND first.top second.top third.top, ... 

VALUES is followed by integers if the EXPAND variables are numeric, or by quoted character strings if the 

input is character. 

6.19 Numeric Input Values 

If the input variables are numeric, the values to be tested should be integers from 0 to 9999. Ranges can be used; 

they are indicated by TO or a colon (:). If some values and/or ranges are placed within parentheses, they will all 

be mapped into a single output variable. 

An output variable is created: 

1. for EACH integer outside of parentheses, and 

2. for each parenthesis structure. 

VALUES 1 3:6 9, makes six output variables. 

VALUES 1 9, makes two output variables. 

VALUES 1 TO 9, makes nine output variables. 

VALUES 0:9, makes ten output variables. 

VALUES 1 (3 5:8) 4, makes three output variables. 

6.20 Character Input Values 

If the input variables are character, the values to be tested should come in quotes, either ‘xxx’ or “xxx”. The default 

is to ignore leading blanks, trailing blanks and case. Thus ‘ Ohio ‘ is equivalent to ‘ohio’. 

VALUES (‘nj’ ‘new jersey’) ‘ohio’ ‘virginia’, 

The above example creates 3 output variables, since there is one set in parentheses, and 2 standalone values. The 

first output variable is set to 1 when EITHER ‘nj’ or ‘new jersey’ is found. 

6.21 The GENERATE or GEN phrase 

[ EXPAND topping.1 topping.2, VALUES 1:5 9, GEN top.* ] 

[ EXPAND region, VALUES 1:4, GEN east west north south] 

GENERATE provides the names for the variables being created. This can be done in two ways, prefix or full 

names.


GENERATE prefix.* 

GENERATE name name name 

A prefix like crust.* can be provided. Given 

[EXPAND varname, VALUES 1:3, GENERATE vvv.* ] 

the new variables will be named vvv.1, vvv.2, and vvv.3 . Given 

[EXPAND varname, VALUES 7 5 2, GENERATE vvv.* ] 

the new variables will be named vvv.7, vvv.5, and vvv.2 . 

A prefix can be used in a character expand. The quoted values are used to complete the names of the new 

variables. If (‘nj’ ‘new jersey’) or such is supplied, the first element is used, in this case ‘nj’. 

Alternatively, a name can be supplied for each value. Given 

[EXPAND varname, VALUES 1:3, GENERATE aaa bbb ccc] 

the new variables will be named aaa, bbb, and ccc, with aaa representing the value 1 and so 

forth. Given 

[EXPAND varname, VALUES 7 5 2, GENERATE aaa bbb ccc] 

the new variables will be named aaa, bbb, and ccc, with aaa representing the value 7 (because 7 was the first value, 

and aaa was the first name). In a character expand, the first test value is associated with the first output name, and 

so on. 

6.22 Options With Several Input Variables 

The default, when two input variables have the same value, is to simply set the associated output variable to 1. 

1. ADD, causes an output variable to show the NUMBER of input variables that have that value. 

2. RANK, causes an output variable to show the ORDER of the input variable that is the first to have 

that value. By “order” we mean its position in the EXPAND phrase. Consider 

[EXPAND var1 var2 var3, values 1:5, gen xxx.*]. 

If a case has a 3 on both var1 and var2, the default is to simply set xxx.3 to 1. If ADD is in use, xxx.3 

would be 2, the count of input variables that have that value. 

Suppose RANK is in use and a case has 2, 4 and 5 on input variables var1, var2 and var3, having 

used VALUES 1:5. That would cause xxx.2 to be set to 1, xxx.4 to 2, and xxx.5 to 3. Why is xxx.5 

set to 3 ? Because the initial 5 was found in the third input variable. 

3. NEED 2, This sets the number of non-missing input values that are needed for the output variables 

to be non-missing. The default is one. Thus, if all of the expand input is missing for a case, the default 

is for the result variables to be set to missing for that case. 

“NEED 0” can be used. This causes the result variables to be non-missing, no matter what the input 

is. “NEED 0” can be used when there is just one input variable. 

Suppose “NEED 2” is used when there are 3 input variables. Any case with less than 2 non-missing 

input values will be given missing result values. 
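
The default, ADD, RANK and NEED behaviors described in this section can be compared in a Python sketch. The expand function and its mode and need parameters are hypothetical, None stands in for missing, and a value found in no input variable is assumed to give 0 under RANK.

```python
def expand(inputs, values, mode="ANY", need=1):
    # Too few non-missing inputs: every result variable is missing.
    good = [v for v in inputs if v is not None]
    if len(good) < need:
        return [None] * len(values)
    out = []
    for target in values:
        # 1-based positions of the input variables holding this value.
        hits = [pos for pos, v in enumerate(inputs, start=1) if v == target]
        if mode == "ADD":
            out.append(len(hits))               # how many inputs have it
        elif mode == "RANK":
            out.append(hits[0] if hits else 0)  # first input that has it
        else:
            out.append(1 if hits else 0)        # default: any input has it
    return out

values = [1, 2, 3, 4, 5]
print(expand([3, 3, None], values))
print(expand([3, 3, None], values, mode="ADD"))
print(expand([2, 4, 5], values, mode="RANK"))
```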

6.23 Options When the Input Variables Are Character 

[ EXPAND region, VALUES ‘north’ ‘south’ ‘east’ ‘west’, 

GEN region.*, EXACT, NO TRIM ] 

The default is to ignore leading blanks, trailing blanks and case. Thus ‘ Ohio ‘ is equivalent to ‘ohio’.


1. EXACT, Causes the case used in the VALUES quoted constants to be matched exactly. If 

VALUES ‘East’ ‘South’ ‘West’ were used, a case with ‘east’ would by default match the first value. 

However, if EXACT is in use, only ‘East’ would match it. 

2. NO TRIM, Causes lead and trailing blanks used in the VALUES quoted constants to be matched 

exactly. 

The default does the compares using left-justified copies of both the test values (from the VALUES phrase) 

and the data from the EXPAND variables in the current case. In other words, lead blanks are trimmed before comparing. 

If NO TRIM is used, the lead blanks are meaningful. Trailing blanks don’t matter in any event. 

__________________________________________________________________________ 

Figure 6.4 EXPAND Example 

File xxx has one variable and four cases. 

crust 

1 

3 

9 

-- 

LIST xxx [ EXPAND crust, VALUES 1:5, GEN crust.* ]$ 

produces 

crust crust.1 crust.2 crust.3 crust.4 crust.5 

1 1 0 0 0 0 

3 0 0 1 0 0 

9 0 0 0 0 0 

-- - - - - - 

___________________________________________________________________________ 

In Figure 6.4, note the 9 in case 3. A non-missing input value is ignored when it does not match anything in the 

VALUES phrase. Suppose there is only one input variable. If a case has a value of zero on that variable when 

VALUES 1 TO 7 was used, the new variables for that case are all zero. 
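
The single-variable case in Figure 6.4 can be reproduced with a Python sketch. The expand_one function is hypothetical and None stands in for missing; it models only the default behavior with one input variable.

```python
def expand_one(value, values):
    # A missing input gives missing results (the default NEED of one).
    if value is None:
        return [None] * len(values)
    # Otherwise each new variable is 1 if its value matches, else 0.
    return [1 if value == v else 0 for v in values]

crust = [1, 3, 9, None]
table = [expand_one(c, range(1, 6)) for c in crust]
for c, row in zip(crust, table):
    print(c, row)
```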

6.24 SYSTEM VARIABLES 

System variables are defined and set by P-STAT as a run is processed. Typically, the values of system variables 

may not be changed by users, but they may be accessed, tested and assigned to other variables. The names of 

system variables are surrounded by decimal points. This distinguishes system variables from user variables, 

whose names must begin with a letter. 

General system variables are discussed in the following sections. Numeric constants (.e., .PI.) are in the summary 

at the end of the chapter. Other system variables used in across-case modifications are described elsewhere 

in this manual. 

6.25 Referencing Good and Missing Data 

.G. is the system variable for good data and .M. is the system variable for missing data of any type. .M1. indicates 

missing type 1, .M2. indicates missing type 2, and .M3. indicates missing type 3. Combinations such as .M13. for 

types 1 and 3 can also be used. 

The system variable .G. tests for good (non-missing) data:


IF Name EQ .G., RETAIN; 

The system variables for missing data are used both to test for missing and to set values to one of the types of 

missing: 

IF Age EQ .M., SET Age = .M2. ; 

IF Age LT 3 , SET Age = .M3. ; 

When .M. is specified as a consequence, it is treated as if it were .M1. (missing type 1). When .M. is specified 

as a test, it is treated as if it were any of the three types of missing. In this example, a case is deleted if the value 

of variable Age is any of the three types of missing: 

IF Age EQ .M., DELETE ; 

Note that when an IF clause tests for missing or good values, it produces only a true or false result. 

The system variable .M. and the equal-sign operator can be combined into the operator MISSING. Both of 

these instructions produce the same results: 

IF Age MISSING, DELETE ; 

IF Age EQ .M. , DELETE ; 

Similarly, .G. and the equal-sign may be combined into the operator GOOD. These are the same: 

IF Income GOOD, RETAIN; 

IF Income EQ .G., RETAIN; 

Note that the system variables .G. and .M. are values, and thus may be used with the equal-sign operator, but that 

GOOD and MISSING are operators already. 

6.26 Selecting Variables with .NEW. and .OTHERS. 

.NEW. and .OTHERS. reference variables concisely in variable selection clauses. .NEW. is used after KEEP and 

DROP to refer to all new variables created within a command: 

GENERATE Medicare.Amt = .80 * Approved.Amt ; 

GENERATE Patient.Amt = Approved.Amt - Medicare.Amt; 

KEEP Patient.ID TO Approved.Amt .NEW. ; 

Only the specified variables and the two new variables are kept. .NEW. may be used in KEEP and DROP clauses 

with or without other variable names. 

.OTHERS. is used in KEEP clauses as a shortcut to rearranging variables. .OTHERS. refers to all other variables 

in the file not explicitly specified in the KEEP clause: 

KEEP Patient.ID Code.Num .OTHERS. 

Billed.Amt TO Approved.Amt ; 

This clause keeps all of the variables in the file, but reorders them as specified. 

6.27 Referencing the Number of Variables in the File 

.NV. is the system variable for the number of variables in the file at a given time. This value changes as KEEP, 

DROP, GENERATE, SPLIT and COLLECT statements are processed. The following example illustrates another 

solution to the problem of creating a series of dummy variables, discussed earlier in this chapter. 

GENERATE Number.Vars = .NV., 

GENERATE North.East = 0, GENERATE North.West = 0, 

GENERATE South.East = 0, GENERATE South.West = 0; 

SET V (Number.Vars + Region) = 1;


As each case is read, a new variable Number.Vars is generated equal to the number of variables in the file. This 

number includes the variable being created: 

Number North North South South 

XA XB XC Region Vars East West East West 

1 2 2 2 5 0 1 0 0 

2 - 3 4 5 0 0 0 1 

Number.Vars is 5. Thus, V ( Number.Vars + Region ) is V(7) or North.West when variable Region is 2, and V(9) 

or South.West when variable Region is 4. 

6.28 Referencing the Current Case Number 

.N., .HERE. and .USED. are system variables that refer to case numbers. .N. is equal to the current input case 

number (after any case selection). This value is increased every time a case is read, even though that case may be 

deleted and not passed to the current command. .HERE. is the number of cases that have been retained — that 

have actually been passed to the command up to the point when .HERE. is processed. .USED. is the number of 

cases that have been used after the completion of all PPL. These are cases that are passed to the current command 

preceding the PPL. The three values are the same when no cases have been deleted. 

.N. provides an easy way to delete individual cases by using their positions in the file: 

IF .N. AMONG ( 31, 100 TO 105, 399 ), DELETE; 

The next instruction retains the first 98 cases in the file and makes them available to any subsequent PPL clauses 

and to the current command: 

IF .N. LT 99, RETAIN; 

However, the case reader continues to read through the rest of the file, testing each case against the value of .N. 

Thus, case selection is more economical: 

CASES 1 TO 98; 

The diagonal elements of a square matrix may be set to 1 easily with .N. and a DO loop: 

DO #J USING 1 .ON.; 

IF .N. EQ #J, SET V(#J) = 1; 

ENDDO; 

This is often useful when working with matrices. 
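As a rough model of that loop, the Python sketch below treats the row index as .N. and the column index as #J. The function name and the list-of-lists matrix are assumptions made for the illustration.

```python
def set_unit_diagonal(matrix):
    """For each row n (1-based, like .N.), set column n to 1."""
    result = [list(row) for row in matrix]
    for n, row in enumerate(result, start=1):
        for j in range(1, len(row) + 1):   # plays the role of DO #J USING 1 .ON.
            if n == j:                     # IF .N. EQ #J
                row[j - 1] = 1             # SET V(#J) = 1
    return result

corr = set_unit_diagonal([[0.3, 0.2], [0.2, 0.7]])
```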

.HERE. is set to the number of cases that have been processed by the PPL clause in which .HERE. is found. 

If no cases have been deleted prior to that PPL clause, .HERE. is the same as .N. , the current input case number. 

.USED., which may be abbreviated to .U., is set after all cases are processed by all PPL clauses. It is the count of 

all cases not deleted by any logical tests. .USED. is the same as .N. when no cases have been deleted in any of the 

PPL clauses. 

Figure 6.5 illustrates the differences between .N., .HERE. and .USED. Each case in the output file contains 

three new variables. Input.Case.No is the sequence number in the input file, Student.No is the sequence number 

of students, and Output.Case.No is the sequence number in the output file. Input.Case.No and Student.No have 

gaps in the number sequence: the gaps in Input.Case.No mark cases deleted by either test, and the gap in 

Student.No marks a student deleted for missing tests. Output.Case.No has no gaps.


__________________________________________________________________________ 

Figure 6.5 Showing the Differences Between .N., .HERE. and .USED. 

File ABU: 

           Test   Test   Test 
Status      .1     .2     .3 

student     95     99     94 
student     87     81     93 
non-mat     78     86     89 
student     67      -     69 
student     87     88     90 
LIST ABU 

[ GENERATE Input.Case.No = .N. ; 

GENERATE Output.Case.No; 

GENERATE Student.No ] 

[ 

IF Status NE 'Student', DELETE ; 

SET Student.No = .HERE. ; 

IF ANY ( Test? ) MISSING, DELETE ; 

SET Output.Case.No = .USED. ] $ 

                                Input             Output 
           Test   Test   Test   Case    Student   Case 
Status      .1     .2     .3    No      No        No 

student     95     99     94     1       1         1 
student     87     81     93     2       2         2 
student     87     88     90     5       4         3 

__________________________________________________________________________ 
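The three counters in Figure 6.5 can be traced with a short simulation. The Python sketch below is a model, not P-STAT code: the function name is hypothetical, the five input cases are those implied by the figure's output listing, and None stands for a missing test.

```python
cases = [
    ("student", 95, 99, 94),
    ("student", 87, 81, 93),
    ("non-mat", 78, 86, 89),
    ("student", 67, None, 69),
    ("student", 87, 88, 90),
]

def count_cases(rows):
    """Return (Input.Case.No, Student.No, Output.Case.No) for each kept case."""
    here = 0    # cases that got past the Status test (the role of .HERE.)
    used = 0    # cases that survived every test (the role of .USED.)
    kept = []
    for n, (status, *tests) in enumerate(rows, start=1):   # n is the role of .N.
        if status != "student":
            continue                     # deleted: not a student
        here += 1
        student_no = here
        if any(t is None for t in tests):
            continue                     # deleted: a test is missing
        used += 1
        kept.append((n, student_no, used))
    return kept

result = count_cases(cases)
```

The result matches the three output rows of the figure: the input counter advances for every case read, the student counter only for students, and the output counter only for cases that are kept.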

When files are concatenated on-the-fly and the same modifications are applied ( * ), case counting (.N., 

.HERE. and .USED.) continues as if the files were a single file: 

MODIFY File1 

[ GENERATE Input.Case.No = .N.; GENERATE Output.Case.No; 

IF Test4 MISSING, DELETE ; 

SET Output.Case.No = .HERE.] 

+ File2 [ * ], OUT File12 $ 

6.29 Referencing Numeric and Character Variables 

.NUMERIC. is the list of all numeric variables in a file. Similarly, .CHARACTER. is the list of all character variables 

in the file. Both of these system variables are used in KEEP or DROP clauses. 

A selection of numeric variables may be appropriate for recoding and for input to some commands:


SURVEY Streams 

[ KEEP .NUMERIC. ; 

DO #J USING Stream1 TO Stream7; 

IF V(#J) MISSING THEN; 

SET V(#J) = 0; 

ELSE; 

SET V(#J) = NCOT ( V(#J), 25, 50, 75 ); 

ENDIF; 

ENDDO ] ; 

STUBS Stream1 TO Stream7 $ 

A selection of character variables may be useful in mapping (recoding) character values. 

MAP River 

[KEEP .CHARACTER. ], VAR Station1 TO Station7, OUT RiverMap $ 

Either system variable reorders variables: 

MODIFY Class88 

[ KEEP ID Name .NUMERIC. .OTHERS. ], OUT Class88 $ 

6.30 Accessing the PUT Counter 

Each time that a case is read, .PUT. is set to 0. If the PUT or PUTL instructions are invoked (see the prior chapter), 

.PUT. is increased. After all the modifications for a given case are done, .PUT. can be tested to see whether the 

PUT logic produced printed text. In this example, several checks for mutually inconsistent data are made. If inconsistencies 

are found, an explanatory statement is printed: 

MODIFY InFile 

[ IF Age LT 18 AND Veteran GT 0, 

PUT 'Inconsistent values of Age and Veteran for ' 

First.Name Last.Name ; 

IF Age LT 15 AND Married GT 0, 

PUT 'Inconsistent values of Age and Married for ' 

First.Name Last.Name ; 

IF .PUT. GT 0, RETAIN ], 

OUT To.Check $ 

The PUT counter .PUT. is increased, and those records with inconsistencies are retained for further examination 

in a new file, To.Check. 
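The per-case counting can be sketched in Python as follows. This is an illustrative model only: the function name, the dictionary records and the sample values are assumptions, and the length of the message list stands in for .PUT.

```python
def check_case(case):
    """Return the messages printed for one case. The list starts empty for
    every case, just as .PUT. is reset to 0 when a case is read."""
    messages = []
    if case["Age"] < 18 and case["Veteran"] > 0:
        messages.append("Inconsistent values of Age and Veteran")
    if case["Age"] < 15 and case["Married"] > 0:
        messages.append("Inconsistent values of Age and Married")
    return messages

# Hypothetical records for the illustration.
records = [
    {"Age": 12, "Veteran": 1, "Married": 1},   # two messages
    {"Age": 40, "Veteran": 1, "Married": 1},   # consistent
    {"Age": 16, "Veteran": 1, "Married": 0},   # one message
]
to_check = [r for r in records if len(check_case(r)) > 0]
```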

6.31 File, Date, Page and Line References 

.FILE. is the system variable that refers to the current P-STAT system file. It can be used to pass the filename to 

the TITLE command or as an argument in the FIRST and LAST functions. FIRST and LAST are usually used for 

processing groups of related cases. When FIRST and LAST are used with .FILE. as the argument, they test for 

the beginning and end of the file: 

[ IF FIRST (.FILE.), GENERATE #Children = 0 ; 

IF Age LT 16, INCREASE #Children ; 

IF LAST (.FILE.), RETAIN ; 

KEEP School.District #Children ]


The statement “IF FIRST (.FILE.)” is true only when the first case of a file is processed. Similarly the statement 

“IF LAST (.FILE.)” is true only when the last case of a file is processed. FIRST, LAST, and .FILE. have extensive 

uses in across-case data modification and are discussed in detail in the chapter “PPL: Across Case Modifications”. 

.DATE. is the current date. Its value is set when the current command begins. .DATE. is in character form, 

and thus a variable generated or set to .DATE. must be of character data type: 

[ GENERATE Today:C = .DATE. ] 

The exact string produced by .DATE. depends upon the computer on which P-STAT is running. Using .NDATE. 

requests the numeric form of the date. Note: .NDATE. returns a 4 digit year. 

.PAGE. and .TIME. reference the current page number since the command began and the current time when 

the command began. .CPAGE. resets the page number at each command rather than at each run. .RPAGE. sets the 

page number within a run. The page value is numeric. The time value is character in the form “11:34:05” (hours: 

minutes: seconds). .NTIME. requests the numeric values of time (without colons). These four system variables 

are often used in top and bottom titles. Exact, run and command values, as well as numeric and character values, 

may also be requested. 

Manipulating date and time values is covered in a separate chapter. It describes 40 functions and 10 commands 

for formatting date/time values, finding the difference between date/time values, etc.



SUMMARY 

Functions are part of the P-STAT programming language. Function arguments are enclosed in 

parentheses: 

LIST File109 

[ SET Usage = LOG ( Usage ) ; 

SET CPU.Time = PLACES ( CPU.Time, 2 ) ] $ 

PPL Functions: Numeric — Single Expression 

The following functions require a single numeric expression as an argument: 

ABS (exp) 

gives the absolute value of the expression. 

COS (exp) 

gives the cosine of the expression. 

ACOS (exp) 

gives the arc cosine of the expression. 

EXP (exp) 

raises e to the exponent which is the value of the expression. 

FACTORIAL (exp) 

The FACTORIAL function yields the factorial value of the argument. This is often shown as N!. 

FRAC (exp) 

gives the fractional part of the numerical expression. 

INT (exp) 

gives the integer part of the numerical expression. 

LOC (exp) 

gives the location of the variable specified in the expression. The location is the position of the variable 

in the file, counting from the left. 

LOG (exp) 

gives the natural log (base e) of the numerical expression. 

LOG10 (exp) 

gives the common log (base 10) of the numerical expression. 

vnp=var name/position nn=number vn=variable name exp=expression


ROUND (exp) 

rounds the numerical expression to the nearest integer. 

SIN (exp) 

gives the sine of the numerical expression. 

ASIN (exp) 

gives the arc sine of the numerical expression. 

SQRT (exp) 

gives the square root of the numerical expression. 

TAN (exp) 

gives the tangent of the numerical expression. 

ATAN (exp) 

gives the arc tangent of the numerical expression. 

PPL Functions: Numeric — List 

The following functions operate on a list of numeric variables, which may be referenced by name, position, 

ranges, and wildcards. Functions will return missing if any variable is missing unless “.GOOD” is 

the suffix. When that is the case, the result will be based on all non-missing (good) values. At least one 

variable name or position (vnp) is required in the list. 

MAX (vnp list) 

gives the maximum value of the variables in the list: 

[ GEN Larger = MAX ( Length Girth ) ] 

MAX.GOOD (vnp list) 

gives the maximum value of the non-missing variables in the list. 

MEAN (vnp list) 

gives the arithmetic mean of the variables in the list: 

[ GEN Mean.Weight = MEAN ( V(1) .ON. ) ] 

MEAN.GOOD (vnp list) 

gives the arithmetic mean of the non-missing variables in the list. 

MIN (vnp list) 

gives the minimum value of the variables in the list. 

MIN.GOOD (vnp list) 

gives the minimum value of the non-missing variables in the list. 

SDEV (vnp list) 

gives the standard deviation of the variables in the list. 



SDEV.GOOD (vnp list) 

gives the standard deviation of the non-missing variables in the list. 

SUM (vnp list) 

gives the sum of the variables in the list: 

GEN Score = SUM ( Test? ) / 3 ; 

SUM.GOOD (vnp list) 

gives the sum of the non-missing variables in the list. 
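The difference between a list function and its .GOOD form can be modeled briefly. In the Python sketch below the function names are hypothetical, None stands for a missing value, and returning missing when every value is missing is an assumption rather than documented behavior.

```python
from statistics import mean

def ppl_mean(values):
    """Model of MEAN: the result is missing if any value is missing."""
    if any(v is None for v in values):
        return None
    return mean(values)

def ppl_mean_good(values):
    """Model of MEAN.GOOD: use only the non-missing (good) values."""
    good = [v for v in values if v is not None]
    return mean(good) if good else None   # assumption: all missing gives missing

scores = [80, None, 90]
plain = ppl_mean(scores)
good_only = ppl_mean_good(scores)
```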

PPL Functions: Numeric — Special 

The following special functions require one expression that describes the domain of the function (usually 

a variable) and one or more extra arguments, depending on the particular function. 

COMBINATIONS (exp, exp ) 

COMBINATIONS (n,k) returns the number of different ways that K things can be taken from N things; 

i.e., N things K at a time. For example, combinations(5,2) is 10, namely, 1:2, 1:3, 1:4, 1:5, 2:3, 2:4, 2:5, 

3:4, 3:5 and 4:5. 

DIF (exp, nn) 

gives the difference between the current value of the numeric variable designated in the expression and 

the value of that variable nn cases back: 

GEN Difference.2 = DIF ( Gross.Profit, 2 ) ; 

The number nn must be a positive integer constant not exceeding 500. 

LAG (exp, nn) 

gives the value of the numeric variable, designated in the expression, nn cases back: 

GEN Gross.Last.Yr = LAG ( Gross.Profit, 1 ) ; 

The number nn, the number of cases to “lag” back, must be a positive integer constant not exceeding 500. 

MOD (exp, nn) 

gives the remainder after the numeric expression has been divided by the positive constant (nn): 

SET Time.Hours = MOD ( Ship.Time, 12 ) ; 

This is sometimes called modular arithmetic. 

NCOT (exp, n-chotimization instructions) 

recodes the numeric variable specified in the expression according to the instructions given in the second 

argument: 

GEN Age = NCOT ( Age, 10, 20, 30, 40 ) ; or 

GEN Age = NCOT ( Age, 10, 40/10 ) ; 

Both the preceding instructions do an N-way dichotomization or division of the variable values. All values 

of age less than or equal to 10 become 1, those less than or equal to 20 become 2, and so on up to 

values of 40, which become 4. Above 40 becomes a 5. 



NUMEX (exp, 'XX00') 

extracts specific digits from a numeric variable value and yields a numeric representation of those digits. 

NUMEX operates only on the integer portion of the number — any fractional portion and sign are ignored. 

The two required arguments are a numeric expression and a character string mask enclosed in 

quotes: 

GEN Month = NUMEX (Date, 'XX00' ) ; 

The selection mask is composed of X and 0 (zero) characters and may be up to nine characters in length. 

An X retains a digit and a 0 drops a digit. The selection mask is aligned with the right-most digit of the 

numeric value. Lead zeros are not retained in the output number. Thus, the selection mask “XX00X” 

applied to “156” yields the number 6. The character function CHAREX may be used if lead zeros are 

needed in the result. 

PLACES (exp, nn) 

sets the variable specified in the numeric expression to the number of places specified by the second argument, 

which must be a positive integer not greater than 9. 

GEN ##N = 1.2345 $ 

PUT ##N > ( PLACES ( ##N, 1 )) > (PLACES ( ##N,3 )) $ 

produces the following line: 

1.2345 1.2 1.235 
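The recoding rule of NCOT and the masking rule of NUMEX can be modeled in a few lines. The Python sketch below follows only the rules stated above; the function names are hypothetical, the slash shorthand of NCOT is not modeled, and dropping digits that lie to the left of the mask is an assumption.

```python
from bisect import bisect_left

def ncot(value, *cuts):
    """Category 1 for value <= cuts[0], 2 for value <= cuts[1], and so on;
    anything above the last cut point gets len(cuts) + 1."""
    return bisect_left(cuts, value) + 1

def numex(value, mask):
    """Keep the digits under 'X' and drop those under '0'. The mask is
    right-aligned with the integer part; sign and fraction are ignored."""
    digits = str(abs(int(value))).rjust(len(mask), "0")[-len(mask):]
    kept = "".join(d for d, m in zip(digits, mask) if m == "X")
    return int(kept) if kept else 0     # int() discards lead zeros

age_groups = [ncot(a, 10, 20, 30, 40) for a in (10, 11, 40, 41)]
masked = numex(156, "XX00X")
month = numex(1231, "XX00")    # hypothetical date value stored as month and day
```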

PPL Functions: Character and Numeric 

COUNT.GOOD (vnp, vnp) 

gives the number of non-missing values in the list of expressions. Only variable names or positions may 

be in the list. 

FIRST.GOOD (vnp, vnp) 

gives the value of the first non-missing variable in the list of expressions. Only variable names or positions 

may be in the list. 

GEN Date = FIRST.GOOD (Date.1 TO Date.4) ; 

LAST.GOOD (vnp, vnp) 

gives the value of the last non-missing variable in the list of expressions. Only variable names or positions 

may be in the list. 

FIRST (.FILE. or vn) 

is evaluated as true if it is the first case in the subgroup specified in the expression, and false if it is not 

the first case. The required expression is a variable name (vn) or a list of up to 5 variables, or the system 

value .FILE. (meaning the current file): 

IF FIRST (Grade, Sex), INC #Counter ; 

Changing values of the variable or variables define different subgroups. 

LAST (.FILE. or vn) 

is evaluated as true if it is the last case in the subgroup specified in the expression, and false if it is not 

the last case. The required expression is a variable name (vn) or a list of up to 5 variables, or the system 



value .FILE. (meaning the current file). Changing values of the variable or variables define different 

subgroups. 

RECODE (exp, recode instructions ) 

recodes the character or numeric variable specified in the expression according to the instructions given 

in the second argument: 

SET Height = 

RECODE ( Height, 0 TO 65 = 1, 65.1 TO 100 = 2, G = 3) ; 

All values of height from 0 through 65 become 1, and all values from 65.1 through 100 become 2. Any 

other GOOD values become 3. (See the fourth PPL chapter for a full explanation of RECODE.) 

PPL System Variables 

System variables are variables that are defined and set by P-STAT. Their names are enclosed between 

decimal points to distinguish them from user-defined variables. P-STAT automatically sets the values of 

the system variables as a run progresses. Usually the values may not be changed by users, but they may 

be accessed, tested and assigned to other variables. (System variables used especially in titles are further 

described in the TITLES chapter.) 

.CHARACTER. 

is the list of character variables in a file. .CHARACTER. is used in KEEP and DROP selections: 

DROP .CHARACTER.; 

.DATE. 

is the current date. Its value is set when the current command begins, and it is in character form: 

GENERATE Today:C = .DATE. ; 

It is equivalent to .CDATE. (the command date). 

.e. 

is the system value for e, the base of natural logs. It equals 2.718281828. 

.FILE. 

is the current P-STAT system file. Its value is the name of that file. It is used as the argument for the 

functions FIRST and LAST, and also in titles. 

.G. 

is a good or non-missing variable value. It tests whether good data is present in an expression: 

IF Test.Score EQ .G., RETAIN; 

.HERE. 

is the count of the number of cases actually processed thus far by the current PPL clause. 

.M. 

is a missing or non-good variable value. It is used to test whether missing data is present in an expression. 

.M. refers collectively to all three types of missing; it is the opposite of .G. (above). 



.M1., .M2., .M3. 

are missing variable values of three types: MISSING1, MISSING2 and MISSING3. .M1., .M2. and .M3. 

are used for logical testing within an IF phrase and for recoding. 

.N. 

is the case counter. Its value is the current case number after case (row) selection. 

.NEW. 

are all variables newly generated in all PPL clauses in the current phrase. Its value is all of the names of 

these new variables. .NEW. is used in KEEP and DROP selections: 

GEN Average = MEAN.GOOD ( Value? ) ; 

GEN Total = SUM.GOOD ( Value? ) ; 

KEEP ID .NEW. .OTHERS. ; 

.NUMERIC. 

is the list of numeric variables in a file. .NUMERIC. is used in KEEP and DROP selections: 

KEEP .NUMERIC. ; 

.NV. 

is the current number of variables in the file. 

.ON. 

is used in case and variable selection and in DO loops to indicate from here onward through the last case 

or variable: 

DO #J USING 1 .ON. ; 

IF V(#J) GOOD, SET V(#J) = V(#J) / 10 ; 

ENDDO; 

.OTHERS. 

are all variables other than those explicitly referenced in a KEEP or DROP selection. It is used in reordering 

variables: 

KEEP SS.Number Department .OTHERS. Final.Grade; 

.PAGE. 

is the current page number since the command began. It is equivalent to .CPAGE. .RPAGE. is the current 

page number since the run or P-STAT session began. 

.PI. 

is the system value for pi. It equals 3.141592654. 

.PUT. 

is the PUT counter. Its value is the number of times PUT was invoked in the current case. 

.TIME. 

is the current time. Its value is set when the current command begins, and it is in character form: 

GENERATE Time:C = .TIME.; 

It is equivalent to .CTIME. (the command time). 



.USED. 

is the number of cases used after all PPL clauses are processed. The count does not include cases that 

are deleted because of logical tests. 

Other Date and Time System Variables. 

The system variables .DATE. and .TIME. may be prefaced with N, X, R or C: 

.NDATE. .NTIME. 

.XDATE. .NXDATE. .XTIME. .NXTIME. 

.RDATE. .NRDATE. .RTIME. .NRTIME. 

.CDATE. .NCDATE. .CTIME. .NCTIME. 

The N specifies the numeric form of the date or time, rather than the character form. The X specifies the 

exact date or time when the system variable is processed. The R specifies the run date or time — when 

the current run began. The C specifies the command date or time — when the current command began. 

The numeric form of exact, run, and command dates or times may also be specified. The dates and times 

are printed as they are represented in the computer system on which P-STAT is being used. 

Note: The numeric forms of the date now all have the year returned as 4 digits in preparation for the year 

2000. 


7 

Random Number and 

Distribution Functions 

This chapter covers three different groups of functions: random number functions, distribution functions, and functions 

which can be used to handle the “fuzzy equals” problem. 

7.1 RANDOM NUMBER FUNCTIONS 

The PPL functions, RANNORM, RANUNI, RANBIN and RANTABLE, generate random (“pseudo” random) 

numbers from, respectively, the normal distribution, the uniform distribution, the binomial distribution and a user's 

tabled distribution. The random numbers may be used for many purposes, such as generating random data, 

selecting a random subset of cases from a file or assigning cases to either a control or experimental treatment. Examples 

illustrating these tasks follow the basic explanations. 

In a normal distribution, the random numbers are normal deviates (“standard scores”) that range from -6 

through +6 and the probability of obtaining specific values depends on the area under the normal curve. In a uniform 

or rectangular distribution, the random numbers range from zero through one and the probability of obtaining 

any value equals the probability of obtaining any other value. (The random numbers do not include the exact values 

zero and one.) 

In a binomial distribution, the random numbers are observations from a binomial distribution with the specified 

order — that is, they are integers that range from 0 to the order of the binomial distribution. The probability 

depends on the likelihood of the possible observations and the probability of a single event (a “win”), which is 

assumed to be .5 unless another probability is supplied. In a user's tabled distribution, the random numbers are 

observations (integers) that range from one to the order of the distribution specified by the user. The probability 

of the various observations is also specified by the user. 

The arguments for any of the random number functions are: 1) an initial seed control argument, 2) three optional 

scratch variables, and 3) any function specific arguments. The initial argument controls how the seed 

values that prime the random number generator are obtained. NOTE: the arguments are initialized at the beginning 

of the command. Except for the PPL command this is when the first case is processed. A BRANCH in 

a macro back to a location outside of the command which is generating the numbers causes the arguments to be 

re-initialized. Possible first argument values are: 

0 different seed values obtained from the current date and time are used 

-1 same default seed values are used every time the function is used 

-3 three seed values are supplied by the user as the next three arguments 

When 0 is specified, three seed values obtained from the current date and time are used to start the number 

generator. The seed values and the random numbers they generate differ each time: 

RANNORM ( 0 ) 

When -1 is specified, three default seed values are used — they are the same each time one of the random number 

functions is used: 

RANUNI ( -1 ) 

The argument -1 is used only when the same “random” values are desired. This may be the case when a specific 

procedure involving random numbers must be repeated exactly. When -3 is specified as the first argument, three


seed values should be supplied as the next three arguments. The values should be three constants that are integers 

between 1 and 30,000: 

RANNORM ( -3, 912, 4508, 7 ) 

Three scratch variables may be given as the next arguments for any of the random number functions: 

RANUNI ( 0, #S1, #S2, #S3 ) 

When three scratch variables are supplied, the final seed values are saved as the values of the scratch variables. 

Thus, a subsequent run can use these values as starting seeds and continue a progression. The scratch variables 

should be generated prior to using them. (See the second RANTABLE example in the final paragraph of this section.) 

Finally, any function specific arguments follow — only RANBIN and RANTABLE require these. Here, 

the “2” is the order of the binomial distribution: 

RANBIN ( -1, 2 ) 
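The roles of the seed control argument and the saved seeds can be illustrated with a small model. The Python sketch below is not P-STAT's generator: it assumes a Wichmann-Hill style generator simply because that method also takes three integer seeds between 1 and 30,000, and the default seeds, the clock formula and all function names are placeholders.

```python
import time

DEFAULT_SEEDS = (1, 2, 3)   # placeholder defaults, not P-STAT's actual values

def initial_seeds(control, *user_seeds):
    """Choose three starting seeds from the seed control argument."""
    if control == 0:                      # different seeds from date and time
        t = time.time_ns()
        return tuple(1 + (t // k) % 30000 for k in (1, 30011, 900007))
    if control == -1:                     # the same seeds on every use
        return DEFAULT_SEEDS
    if control == -3:                     # three seeds supplied by the user
        if len(user_seeds) != 3 or not all(1 <= s <= 30000 for s in user_seeds):
            raise ValueError("three integer seeds between 1 and 30,000 are required")
        return tuple(user_seeds)
    raise ValueError("seed control must be 0, -1 or -3")

def wichmann_hill(seeds):
    """Advance the three seeds once; return (uniform number, new seeds)."""
    s1, s2, s3 = seeds
    s1 = (171 * s1) % 30269
    s2 = (172 * s2) % 30307
    s3 = (170 * s3) % 30323
    u = (s1 / 30269 + s2 / 30307 + s3 / 30323) % 1.0
    return u, (s1, s2, s3)

def draw(seeds, count):
    """Draw count numbers and return them with the final seeds, the values
    a run would save in its three scratch variables."""
    numbers = []
    for _ in range(count):
        u, seeds = wichmann_hill(seeds)
        numbers.append(u)
    return numbers, seeds

start = initial_seeds(-3, 912, 4508, 7)
full_run, _ = draw(start, 10)
first_part, saved = draw(start, 4)      # first run saves its final seeds
second_part, _ = draw(saved, 6)         # later run continues the progression
```

Restarting from the saved seeds continues the sequence exactly where the first run stopped, which is the purpose of the three scratch variables.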

7.2 Normal and Uniform Distributions 

The RANNORM function may be used to generate a file of random numbers with a specific mean and standard 

deviation. First, a file with one case is built: 

MAKE Random, VAR Random.Number ; 

- $ 

Then, that file is modified to produce the desired number of cases and to set the values to random numbers. The 

REPEAT instruction repeats the one case 100 times: 

MOD Random [ 

REPEAT 100 ; 


OUT RandomX $ 

The RANNORM function generates a standardized random number, a “Z-score” with mean 0 and standard deviation 

1. That number is multiplied by the desired standard deviation and then added to the desired mean. 
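The rescaling step can be sketched in Python. The mean of 50 and standard deviation of 10 below are arbitrary values chosen for the illustration, the function name is hypothetical, and Python's own generator stands in for RANNORM.

```python
import random

def scaled_normals(count, mean, sd, seed=None):
    """Draw standard normal deviates and rescale each one: z * sd + mean."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) * sd + mean for _ in range(count)]

# Arbitrary target mean and standard deviation for the illustration.
sample = scaled_normals(100, mean=50.0, sd=10.0, seed=1)
sample_mean = sum(sample) / len(sample)
```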

The RANUNI function is often used to select a random sample of cases from a file. This command selects a 

random subset of one third of the original cases in file Subjects: 

MOD Subjects [ 

GEN #Temp = RANUNI (0) ; 

IF #Temp LT .333334, RETAIN ], OUT Sub.3 $ 

These instructions do sampling with replacement — they select a random sample of five cases from a file of 100 

cases (the same case could be selected more than once): 

MOD FileA [ 

GEN #N = MOD (.N., 100) + 1 ; 

IF #N EQ 2, GEN #R = ( RANUNI (0) * 100 ) + 1 ; 

IF #N EQ INT (#R), RETAIN ] 

+ FileA (*) + FileA (*) + FileA (*) + FileA (*), 

OUT FileB $ 

The scratch variable #N (a pseudo case number) is generated equal to the MOD of the case number plus one to 

get numbers running from 2 to 100 followed by 1. (The actual case numbers run from 1 to 500 when the files are 

concatenated using the “+” operator. After the MOD function, they run from 1 to 99 followed by 0.) 

When the first case of the file is processed (that is, when #N = 2), a random number between zero and one is 

generated. It is multiplied by 100 and one is added to it. (Random numbers exactly equal to zero or one are not 

generated. By multiplying by 100 and adding one, the range of the random numbers shifts from 0-to-1 to 

1-through-100.) If #N equals the integer value of the random scratch variable, the case is selected. The file is read 

four more times and the same instructions are executed each time. 
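The pseudo case number logic can be traced with a short simulation. The Python sketch below is a model under assumptions: the function name is hypothetical, Python's generator stands in for RANUNI, and a retained case is reported by its position in the concatenated input.

```python
import random

def sample_with_replacement(file_size, passes, seed=None):
    """Select one case per pass through the file, as in the command above."""
    rng = random.Random(seed)
    selected = []
    target = None
    for n in range(1, file_size * passes + 1):    # .N. over the concatenated files
        pseudo = n % file_size + 1                # runs 2, 3, ..., file_size, 1
        if pseudo == 2:                           # first case of each pass
            target = int(rng.random() * file_size + 1)   # integer 1 to file_size
        if pseudo == target:
            selected.append(n)
    return selected

picks = sample_with_replacement(100, 5, seed=3)
original_positions = [(n - 1) % 100 + 1 for n in picks]
```

Exactly one case is retained on each pass, and because each pass draws its own target, the same original case can be picked more than once.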

7.3 Binomial and User's Tabled Distributions 

The RANBIN function could be used to assign cases to either a control or an experimental treatment group. This 

command does this: 

MOD Expermt5 [ 

GEN #Bin = RANBIN ( 0, 2 ) ; 

IF #Bin EQ 1, SET Group = 'C', F.SET Group = 'E' ], 

OUT Expermt5 $ 

#Bin is generated equal to a random observation from an order 2 binomial distribution — that is, from a binomial 

distribution that contains the integers 0, 1 and 2 in these proportions .25, .5, and .25. (You could think of this as 

the distribution obtained when tossing two coins. Zero heads are observed 25% of the time, one head 50% of the 

time and two heads 25% of the time, when the probability of obtaining a head in a single toss is .5.) When RANBIN 

returns a 1, which it does half the time, a case is assigned to the control group; when it returns a 0 or 2, it is 

assigned to the experimental group. 
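The proportions quoted for the order 2 binomial distribution follow from the binomial formula. The Python sketch below computes them directly; the function name is hypothetical and the code models the distribution, not RANBIN itself.

```python
from math import comb

def binomial_pmf(order, p=0.5):
    """Probability of each observation 0..order for a binomial distribution."""
    return [comb(order, k) * p**k * (1 - p)**(order - k)
            for k in range(order + 1)]

pmf = binomial_pmf(2)
p_control = pmf[1]                  # RANBIN returns 1
p_experimental = pmf[0] + pmf[2]    # RANBIN returns 0 or 2
```

Both groups therefore receive a case with probability one half.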

The RANTABLE function is similar to RANBIN, except that the probabilities are set by the user. This 

command: 

MOD Expermt6 

[ SET Group = RANTABLE ( 0, 1, 2, 2 ) ], OUT Expermt6 $ 

assigns cases to one of three groups, with the probability of assignment to group one being 1/5, group two 2/5 and 

group three 2/5. The arguments for RANTABLE after the initial seed control argument give the number of values 

in the distribution and the proportions in which they are observed. In this example, there are three function arguments 

(1, 2, 2), so there are three values in the distribution (1, 2 and 3). The value of a single argument divided by the 

sum of the arguments gives the proportion of that value in the distribution. For example, 1 / (1 + 2 + 2) is 

1/5, which is the proportion of the total observations that are ones. 
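The conversion from RANTABLE arguments to proportions is each argument divided by the sum of the arguments. The Python sketch below shows that arithmetic with exact fractions; the function name is hypothetical.

```python
from fractions import Fraction

def table_proportions(*weights):
    """Map each distribution value (1, 2, 3, ...) to weight / sum of weights."""
    total = sum(weights)
    return {value: Fraction(w, total)
            for value, w in enumerate(weights, start=1)}

props = table_proportions(1, 2, 2)
```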

This command does the same task the prior command does, but it sets the seed values with the three constants 

following the -3 and saves them in the supplied scratch variables: 

MOD Expermt6 [ 

GEN #A = .M., GEN #B = .M., GEN #C = .M. ; 

SET Group = 

RANTABLE ( -3, 657, 1469, 20078, #A, #B, #C, 1, 2, 2 ) ; 

IF LAST ( .FILE. ), 

PUT #A > #B > #C ], 

OUT Expermt6 $ 

The initial argument of -3 for RANTABLE specifies that the initial seed values are supplied as three constants. 

The three scratch variables follow. The constants and scratch variables come directly after the initial seed control 

argument and before the function specific arguments. Alternatively, the three scratch variables could be generated 

equal to the three initial seed values and those constants could be omitted from the RANTABLE arguments. 

7.4 DISTRIBUTION FUNCTIONS 

Distribution or probability functions return the area under a distribution from the lower tail of the distribution to 

the specified critical value. The area is the probability that a random value falls below this critical value. Subtracting 

this value from one yields the significance level for a one-tailed test — that is, the percentage of the 

distribution in the upper tail. To obtain the significance level for a two-tailed test, subtract the probability from 

one and multiply by two:


( 1 - PROBNORM ( ABS (nn) ) ) * 2 
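For the normal case, the one-tailed and two-tailed calculations can be sketched in Python. The function names are hypothetical, and the lower-tail area is computed from the error function rather than by P-STAT's own routine.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Area under the standard normal curve from the lower tail up to z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def one_tailed(z):
    """Upper-tail significance: one minus the lower-tail probability."""
    return 1.0 - normal_cdf(z)

def two_tailed(z):
    """Two-tailed significance: the upper-tail area of |z|, doubled."""
    return 2.0 * (1.0 - normal_cdf(abs(z)))

upper = one_tailed(1.96)
both = two_tailed(-1.96)
```

Taking the absolute value first means the same expression works whether the observed value falls in the lower or the upper tail.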

Inverse probability functions return the critical value corresponding to the probability or area under the distribution 

that is supplied as the function argument. The critical value is the value that must be obtained for 

significance at one minus the supplied probability. 

7.5 Probability Distributions 

The probability functions may have expressions as their arguments. The expressions should reduce to one or more 

arguments appropriate for the function. These are the probability functions and their arguments: 

1. PROBBIN ( nn, n, p ) Binomial Distribution 

computes the probability that a variable from a binomial (Bernoulli) distribution with probability p 

and size or degree n is less than or equal to the first argument nn: 

PROBBIN ( 4, 10, .5 ) = .376953125 

This is the probability of getting four or fewer tails in ten tosses of a coin. (This is the same as the 

probability of getting six or more heads.) The probability of a single value is the difference between 

two successive values: 

PROBBIN ( 4, 10, .5 ) - PROBBIN ( 3, 10, .5 ) = .205078125 

This is the probability of getting exactly four tails in ten tosses of a coin. (This is the same as the 

probability of getting exactly six heads.) 

2. PROBCHI ( nn, df ) Chi-square Distribution 

computes the probability that a random variable from a chi-square distribution with degrees of freedom 

df is less than the specified argument: 

PROBCHI ( 31.264, 11 ) = .999 

Degrees of freedom must be an integer. 

3. PROBF ( nn, df1, df2 ) F Distribution 

computes the probability that a variable from an F distribution with numerator degrees of freedom 

df1 and denominator degrees of freedom df2 is less than the specified argument: 

PROBF ( 3.32, 2, 30 ) = .950170464 

Degrees of freedom may be a whole or fractional number. 

4. PROBNORM ( nn ) Normal Distribution 

computes the probability that a random variable from a normal distribution is less than the specified 

argument: 

PROBNORM ( -1.96 ) = .02499789530314 

The critical value -1.96 is significant at the .025 level for a one-tail test and at the .05 level for a two-tail 

or non-directional test. 

The argument for PROBNORM should be a deviate from a normal distribution with a mean of zero 

and standard deviation of one — that is, a standard score between -6 and +6. 

5. PROBPOIS ( nn, lambda ) Poisson Distribution 

computes the probability that a variable from a Poisson distribution is less than or equal to the first 

argument. Lambda is the mean of the distribution. The mean in this example is 1.12 — it is the


number of defects per length of material: 

PROBPOIS ( 2, 1.12 ) = .896355852 

PROBPOIS ( 2, 1.12 ) - PROBPOIS (1, 1.12 ) = .204642687 

.8964 is the probability of finding two or fewer defects in a length of material. .2046 is the probability 

of finding exactly 2 defects. 

6. PROBT ( nn, df ) t Distribution 

computes the probability that a random variable from a t distribution is less than the first argument 

nn when degrees of freedom equal the second argument df. This is the probability that a random 

variable is less than 2.179: 

PROBT ( 2.179, 12 ) = .975008377 

The significance level for a two-tail test is one minus the probability, multiplied by two: 

( 1 - PROBT ( 2.179, 12 ) ) * 2 = .049983245959 

A critical value of 2.179 is significant at the .025 level for a one-tail test and at the .05 level for a 

two-tail test (.025 in each tail) when the degrees of freedom are 12. 

The first argument for PROBT should be a deviate or critical value from student's t distribution with 

a mean of zero and standard deviation of one. The degrees of freedom may be a whole or fractional 

number. 
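The binomial and Poisson values quoted above can be reproduced from the textbook formulas. The Python sketch below is an independent check, not P-STAT's implementation; the function names mirror the PPL names only for readability.

```python
from math import comb, exp, factorial

def probbin(x, n, p):
    """Probability that a binomial(n, p) variable is less than or equal to x."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def probpois(x, lam):
    """Probability that a Poisson variable with mean lam is at most x."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(x + 1))

four_or_fewer = probbin(4, 10, 0.5)
exactly_four = probbin(4, 10, 0.5) - probbin(3, 10, 0.5)
two_or_fewer = probpois(2, 1.12)
exactly_two = probpois(2, 1.12) - probpois(1, 1.12)
```

Subtracting two successive cumulative values gives the probability of a single observation, as in the PROBBIN and PROBPOIS examples.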

7.6 Inverse Probability Distributions 

The inverse probability functions may have expressions as their arguments. However, the expressions should reduce 

to one or more arguments appropriate for the function. These are the inverse probability functions and their 

arguments: 

1. INVBIN ( nn, n, p ) Inverse Binomial Distribution 

INVBIN.RT ( nn, n, p ) Inverse Binomial Distribution — Right Tail 

returns the observation from the binomial distribution with probability p and size or degree n whose 

area is nn: 

INVBIN ( .38, 10, .5 ) = 4 

INVBIN.RT ( .38, 10, .5 ) = 6 

Approximately 38% of the time, when tossing 10 coins, you will get 4 or fewer tails and 6 or more 

heads. INVBIN.RT returns an observation from the right tail of the binomial distribution. INVBIN 

is the inverse of the PROBBIN function. 

2. INVCHI ( nn, df ) Inverse Chi-Square Distribution 

returns the critical value from the chi-square distribution with degrees of freedom df and whose area 

is the argument nn: 

INVCHI ( .999, 11 ) = 31.2641339 

INVCHI is the inverse of the PROBCHI function. 

3. INVF ( nn, df1, df2 ) Inverse F Distribution 

returns the critical value from the F distribution with degrees of freedom df1 and df2 whose area is 

the argument nn:


INVF ( .95, 2, 30 ) = 3.315829544 

INVF is the inverse of the PROBF function. 

4. INVNORM ( nn ) Inverse Normal or Probit Distribution 

returns the deviate or critical value from the normal distribution whose area is the specified argument. 

PROBIT is a synonym: 

PROBIT ( 0.025 ) = -1.959964 

A critical value of -1.96 or less is required for a one-tail test with a significance level of .025, or a 

value of 1.96 or greater is required if a difference in the opposite direction is expected. For a two-tail 

or non-directional test with a significance level of .05, a critical value of -1.96 or less or 1.96 or 

more is required (.025 in each of the two tails). 

The argument for INVNORM is an area, measured from the lower tail of the normal distribution, 

that is the probability of obtaining a value less than the calculated deviate. It should be a number 

between 0 and 1. This function is the inverse of PROBNORM. 

5. INVPOIS ( nn, lambda ) Inverse Poisson Distribution 

INVPOIS.RT ( nn, lambda ) Inverse Poisson Distribution — Right Tail 

returns the observation from the Poisson distribution with mean lambda whose area is the argument 

nn: 

INVPOIS ( .9, 1.12 ) = 2 

Approximately 90% of the time, 2 or fewer defects will be found in a unit length of material with 

1.12 defects per unit. INVPOIS.RT returns an observation from the right tail of the Poisson distribution. 

INVPOIS is the inverse of the PROBPOIS function. 

6. INVT ( nn, df ) Inverse t Distribution 

returns the critical value from the t distribution with degrees of freedom df whose area is the argument 

nn: 

INVT ( .975, 12 ) = 2.178812725 

A critical value of 2.179 is required for a one-tail test with a significance level of .025 or a two-tail 

test with a significance level of .05. INVT is the inverse of the PROBT function. 
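
The inverse relationship between INVNORM and PROBNORM can be illustrated outside P-STAT with the Python standard library. This is a sketch for comparison only; NormalDist is Python's class, not part of PPL.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Deviate whose lower-tail area is .025 (compare the PROBIT example above).
z = std_normal.inv_cdf(0.025)

# Feeding the deviate back into the cumulative function recovers the area.
area = std_normal.cdf(z)

print(round(z, 6), round(area, 6))
```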

7.7 THE FUZZY EQUALS PROBLEM 

The internal representation of fractional decimal numbers in a binary computer can be exact for numbers (like .5 

or .75) that can be expressed as sums of reciprocals of powers of two. This is true up to a point: .5 + 1/2**53 is 

accurate on a Pentium chip (which uses 53 bits to represent the fractional part), but .5 + 1/2**54 and beyond would 

not be accurately represented. 

Most fractional numbers however cannot be represented accurately. Computation involving them is consequently 

approximate. It is quite possible for two different sequences of calculation that ‘should’ produce the same result 

to instead produce results that differ slightly, perhaps by one bit, sometimes by several. 

For example, consider this P-STAT statement. 

IF .1 + .2 EQ .3, PUT ‘YES’, F.PUT ‘NO’ $ 

This ought to say YES, but on a Pentium PC it says NO because they are not quite the same: a HEX display of the 

result of adding .1 and .2 is one bit different from a HEX display of .3, and a one-bit difference prevents an equal 

result. This is not a P-STAT effect: exactly the same thing occurs in a trivial C or Fortran 95 program.
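
The same effect can be seen in Python, which also uses 64-bit double-precision numbers. The sketch below is an illustration outside P-STAT; the helper hex64 is an assumed name that imitates a HEX display.

```python
import struct

def hex64(x):
    """Big-endian HEX display of a 64-bit float (illustrative helper)."""
    return struct.pack(">d", x).hex().upper()

total = 0.1 + 0.2

print(total == 0.3)   # exact compare of the sum with .3
print(hex64(total))   # HEX display of the sum
print(hex64(0.3))     # HEX display of the constant .3
```

The two HEX displays differ only in the final character, which is the one-bit difference described above.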


There may be situations when a FUZZY compare rather than an EXACT compare is appropriate. An exact compare 

returns equal only when the two numbers being compared are exactly the same. A fuzzy compare would 

accept as equal two numbers that are VERY close. The question is: how close? 

Logical operators like EQ and GT now have optional extensions like EQ.2 or GT.5 which cause the compare to 

be fuzzy. For example, using EQ.2 rather than just EQ will treat two numbers as equal if they are no more than 

two steps apart. 

We use ‘step’ to mean moving from a given 64-bit double-precision floating-point number to the next representable 

number. An upwards step from 0.1 is slightly more than 0.1, a downwards step is slightly less. 

A step can best be seen by using HEX notation. The HEX representation of the 64-bit value 0.1 is 3FB9 9999 

9999 999A. Each HEX character represents 4 bits; the characters 0-9 and A-F are used to show the 16 possible 

forms of 4 bits. Note: the actual 64-bit internal representation of 0.1 may differ slightly on computers using differing 

chips and compilers. 

The ending ‘A’ shows that the last 4 bits of 0.1 are 1010. The value one STEP.UP from 0.1 would be one bit 

greater; in this case it would have the same initial 15 HEX characters, and the final character would be B (1011), one bit more. A 

step affects the 15th or 16th significant digit on a Pentium type of chip. For example, it takes 2 steps to go 

from 30.11122233344411 

to 30.11122233344412 which differs in the 16th decimal digit. 

7.8 The Fuzzy Functions 

Four new functions have been added to manipulate such numbers. 

1. HEX ( number ) produces the HEX representation of the input in a character*16 

result. 

2. STEP.UP ( number, n ) produces the number that is N steps up from the input value. The 

second argument, the number of steps, can be from zero to 9999. If 

omitted, it defaults to one. 

3. STEP.DOWN( number, n ) produces the number that is N steps down from the input value. The 

second argument, the number of steps, can be from zero to 9999. If 

omitted, it defaults to one. 

4. STEPS ( nn1, nn2 ) produces the number of steps from the smaller of NN1 and NN2 to 

the larger. Missing 3 is returned if more than one million steps separate 

the arguments. 

put ( HEX( .1 ))$ is ‘3FB999999999999A’ 

put ( HEX( STEP.UP(.1 )))$ is ‘3FB999999999999B’ 

put ( HEX( STEP.UP(.1, 2)))$ is ‘3FB999999999999C’ 

put ( STEPS ( STEP.DOWN(.1), STEP.UP(.1) ))$ is 2 
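
For readers who want to experiment outside P-STAT, the four functions can be approximated in Python. This is a sketch under stated assumptions: the names hex64, step_up, step_down and steps are hypothetical stand-ins, math.nextafter requires Python 3.9 or later, and the limits described above (9999 steps, Missing 3 beyond one million steps) are not modelled.

```python
import math
import struct

def hex64(x):
    """HEX display of a 64-bit float, similar in spirit to HEX."""
    return struct.pack(">d", x).hex().upper()

def step_up(x, n=1):
    """The number n representable steps above x."""
    for _ in range(n):
        x = math.nextafter(x, math.inf)
    return x

def step_down(x, n=1):
    """The number n representable steps below x."""
    for _ in range(n):
        x = math.nextafter(x, -math.inf)
    return x

def _ordinal(x):
    """Map a float to an integer so that adjacent floats differ by 1."""
    bits = struct.unpack(">q", struct.pack(">d", x))[0]
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)

def steps(a, b):
    """Number of representable steps separating a and b."""
    return abs(_ordinal(a) - _ordinal(b))

print(hex64(0.1))
print(hex64(step_up(0.1)))
print(hex64(step_up(0.1, 2)))
print(steps(step_down(0.1), step_up(0.1)))
```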

7.9 Fuzzy Logical Operators 

There are 6 logical operators: GT, GE, EQ, NE, LE and LT. GT means greater than, EQ means equals, and 

so forth. 

There are also 6 eXact versions: XGT, XGE, XEQ, XNE, XLE and XLT. XEQ causes the compare of character 

values to be case-specific, whereas EQ is case-independent. For numeric compares, EQ and XEQ will by default 

do exact (non-fuzzy) compares. However, the EQ and GT type of operators can be directed to do fuzzy compares. 

For numeric compares, the EQ operators can be made to do fuzzy compares in two ways. 

1. EQ.2 or GT.5 or such can be used to cause a fuzzy compare of that many steps. The step part can 

be from 0 to 99, with 0 meaning no steps. EQ.2 is treated as a simple EQ when the compare involves 

character values.


2. FUZZ 5 $ is a new command that causes later use of the EQ type of logical operators to use that 

many steps. It is ignored for character compares, and does not affect the XEQ type of operators. It 

is also ignored for an operator like EQ.3 that already has a specific stepsize. 

In other words, the step count of 3 in EQ.3 has precedence over any current FUZZ command setting. Fuzz 0 $ 

would turn it off. 

7.10 How Fuzzy Operators Work 

Consider 

IF aaa EQ.2 bbb. 

The above test will be true whenever AAA is either equal to BBB or within 2 steps of BBB (it does not matter 

which is the larger). 

The following 5 lines would do exactly the same thing: 

IF STEP.DOWN(aaa, 2) XEQ bbb or 

STEP.DOWN(aaa ) XEQ bbb or 

aaa XEQ bbb or 

STEP.UP (aaa ) XEQ bbb or 

STEP.UP (aaa, 2) XEQ bbb 

Next, consider 

IF aaa GT.5 bbb. 

It is first determined if AAA and BBB are ‘equal’, which in this case means no more than 5 steps apart in either 

direction. Since the GT test is true only when (1) AAA is greater and (2) they are not equal, AAA must be more 

than 5 steps greater than BBB for a true result to occur. 

In the first of these next two statements, the values being compared are not equal, so a GT result can be true. In 

the second, the GT.1 test has enough fuzz to cause the two values to be considered to be equal, so one cannot be 

greater. 

IF STEP.UP( 999 ) GT.0 999 will be true, 

IF STEP.UP( 999 ) GT.1 999 will be false. 

Thus, aaa GT.5 bbb asks if AAA is more than 5 steps greater than BBB. The other operators work in a similar 

manner. 
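
The rule just described can be sketched in Python as a cross-check. The functions fuzzy_eq and fuzzy_gt are hypothetical names that follow the description in this section; they are not P-STAT code, and math.nextafter requires Python 3.9 or later.

```python
import math
import struct

def steps(a, b):
    """Number of representable 64-bit steps separating a and b."""
    def ordinal(x):
        bits = struct.unpack(">q", struct.pack(">d", x))[0]
        return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)
    return abs(ordinal(a) - ordinal(b))

def fuzzy_eq(a, b, n):
    """True when a and b are no more than n steps apart (like EQ.n)."""
    return steps(a, b) <= n

def fuzzy_gt(a, b, n):
    """True when a is greater than b and they are not fuzzy-equal (like GT.n)."""
    return a > b and not fuzzy_eq(a, b, n)

one_up = math.nextafter(999.0, math.inf)  # one step above 999

print(fuzzy_gt(one_up, 999.0, 0))   # corresponds to GT.0
print(fuzzy_gt(one_up, 999.0, 1))   # corresponds to GT.1
print(fuzzy_eq(0.1 + 0.2, 0.3, 2))  # corresponds to EQ.2
```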

7.11 FUZZY Summary 

The GT, GE, EQ, NE, LE and LT logical operators have always done exact compares on numeric values; the default 

has not changed. 

These 6 operators have been extended: EQ.3 for example will return an equal result if the two values being compared 

are separated by no more than 3 steps. A step is the distance from one internally representable number to 

the next one. 

The step part (the .3 in EQ.3) can be from 0 to 99. Using 5 steps should be sufficient to cover random differences. 

NEAR is supported as a more readable form of EQ.5 . Similarly, NOTNEAR means NE.5 . 

A new command, FUZZ 3 $ or such, causes subsequent use of EQ, etc. to do fuzzy compares of that many steps 

automatically. However, this does NOT change an explicitly supplied step like GT.0 or EQ.1 . 

Using FUZZ 2 $ or such might be useful when pages of PPL are involved and you want to quickly see if fuzz 

makes a difference. 

XGT, XGE, XEQ, XNE, XLE and XLT can still be used in numeric compares. They always do an exact (non-fuzzy) 

compare. In other words, XEQ and EQ.0 are the same.


SUMMARY 

PPL Functions: Numeric — Random Numbers 

The number of arguments for the random number functions depends on how they are used. There may 

be from one to three types of arguments: 1) a required initial seed control argument, 2) three optional 

scratch variables, and 3) any function specific arguments. The initial seed control argument is one of 

these constants: 0, -1 or -3. When it is 0, three seed values from the current date and time are used to 

start the random number generator. When it is -1, three default seed values that are the same every time 

are used. When it is -3, three constants to be used as the initial seed values should follow. 

Three scratch variables may be supplied next. When they are supplied, the final seed values are saved 

as the values of the scratch variables. They may be used as initial seeds at a future time to continue a 

progression. Any function specific arguments come last. 

RANBIN (nn, nn, nn, nn, #vn, #vn, #vn, nn, p) 

generates random observations from a binomial distribution with the order and probability specified as 

the right-most arguments. When the probability is .5, it need not be given: 

[ GEN Obs = RANBIN (0, 2) ; 

IF Obs EQ 1, SET Group = 1, F.SET Group = 2 ] 

The GEN instruction generates observations from a binomial distribution of order 2 and probability .5 — 

that is, with the integers 0, 1 and 2 in the proportions .25, .5 and .25. For example, this is the distribution 

of heads (or tails) obtained when tossing two coins. The IF statement tests the value of the random number 

and assigns group membership, with 50% in each group. 

RANNORM (nn, nn, nn, nn, #vn, #vn, #vn) 

generates random numbers from the normal distribution: 

GEN Random = (RANNORM (0) * 2.5) + 43.6 ; 

The random numbers are standard scores that range from -6 through +6 and the probability of obtaining 

specific values depends on the area under the normal curve. The example above generates random numbers 

with a standard deviation of 2.5 and a mean of 43.6. 

RANTABLE (nn, nn, nn, nn, #vn, #vn, #vn, nn, nn, nn) 

generates random observations from a user's tabled distribution. The values and the probabilities of each 

are given as the right-most arguments: 

GEN Section = RANTABLE (0, 15, 5, 10, 20) ; 

This instruction generates the random section numbers 1, 2, 3 and 4 because four arguments are supplied 

(not counting the initial seed control argument). They are generated in the following proportions: 15/50 

= .3, 5/50 = .1, 10/50 = .2 and 20/50 = .4. (The arguments are summed to get the total, and the value of 

each argument is the proportion of the total desired for that value.) 
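
The arithmetic behind the RANTABLE proportions can be checked with a short Python sketch. It is an illustration only: Python's random module is a different generator from P-STAT's, and the seed 12345 is an arbitrary assumption chosen for repeatability.

```python
import random

weights = [15, 5, 10, 20]           # the four RANTABLE arguments after the seed control
total = sum(weights)                # 50
proportions = [w / total for w in weights]

rng = random.Random(12345)          # arbitrary seed, for a repeatable illustration
sections = rng.choices([1, 2, 3, 4], weights=weights, k=10)

print(proportions)
print(sections)
```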

RANUNI (nn, nn, nn, nn, #vn, #vn, #vn) 

generates random numbers from a uniform distribution. The random numbers range from zero to one 

and the probability of obtaining any value equals the probability of obtaining any other value. The result 

can be multiplied by a constant to change the range of the generated values. A random subset of cases 

may be selected using RANUNI:


GEN #Random = RANUNI (-1) ; 

IF #Random LE .7, RETAIN ; 

These instructions do the same things as the previous ones, but they also set and save the seed values: 

GEN #A = .M., GEN #B = .M., GEN #C = .M. ; 

GEN #Random = RANUNI (-3, 257,25,8004, #A,#B,#C ) ; 

IF #Random LE .7, RETAIN ; 

IF LAST (.FILE.), PUT #A ' ' #B ' ' #C ; 
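
The idea of a repeatable random subset can be sketched in Python. This is an analogy rather than a reproduction: Python's generator takes a single seed, not three, and will not produce the same numbers as RANUNI. The function name select_subset and the seed value are assumptions.

```python
import random

def select_subset(n_cases, seed, fraction=0.7):
    """Retain each case whose uniform random number is <= fraction."""
    rng = random.Random(seed)
    return [case for case in range(1, n_cases + 1) if rng.random() <= fraction]

first_run = select_subset(20, seed=257)
second_run = select_subset(20, seed=257)

# The same seed gives the same subset on every run.
print(first_run == second_run)
```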

PPL Functions: Numeric — Probability 

The following probability functions require one or more expressions as their arguments. Each expression 

should reduce to the argument appropriate for the function. 

PROBBIN (nn, n, p) 

computes the probability that a variable from a binomial (Bernoulli) distribution with probability p and 

size or degree n is less than or equal to the first argument nn (that is, has nn or fewer successes in n trials). 

The probability of a single value is the difference between two successive values: 

PROBBIN ( 4, 10, .5 ) - PROBBIN ( 3, 10, .5 ) = .205078125 

.205 is the probability of getting exactly four tails in ten tosses of a coin. 
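
The difference of two cumulative binomial probabilities can be verified outside P-STAT with a few lines of Python. The helper binom_cdf is a hypothetical name used only for this sketch.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for a binomial variable with n trials and probability p."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Probability of exactly four tails in ten tosses of a fair coin.
exactly_four = binom_cdf(4, 10, 0.5) - binom_cdf(3, 10, 0.5)

print(exactly_four)
```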

PROBCHI (nn, df) 

computes the probability that a random variable from a chi-square distribution with degrees of freedom 

df is less than the specified argument. Degrees of freedom must be an integer. 

PROBF (nn, df1, df2) 

computes the probability that a variable from an F distribution with numerator degrees of freedom df1 

and denominator degrees of freedom df2 is less than the specified argument. 

PROBNORM (nn) 

computes the probability that a random variable is less than the specified argument. The argument should 

be a deviate from a normal distribution with a mean of zero and standard deviation of one — that is, it 

should be a standard score between -6 and +6. This is the probability that a random variable from a normal 

distribution is less than 1.96: 

PROBNORM ( 1.96 ) = .975002105 

For the significance level of a two-tail test, multiply 1 minus the probability of the absolute value of the 

deviate by 2: 

( 1 - PROBNORM ( ABS ( -1.96 ) ) ) * 2 = .04999579060628 

PROBPOIS (nn, lambda) 

computes the probability that a variable from a Poisson distribution is less than or equal to the specified 

argument. Lambda is the mean of the distribution. 

PROBT (nn, df) 

computes the probability that a random variable is less than the first argument when degrees of freedom 

equal the second argument. The first argument should be a deviate from Student's t distribution with a 

mean of zero and standard deviation of one. The degrees of freedom may be a whole or fractional 

number.


PPL Functions: Numeric — Inverse Probability 

The following inverse probability functions require one or more expressions as their arguments. Each 

expression should reduce to the argument appropriate for the function. 

INVBIN (nn, n, p) 

returns the observation from the binomial distribution with probability p and size or degree n whose area 

is nn: 

INVBIN ( .38, 10, .5 ) = 4 

Approximately 38% of the time, when tossing 10 coins, 4 or fewer will be tails. INVBIN.RT returns an 

observation from the right tail of the distribution. INVBIN is the inverse of the PROBBIN function. 

INVCHI (nn, df) 

returns the critical value from the chi-square distribution with degrees of freedom df whose area is nn. 

INVCHI is the inverse of the PROBCHI function. 

INVF (nn, df1, df2) 

returns the critical value from the F distribution with degrees of freedom df1 and df2 whose area is the 

argument nn. INVF is the inverse of the PROBF function. 

INVNORM (nn) 

returns the deviate or critical value from the normal distribution whose area is the specified argument. 

The area, measured from the lower tail of the normal distribution, is a number between 0 and 1 that is the 

probability of obtaining a value less than the calculated deviate. PROBIT is a synonym for INVNORM: 

PROBIT ( .95 ) = 1.644853628 

A critical value of approximately 1.64 is required for a significance level of 5% for a one-tail test. This 

function is the inverse of PROBNORM. 

INVPOIS (nn, lambda) 

returns the observation from the Poisson distribution with mean lambda whose area is nn. INVPOIS is 

the inverse of the PROBPOIS function. INVPOIS.RT returns an observation from the right tail of the 

distribution. 

INVT (nn, df) 

returns the critical value from the t distribution with degrees of freedom df whose area is nn. INVT is 

the inverse of the PROBT function. 

PPL Functions: Fuzzy Numeric 

HEX ( nn ) 

produces the HEX representation of the input in a character*16 result. 

STEP.UP ( nn, n ) 

produces the number that is N steps up from the input value. The second argument, the number of steps, 

can be from zero to 9999. If omitted, it defaults to one.


STEP.DOWN ( nn, n ) 

produces the number that is N steps down from the input value. The second argument, the number of 

steps, can be from zero to 9999. If omitted, it defaults to one. 

STEPS ( nn1, nn2 ) 

produces the number of steps from the smaller of NN1 and NN2 to the larger. Missing 3 is returned if 

more than one million steps separate the arguments. 

EQ / NE / LT / LE / GT / GE 

EQ.2 or GT.5 or such can be used to cause a fuzzy compare of that many steps. The step part can be from 

0 to 99, with 0 meaning no steps. EQ.2 is treated as a simple EQ when the compare involves character 

values. 

FUZZ 5 $ is a new command that causes later use of the EQ type of logical operators to use that many 

steps. It is ignored for character compares, and does not affect the XEQ type of operators. It is also ignored 

for an operator like EQ.3 that already has a specific step size. 

NEAR is a synonym for EQ. NOTNEAR is a synonym for NE. NEAR and NOTNEAR can be used 

after the FUZZ command has set a fuzz level.

8 


Across-Case Modifications 

Changes and summary statistics on groups of related cases are produced by data modification and aggregation 

across cases. Related cases are groups of cases in a file that is ordered by one or more variables defining group 

membership. For example, cases having the same value of a key variable such as Household.Number could be 

grouped together in the file. They are related by their common values of Household.Number. Across-case modifications 

use: 

• variables that exist across cases to hold accumulated values, and 

• functions to identify particular cases within a group of related cases. 

Scratch variables and the permanent vector permit the incrementing and saving of variables across cases. 

Scratch variables hold either numeric or character values. The permanent vector is referenced with a P(J) notation 

allowing for calculation of the index value. It is created at the beginning of a run and can be used to pass values 

between commands as well as between cases. The P vector holds only numeric values. Multi-dimensional user-defined 

arrays are easier to use when an array is intrinsically multi-dimensional and can be defined to hold either 

character or numeric data. 

PPL, the P-STAT Programming Language, provides functions that identify or manipulate particular cases in 

a subgroup for modification and aggregation: 

• FIRST is true when the current case is the first case in a group. 

• LAST is true when the current case is the last case in a group. 

• SPLIT splits a case into a number of new cases. 

• COLLECT collects a number of cases into one large case. 

Splitting single cases into multiple cases reorganizes data for plotting, t tests, or analysis of variance. For example, 

monthly water flow measurements for multiple years may each be split into 12 separate cases, and new 

variables showing the month and year may be created. Splitting also “undoes” collecting. Family or patient cases 

that are collected for modification may be split back into their original cases afterwards. 

Collecting related cases permits subsequent modification of those cases: a family telephone number may be 

corrected for all members of a family, or a diagnosis may be added to all of a patient’s visit records. Collecting 

cases also permits the calculation of statistics that summarize the related cases, such as counts, means, totals and 

others. Total sales may be tallied for all the salesmen in each department, or mean income may be calculated for 

voters in each district. 

Several P-STAT commands also perform data modification and aggregation across related cases. The AGGREGATE 

and DUPLICATES commands both produce files that contain summary records of a file or subgroups 

within a file. Aggregation and modification using these commands are most appropriate when a file of summary 

information is the desired result. However, if the goal is to join summary information back onto the cases of the 

original data file, COLLATE or LOOKUP must then be used to do a hierarchical join. Using PPL for across-case 

modification and aggregation, as the file is read by a command such as MODIFY or LIST, often saves some extra 

steps.

8.1 BASIC ACROSS-CASE AGGREGATION 

The FIRST and LAST functions are generally used with scratch variables or the permanent vector P for most basic 

types of aggregation. PPL instructions (such as GENERATE, SET and INCREASE), operators (such as + , * and 

CONTAINS), and functions (such as MEAN, SQRT and TRIM) perform the actual modifications and 

calculations. 

8.2 Accessing FIRST and LAST Cases 

The FIRST and LAST functions determine whether a case is the first or last case in a file or, for cases ordered by 

subgroups, whether a case is the first or last case in the subgroup. FIRST and LAST are used with the system 

variable .FILE. to test for the beginning and ending cases of a file. (Any case selection is done before any other PPL, including 

testing for FIRST and LAST cases.) Figure 8.1 illustrates the use of FIRST and LAST. 

__________________________________________________________________________ 

Figure 8.1 FIRST and LAST with Subgroups 

                 Case   Age   Sex 

Given these        1     12     1 

six cases:         2     12     2 

                   3     12     2 

                   4     13     2 

                   5     14     1 

                   6     14     1 

The following statements are true for...   case numbers: 

IF FIRST (.FILE. ), ...                    1 

IF LAST (.FILE. ), ...                     6 

IF FIRST ( Age ), ...                      1, 4, 5 

IF FIRST ( Age, Sex ), ...                 1, 2, 4, 5 

IF LAST ( Age ), ...                       3, 4, 6 

IF LAST ( Age, Sex ), ...                  1, 3, 4, 6 

__________________________________________________________________________ 

The statement: 

IF FIRST ( .FILE. ), 

is true only when the first case of the file is processed. Similarly, 

IF LAST ( .FILE. ), 

is true only when the last case of the file is processed. 

If a file is ordered or sorted by one or more variables, the FIRST and LAST functions determine if a given 

case is the first or last member of the subgroup defined by those variables: 

IF FIRST (Division), GEN #Counter = 0 ;

The first case of each division satisfies this test. Each time the first case in a division is processed, this IF statement 

is true and the scratch variable #Counter is set to zero. The LAST function is similar except that only the last case of a 

subgroup satisfies a LAST test. 

The FIRST and LAST functions are shown accessing cases in subgroups in Figure 8.1. The statement: 

IF FIRST ( Age, Sex ), 

is true for cases 1, 2, 4, and 5 — each time the value of Age changes or Sex within an Age group changes. (Notice 

that a comma separates the variables defining the subgroups.) 
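
The FIRST and LAST logic of Figure 8.1 can be mimicked in Python as a cross-check on the case numbers listed there. This is a sketch outside P-STAT; the function names first_cases and last_cases are assumptions, and the file is assumed to be already ordered by the grouping variables.

```python
# The six cases of Figure 8.1 as (Age, Sex) pairs, in file order.
cases = [(12, 1), (12, 2), (12, 2), (13, 2), (14, 1), (14, 1)]

def first_cases(rows, key):
    """Case numbers where the group key differs from the previous case."""
    return [i + 1 for i, row in enumerate(rows)
            if i == 0 or key(row) != key(rows[i - 1])]

def last_cases(rows, key):
    """Case numbers where the group key differs from the next case."""
    return [i + 1 for i, row in enumerate(rows)
            if i == len(rows) - 1 or key(row) != key(rows[i + 1])]

def by_age(row):
    return row[0]

def by_age_sex(row):
    return row

print(first_cases(cases, by_age), first_cases(cases, by_age_sex))
print(last_cases(cases, by_age), last_cases(cases, by_age_sex))
```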

The FIRST and LAST functions are used primarily with scratch variables and the permanent vector, both of 

which pass information between cases. Scratch variables contain either numeric or character values. Depending 

on how they are created, they may be temporary or permanent. A temporary scratch variable exists only during 

the current command or macro. A permanent scratch variable retains its values across commands. The permanent 

vector exists for the duration of a P-STAT run. The permanent (P) vector contains only numeric values. 

8.3 Scratch Variables 

Scratch variables may contain numeric or character values. They are created with GENERATE. A “#” (crosshatch 

or pound sign) is the first character in the scratch variable name. A scratch variable that starts with a single crosshatch 

exists for the duration of a single command or macro. A scratch variable that is created with two 

crosshatches exists for the duration of the P-STAT run. A scratch variable that is to contain character information 

must be defined as a character variable when it is generated. Its length, if greater than 40, must be cited: 

GENERATE #Name:C50 = Department.Name; 

GENERATE ##NAME:C50 = Department.Name; 

A scratch variable is not associated with a case; therefore, it has no position in the file. Scratch variables may 

not be used after ANY or ALL, or in list functions such as MEAN, SUM, MIN, MAX and SDEV. 

Scratch variables may be used within a command to hold temporary information: 

GENERATE #Temp = Rdg1 * Rdg2 + SQRT ( F.Factor ), 

GENERATE Result = ROUND ( #Temp / 10 ); 

The scratch variable #Temp breaks up a complex calculation into simpler components, without creating a new 

variable in the file. This calculation could be done in one PPL clause with nested functions. However, several 

simple statements are more apt to be written correctly than a single complicated statement. 

The major use of scratch variables is across-case modification and aggregation. The scratch variable does not 

automatically change when a new case is read, but only when it is explicitly changed. It is this property that makes 

it useful for passing information across cases in the file. Another frequent use of scratch variables is in the TITLES 

command. 

TITLES 'Study Number #Study.Number' 

The following is an example which uses FIRST and LAST to count the number of cases which have the value 

of 'male' on variable Sex: 

[ IF FIRST ( .FILE. ), GENERATE #Total.Males = 0; 

IF SEX EQ 'male', INCREASE #Total.Males; 

IF LAST ( .FILE. ), RETAIN ; 

KEEP .OTHERS. #Total.Males ] 

The scratch variable #Total.Males is generated and set equal to zero when the first case in the file is processed. A 

scratch variable remains zero until it is explicitly changed, typically with INCREASE or SET. The IF Sex EQ test 

is done for every case that is processed. When the result of the IF test is true, the value of #Total.Males is increased 

by 1. Each case is tested to see if it is the last case in the file. If it is not, the next case is read. The last case in 

the file is the only case that is retained.


The last case contains all the original variables for that case plus the scratch variable Total.Males. When a 

scratch variable is used in a KEEP instruction, a regular variable with a position in the file is created. The # or ## 

is removed from the variable name and the variable can be referred to in subsequent PPL as Total.Males. 

__________________________________________________________________________ 

Figure 8.2 Using Scratch Variables and FIRST 

File Staff: 

Rank   Name       Division   Salary 

   3   Sulley           12    22000 

   2   De Jong          13    34300 

   2   Swartz           13    27700 

   5   Bryan            14    19500 

   2   Fernald          12    26500 

   3   Widmer           13    25000 

   4   Williams         14    21300 

SORT Staff, BY Division Rank, OUT StaffSor $ 

LIST StaffSor 

[ IF FIRST (Division), GEN #Cum.Salary = 0; 

INCREASE #Cum.Salary BY Salary; 

KEEP .OTHERS. #Cum.Salary ] , 

CONTROL Division $ 

                                         Cum 

Rank   Name       Division   Salary   Salary 

   2   Fernald          12    26500    26500 

   3   Sulley           12    22000    48500 

   2   De Jong          13    34300    34300 

   2   Swartz           13    27700    62000 

   3   Widmer           13    25000    87000 

   4   Williams         14    21300    21300 

   5   Bryan            14    19500    40800 

__________________________________________________________________________ 

Figure 8.2 illustrates the use of scratch variables with FIRST to get cumulative salary totals. The file first 

must be sorted or ordered by the variables defining subgroup membership. It may also be sorted by other variables 

of interest. Fernald is the first member of Division 12 after the sort. Thus #Cum.Salary is generated equal to zero 

when his case is read: 

[ IF FIRST (Division), GEN #Cum.Salary = 0; 

#Cum.Salary is then increased by 26,500, the value of Fernald’s Salary: 

INCREASE #Cum.Salary BY Salary; 

The KEEP instruction creates a new variable Cum.Salary. For Fernald’s case, Cum.Salary is set to 26,500, the 

current value of the scratch variable #Cum.Salary: 

KEEP .OTHERS. #Cum.Salary ]


Sulley, the next case after the sort, is also a member of Division 12. Therefore, #Cum.Salary is not reset to 

zero, but it is increased by the value of Sulley’s salary to 48,500. This value, 48,500, is then moved into the 

Cum.Salary variable for Sulley. De Jong is the first case in the next division, so #Cum.Salary is reset to zero, and 

the procedure is repeated. 
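
The procedure of Figure 8.2 can be traced in Python to confirm the cumulative totals. This is a sketch outside P-STAT that uses the Staff data from the figure; the variable names are illustrative.

```python
# (Rank, Name, Division, Salary) for the seven cases in file Staff.
staff = [
    (3, "Sulley", 12, 22000),
    (2, "De Jong", 13, 34300),
    (2, "Swartz", 13, 27700),
    (5, "Bryan", 14, 19500),
    (2, "Fernald", 12, 26500),
    (3, "Widmer", 13, 25000),
    (4, "Williams", 14, 21300),
]

# Equivalent of sorting by Division and then Rank.
staff_sorted = sorted(staff, key=lambda row: (row[2], row[0]))

cumulative = []
running = 0
previous_division = None
for rank, name, division, salary in staff_sorted:
    if division != previous_division:   # FIRST (Division) is true
        running = 0
    running += salary                   # INCREASE ... BY Salary
    cumulative.append((name, division, running))
    previous_division = division

for row in cumulative:
    print(row)
```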

__________________________________________________________________________ 

Figure 8.3 Creating a Summary Case with FIRST and LAST 

File Depts: 

Name           Department   Age   Sex   Position 

John Jones     Hardware      33    m    sales 

Jim Smith      Hardware      42    m    clerk 

Sara Clark     Hardware      25    f    service 

Gerry Walker   Hardware      52    m    manager 

Arlene Burns   Personnel     29    f    secretary 

George Dun     Personnel     32    m    clerk 

Jane Mason     Personnel     43    f    manager 

LIST Depts 

[ IF FIRST (Department), GENERATE #Employees = 0 ] 

[ INCREASE #Employees ] 

[ IF LAST (Department), RETAIN ; 

KEEP Department #Employees ] $ 

Department Employees 

Hardware 4 

Personnel 3 

__________________________________________________________________________ 

Figure 8.3 also illustrates aggregation using scratch variables. However, only a single summary case is retained 

for each department. The input file is already ordered by Department, so it need not be sorted. 

Each time the FIRST test is true, #Employees is initialized. It is increased as each case is processed. When the 

LAST test is not true the case is deleted, and processing of the next case begins immediately. When the LAST 

test is true, KEEP is used to select variables for the summary case. The result is a report with one line per department 

containing the variables Department and Employees. 
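
The one-line-per-department summary of Figure 8.3 can be reproduced with a short Python sketch. It is an illustration outside P-STAT; only the Department values from the figure are used, and the input is assumed to be already ordered by Department.

```python
# Department value of each of the seven cases in file Depts, in file order.
departments = ["Hardware"] * 4 + ["Personnel"] * 3

summary = []
employees = 0
for i, department in enumerate(departments):
    is_first = i == 0 or department != departments[i - 1]
    is_last = i == len(departments) - 1 or department != departments[i + 1]
    if is_first:
        employees = 0                 # initialize the counter for a new group
    employees += 1                    # count every case
    if is_last:
        summary.append((department, employees))  # keep only the summary case

print(summary)
```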

8.4 The Permanent Vector 

The permanent (P) vector holds double-precision numeric values. The length of the P vector, like that of the V 

vector, is the maximum number of variables possible in a P-STAT system file. When a run begins, P-STAT generates 

the P vector with all the values set to missing type 1. New values are placed in the P vector using SET (using 

GENERATE will cause an error), and they remain there until they are changed. The P vector is not re-initialized 

when a new P-STAT command begins. 

The values in the P vector are referenced by position number — P(5) refers to the value in the fifth location 

of the vector. This “subscript” notation permits the use of a variable or an expression as the index. Thus, locations 

in the P vector may be referenced with a DO loop:


DO #J = 1, 8; SET P(#J) = V(#J) ; ENDDO; 

This will set the first eight values in the permanent vector to the first eight values in the current case. The contents 

of the expression denoting which P variable is to be used may be calculated: 

DO #L = 1, 6; 

SET P(#L) = V(#L); 

SET P(#L+6) = SQRT( V(#L) ); 

ENDDO; 

However, the result of #L+6 must be an integer between 1 and the maximum number of variables in a file. 

Figure 8.4 illustrates the use of the permanent vector to move information between files. The initial PROCESS 

command is used as a vehicle for PPL; there is no output file. The locations P(1) and P(2) are set to zero 

when the first case in the file is processed. Then, for each case (including the first), if Income is not missing, P(1) 

is increased by 1 and P(2) is increased by the value of Income. Thus, P(1) has the count of cases with good values 

for Income and P(2) contains the total Income for those cases. P(1) and P(2) are available to the second command, MODIFY, 

permitting the calculation of #Mean.Income. As each case is processed, a simple subtraction produces the 

difference between Income for that case and #Mean.Income. 

__________________________________________________________________________ 

Figure 8.4 Moving Values Between Files with the P Vector 

C 'Get number of people and income totals.' $ 

PROCESS File1 

[ IF FIRST (.FILE.), 

SET P(1) = 0, SET P(2) = 0 ] 

[ IF Income GOOD, 

INCREASE P(1), 

INCREASE P(2) BY Income ] $ 

C 'Now get mean income and income differences.' $ 

MODIFY File1 


[ GENERATE #Mean.Income = P(2) / P(1) ] 

[ GENERATE Difference = Income - #Mean.Income ], 

OUT File2 $ 

___________________________________________________________________________ 
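The two-pass idea in Figure 8.4 can be sketched in Python for readers who want to trace the arithmetic. This is a model, not PPL: the list `P` stands in for the permanent vector, `None` stands in for a missing value, and the income values are hypothetical. It is assumed here that a missing Income yields a missing Difference.

```python
# Stand-in for the P vector: it outlives any single pass over the data.
# Index 0 is unused so that P[1] and P[2] match the figure.
P = [None] * 3


def first_pass(incomes):
    """Count the good (non-missing) incomes in P[1] and total them in P[2]."""
    P[1] = 0
    P[2] = 0
    for income in incomes:
        if income is not None:      # IF Income GOOD
            P[1] += 1               # INCREASE P(1)
            P[2] += income          # INCREASE P(2) BY Income


def second_pass(incomes):
    """Return each income minus the mean; missing stays missing (assumed)."""
    mean_income = P[2] / P[1]
    return [None if x is None else x - mean_income for x in incomes]


incomes = [30000.0, None, 50000.0, 40000.0]   # hypothetical File1 values
first_pass(incomes)
differences = second_pass(incomes)
print(P[1], P[2], differences)
```

The point of the sketch is that nothing is passed from the first function to the second except the contents of `P`, just as the two commands in the figure share only the permanent vector.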

The permanent vector is also useful for passing a large number of numeric variables across cases within a 

command. Given some number of tests, each with a variable name beginning with “Test”, these instructions will 

get class totals for all the tests: 

[ IF FIRST ( Class ), 

DO #J USING Test?; SET P(#J) = 0; ENDDO; 

DO #J USING Test?; INC P(#J) BY V(#J); ENDDO; 

IF LAST ( Class ), RETAIN; 

DO #J USING Test?; SET V(#J) = P(#J); ENDDO ] 

Given any number of tests in any locations in the file, the P values in corresponding locations are initialized when 

the first case of a class is processed. (The index #J uses the positions of all the Test? variables in the file as its 

values.) Each of the P values is increased by the associated test value. 

Each case is then evaluated to determine if it is the last case for a class. If the test result is false, that case is 

not retained, the next case is read and processing resumes with the first PPL statement. If the test result is true,


the PPL continues and the test values are set to the accumulated totals. This final case for the class is the only case 

that is seen by the P-STAT command. 

The choice of whether to use the P vector or scratch variables depends on the number of variables involved, 

whether any are character variables, and the desired tasks. The P vector is usually easier to use when many variables 

are treated the same way, as in initialization: 

DO #Q = 1 TO 16; SET P(#Q) = 0 ; ENDDO; 

Scratch variables may be more convenient when only a few variables or character data are involved: 

GENERATE #T1 = 0, GENERATE #NN:C = ' ' ; 

8.5 User-defined Arrays 

Like scratch variables, arrays are defined during a run and used in PPL statements. An array can have up to 

7 dimensions, and can be character or numeric. Array names have two characters, the second being the same as 

the first, like XX or cc or Zz. Case doesn’t matter. There can be up to 26 active arrays. 

How do arrays compare to the P vector? The P vector allows N numeric values, where N is the maximum 

number of variables in a file in a given version of P-STAT. This is usually 6,000. The P vector is one-dimensional; 

P(1) through P(N) can be used. Arrays are an improvement over the P vector in 3 ways: 

1. allowing character as well as numeric arrays. 

2. allowing dimensioning like XX(2,1,5). 

3. providing an array buffer (where arrays are placed) that is 3 times larger than the P vector. 

Arrays are defined by using the DEFINE.ARRAY command. 

DEFINE.ARRAY xx (10,30) TO 0 $ 

This command defines XX as a numeric array with 2 dimensions. The first subscript will be 1 to 10, the second 

1 to 30. The 300 array values are initialized to zero. Initialization (the “TO 0” part) is optional; if it is not used 

the values are set to missing type 1. 

Dimensioning with zero or negative integers is allowed; for example, 

DEFINE.ARRAY aa (-4:4, 0:10, -100:0) $ 

A character array is defined by adding a numeric value, which is the length of each of the character values in the 

array: 

DEFINE.ARRAY KK:12 (2, 10, 101:140) TO ' ' $ 

This defines KK as a character array with 3 dimensions. Each value can hold 12 characters. This is shown by the 

KK:12. The first subscript can be 1 or 2, the second 1 to 10, and the third 101 to 140. The 800 array values are 

initialized to blank. The maximum size of a character value is 50,000 characters. 

Each character value has a status word (to indicate missing or good), followed by the characters of the value. 

A character:4 value uses one array buffer element (as does a numeric value). A character:12 value needs 2 array 

buffer elements, a character:20 value needs 3 elements, and so on. The array buffer is very large and even the 

Whopper II size can hold an array of 6000 C20 elements. 
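The sizes quoted above can be checked with a little arithmetic. The Python sketch below is not part of P-STAT; the function names are hypothetical, and the rule in `elements_per_value` (8-byte buffer elements with a 4-byte status word) is an assumption that happens to reproduce the three sizes given in the text, not a documented formula.

```python
import math


def value_count(*dims):
    """Number of values in an array; each dim is an int n (1..n) or (low, high)."""
    total = 1
    for d in dims:
        low, high = (1, d) if isinstance(d, int) else d
        total *= high - low + 1
    return total


def elements_per_value(char_len=None):
    """Buffer elements per value (assumed: 8-byte elements, 4-byte status word)."""
    if char_len is None:           # numeric value
        return 1
    return math.ceil((char_len + 4) / 8)


print(value_count(10, 30))                       # xx (10,30)
print(value_count(2, 10, (101, 140)))            # KK:12 (2, 10, 101:140)
print(value_count((-4, 4), (0, 10), (-100, 0)))  # aa (-4:4, 0:10, -100:0)
print([elements_per_value(n) for n in (4, 12, 20)])
```

Under the same assumption, an array of 176 character:12 values would occupy 352 buffer elements.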

SHOW.ARRAYS $ 

The SHOW.ARRAYS command displays the names, size (if character), number of dimensions, and defined subscript 

range for each array. 

DROP.ARRAY aa zz pp $ 

This command ends the definition of the indicated array or arrays and releases the array buffer space, making it 

available for other definitions.


DROP.P.VECTOR $ 

This command takes the P vector space and adds it to the array buffer. This allows larger arrays, but ends 

any use of the P vector in the run. 

Suppose we used 'DEFINE.ARRAY xx(3,5)$' and set #n to 2. These 3 (unrelated) standalone PPL statements 

would be valid, as would the nested DO loop. 

SET XX (2, #n ) = 77 $ 

PUT XX (#n, #n-1 ) $ 

IF XX (1, 3 ) LT XX(2,4), SET XX(1,3) to .m1. $ 

DO #j = 1,3; 

DO #k = 1,5; 

SET XX( #j, #k ) = #j * 10 + #k; 

ENDDO; 

ENDDO $ 

__________________________________________________________________________ 

Figure 8.5 DEFINE.ARRAY and SHOW.ARRAYS 

DEFINE.ARRAY xx ( 0:3, 5 ) $ 

DEFINE.ARRAY cc:12 ( 44, 2,2 ) $ 

SHOW.ARRAYS $ 

---------Numeric array xx has been defined--------- 

It has 20 values, organized into 2 dimensions. 

The array buffer now has 17,980 unused elements. 

--------------------------------------------------- 

-------Character array cc:12 has been defined------- 

It has 176 values, organized into 3 dimensions. 

The array buffer now has 17,628 unused elements. 

---------------------------------------------------- 

---------------array summary--------------- 

There are 2 user-defined arrays: 

cc:12 ( 44, 2, 2 ) 

xx ( 0:3, 5 ) 

The array buffer contains 18,000 elements. 

372 are in use by existing arrays. 

17,628 are available for array definition. 

------------------------------------------- 

__________________________________________________________________________ 

Once an array is defined it can be used by any command in the same way that the P vector is used. Given a 

file containing at least the following 3 variables: 

Gender   coded  1 = male 

                2 = female 

Age      coded  1 = 30 or under 

                2 = 31 - 40 

                3 = over 40 

Income   coded in dollar amounts.


The task is to produce the following report, where Group represents one of the 6 possible gender/age groups. 

Group n had the highest average income of $xx,xxx.xx 

This type of question can be solved in a variety of ways. Because there are 6 groups and it is necessary to 

save both the number of cases in each group and the total income of each group across all the cases, arrays provide 

an easy way to handle the data collection. 

__________________________________________________________________________ 

Figure 8.6 One-dimensional Arrays 

DEFINE.ARRAY gg (6) to 0 $ 

DEFINE.ARRAY tt (6) to 0 $ 

GEN ##High = 0; GEN ##Group $ 

PROCESS px1298a [ 

GEN #N = 0; 

DO #A = 1, 3; 

DO #G = 1, 2; 

INC #N; 

IF Age.ban = #A and Gender = #G, 

INCREASE GG(#N), 

INCREASE TT(#N) BY Income; 

IF LAST ( .FILE. ) AND TT(#N) / GG(#N) GT ##High, 

SET ##High = TT(#N) / GG(#N), 

SET ##GROUP = #n; 

ENDDO; 

ENDDO; 

] $ 

PUT "Group " ##Group 

" had the highest average income of $" 

@COMMAS @PLACES2 ##High $ 

__________________________________________________________________________ 

The first two commands in Figure 8.6 define two arrays with 6 elements in each. Array gg is used to accumulate 

the cases for each group while tt is used to accumulate total income. Figure 8.7 illustrates the same solution 

using two-dimensional arrays. In this particular example, the use of the one-dimensional arrays is somewhat easier 

to follow and there is little difference in the amount of code required. 

It would be possible to use a single three-dimensional array (3,2,2) to hold all twelve of the values that are 

needed for this particular problem. Such complexity might serve as an exercise in nesting DO loops and handling 

scratch variables but would only complicate the solution of a fairly simple problem. 

Multiple dimensions are most useful when the contents of the cells are similar. For example, sales data for 

5 divisions over 12 months in 6 regions of the country might best be processed if stored in a single (5,12,6) 

array.


__________________________________________________________________________ 

Figure 8.7 Two-dimensional Arrays 

DEFINE.ARRAY gg (3,2) to 0 $ 

DEFINE.ARRAY tt (3,2) to 0 $ 

GEN ##High=0; GEN ##Group $ 

PROCESS Myfile [ 

DO #A = 1, 3; 

DO #G = 1, 2; 

IF Age = #A and Gender = #G, 

INCREASE GG(#A,#G), 

INCREASE TT(#A,#G) BY Income; 

IF LAST (.FILE.) AND TT(#A,#G) / GG(#A,#G) GT ##HIGH, 

SET ##High = TT(#A,#G) / GG(#A,#G), 

SET ##Group = #A + (#G-1) * 3; 

ENDDO; 

ENDDO; 

] $ 

PUT "Group " ##Group 

" had the highest average income of $" 

@COMMAS @PLACES2 ##High $ 

__________________________________________________________________________ 
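The accumulation in Figure 8.7 can be traced with a short Python model. It is a sketch only: the dictionaries stand in for the arrays gg and tt, the sample cases are hypothetical, and the comparison is done after all cases are read rather than inside the loops on the last case. Skipping empty cells is an addition made here to avoid dividing by zero; the figure does not show such a check.

```python
def highest_average_group(cases):
    """cases: iterable of (age_code 1..3, gender_code 1..2, income)."""
    counts = {(a, g): 0 for a in (1, 2, 3) for g in (1, 2)}    # gg (3,2) TO 0
    totals = {(a, g): 0.0 for a in (1, 2, 3) for g in (1, 2)}  # tt (3,2) TO 0
    for age, gender, income in cases:
        counts[(age, gender)] += 1
        totals[(age, gender)] += income

    best_group, best_mean = None, 0.0
    for (age, gender), n in counts.items():
        if n == 0:
            continue               # skip empty cells rather than divide by zero
        mean = totals[(age, gender)] / n
        if mean > best_mean:
            best_mean = mean
            best_group = age + (gender - 1) * 3   # numbering used in Figure 8.7
    return best_group, best_mean


cases = [(1, 1, 20000), (1, 1, 30000), (2, 2, 60000), (3, 1, 45000)]  # hypothetical
group, mean = highest_average_group(cases)
print(f"Group {group} had the highest average income of ${mean:,.2f}")
```

Note that Figure 8.6 numbers its groups with a running counter (age outer, gender inner), while Figure 8.7 uses #A + (#G-1) * 3, so the same cell can receive a different group number in the two figures.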

8.6 Interaction of FIRST, LAST and Other PPL 

The FIRST and LAST functions interact with other PPL instructions in the following manner: 

1. Case selection, such as CASES 11 TO 30, is done first. The rest of the PPL sees only those 20 cases 

and has no idea that they came from a larger file. 

2. An internal FIRST/NOTFIRST and LAST/NOTLAST flag is set for each FIRST or LAST test used 

in the PPL. This is done as soon as the case passes the CASES filter. FIRST(.FILE.), for example, 

is true for the first case processed. That case may have been the eleventh case of the original input 

file. 

3. The PPL for the current case is done now. Because the FIRST and LAST settings for a case are determined 

before other PPL begins, FIRST and LAST testing cannot be done on newly generated 

variables. Also, recoding a FIRST or LAST variable has no effect on the FIRST and LAST settings, 

since those settings are done before recodes occur. 

4. The DELETE and RETAIN instructions should not be used until all FIRST and LAST tests are 

complete. 

To summarize, FIRST and LAST logic is based on the pre-PPL values of only those cases that remain after any 

case selection. 
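The ordering described above can be made concrete with a small Python model. This is an illustration, not P-STAT code; `flag_cases` and the sample file are hypothetical. The flags are computed only among the selected cases, and before any per-case modification would take place.

```python
def flag_cases(cases, start, stop, key):
    """Select cases start..stop (1-based, inclusive), then flag FIRST and LAST.

    The flags come from the values the cases had before any per-case
    modification, and only among the selected cases.
    """
    selected = cases[start - 1:stop]
    flagged = []
    for i, case in enumerate(selected):
        flagged.append({
            "case": case,
            "first_file": i == 0,
            "last_file": i == len(selected) - 1,
            "first_key": i == 0 or selected[i - 1][key] != case[key],
            "last_key": i == len(selected) - 1 or selected[i + 1][key] != case[key],
        })
    return flagged


# Hypothetical file of 40 cases; Dept changes every 8 cases.
cases = [{"n": n, "Dept": (n - 1) // 8} for n in range(1, 41)]
flags = flag_cases(cases, 11, 30, "Dept")
print(len(flags), flags[0]["case"]["n"], flags[0]["first_file"], flags[0]["first_key"])
```

In this model the eleventh case of the original file is flagged as first in the file and first in its group, even though cases 9 and 10 of the original file share its Dept value, because those cases were never seen.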

The last portion of this section on basic across-case modification gives two detailed examples that use scratch 

variables, the P vector and the FIRST function, along with IF tests, DO loops, the PUT instruction and the PUT 

counter (.PUT.). The examples integrate the various PPL procedures covered thus far in handling realistic problems 

encountered in data modification.


8.7 Example: Checking a List of Variables 

In creating a new variable from a series of dummy variables, it may be sensible to check that only one of the dummy 

variables contains the value 1 and that the rest are zero. 

The instructions shown in Figure 8.8 test that only one of the dummy variables has been coded 1 and they 

create the new variable Region. Scratch variables are used to contain the results of the IF test. The scratch variables 

#Test1 and #Test2 are generated equal to 0 in all cases. 

#Test1 is incremented each time the DO loop test is true. #Test1 is 0 at the end of the DO loop if none of the 

variables contained a 1. #Test1 is greater than 1 if more than one of the variables contained a 1. #Test2 is incremented 

by the value of the variable in the DO loop each time the IF test is false or missing. If #Test2 is missing 

at the end of the loop, one of the dummy variable values is missing. If #Test2 is greater than 0 at the end of the 

loop, one or more of the dummy variables was some value other than 0 or 1. 

__________________________________________________________________________ 

Figure 8.8 Checking Variables Using PUT and Scratch Variables 

MODIFY Regional 

[ KEEP North.East TO South.West Age Sex ; 

GENERATE Region = .M1. ; 

GEN #Test1 = 0, GEN #Test2 = 0 ] 

[ DO #J = 1 TO 4; 

IF V(#J) EQ 1, SET Region = #J, 

T.INCREASE #Test1, FM.INCREASE #Test2 BY V(#J) ; 

ENDDO; 

IF #Test1 EQ 0, PUT .N. ; 

IF #Test1 GT 1, PUT .N. , 

SET Region = .M2.; 

IF #Test2 MISSING OR #Test2 GT 0, 

PUT .N. , 

SET Region = .M3. ; 

IF .PUT. GT 0, RETAIN ], 

OUT Errors $ 

__________________________________________________________________________ 

The PUT instruction reports any error conditions. Region remains missing type 1 when no dummy variable is 

coded 1, is set to missing type 2 when more than one is coded 1, and is set to missing type 3 when a dummy 

variable is missing or has a value other than 0 or 1. Finally, 

if there were errors, the case is retained and written to an error file — the output file named “Errors”. .PUT. is a 

system variable which is reset to 0 as each new case is read. It is incremented each time that a PUT instruction is 

issued. In Figure 8.8 the PUT instructions are only made to report errors and any case with a .PUT. value of 0 is 

error free. 
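The checks made in Figure 8.8 can be followed in a Python model. It is a sketch under stated assumptions: `None` stands in for a missing value, the strings "M2" and "M3" stand in for missing types 2 and 3, the error messages are hypothetical, and a running sum is assumed to become missing once a missing value is added to it.

```python
def check_dummies(values):
    """values: dummy codes (None = missing). Returns (region, error_messages)."""
    region = None                 # GENERATE Region = .M1.
    test1 = 0                     # count of dummies coded 1
    test2 = 0                     # sum of the other values; None once any is missing
    for j, v in enumerate(values, start=1):
        if v == 1:
            region = j
            test1 += 1
        elif v is None or test2 is None:
            test2 = None          # a missing value makes the running sum missing
        else:
            test2 += v
    errors = []
    if test1 == 0:
        errors.append("no dummy coded 1")
    if test1 > 1:
        errors.append("more than one dummy coded 1")
        region = "M2"
    if test2 is None or test2 > 0:
        errors.append("missing or non-0/1 value")
        region = "M3"
    return region, errors


print(check_dummies([0, 1, 0, 0]))
print(check_dummies([1, 1, 0, 0]))
print(check_dummies([0, None, 0, 1]))
```

A case with an empty error list corresponds to a case whose .PUT. value is 0.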

Consider: 

[ IF FIRST ( .FILE.) GEN #n = 0] and 

[ GEN #n = 0] 

The GENERATE by itself zeros #n whenever a new case is read. The GENERATE conditioned on FIRST (.FILE.) zeros #n 

only when the initial case is read.


8.8 Example: Selecting a Block of Cases 

Selecting a block of cases, such as all cases from the one with the value “Jones” on Last.Name up to (but not including) 

the case with the value “Smith”, is a bit more complicated than selecting cases by their position in the file 

(.N.) or by specific values of a variable. Values in the permanent vector may be used to delineate the block of 

cases: 

[ IF FIRST ( .FILE. ), SET P(1) = 0 ; 

IF Last.Name EQ 'Jones', SET P(1) = 1 ; 

IF P(1) = 0, DELETE ; 

IF Last.Name EQ 'Smith', QUITFILE ] 

When the first case in the file is processed, P(1) is set equal to 0. It is reset to 1 when the Jones case is processed. 

Any cases with values of 0 for P(1) are deleted from further processing. Thus, cases prior to Jones are deleted. 

When the Smith case is found, processing of the file stops (without using this case). 
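The same block selection can be modeled in Python. The sketch is illustrative only; `select_block` and the list of names are hypothetical, and the flag `in_block` plays the role of P(1).

```python
def select_block(cases, start_name, stop_name, field="Last.Name"):
    """Yield cases from start_name up to, but not including, stop_name."""
    in_block = False                     # plays the role of P(1)
    for case in cases:
        if case[field] == start_name:
            in_block = True              # SET P(1) = 1
        if not in_block:
            continue                     # IF P(1) = 0, DELETE
        if case[field] == stop_name:
            break                        # QUITFILE: stop without using this case
        yield case


names = ["Adams", "Jones", "Keller", "Price", "Smith", "Young"]  # hypothetical
cases = [{"Last.Name": n} for n in names]
print([c["Last.Name"] for c in select_block(cases, "Jones", "Smith")])
```

Because the deletion test comes before the stopping test, a Smith case that appears ahead of the Jones case is simply dropped and does not end the run.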

8.9 THE SPLIT FUNCTION 

The SPLIT function divides a case into multiple cases. When data are collected with related information in 

a single case, reorganization into multiple cases may be necessary for various commands. For example, a household 

survey may have both household information and information for several household members in the same 

case. A medical study may have several patient visits or lab results in the same case. This organization is often 

inappropriate for many statistical analyses such as TTEST or ANOVA, which require explicit grouping variables 

or indices. 

Special forms of the SPLIT function: 

[ SPLIT ] or [ SPLIT * ] 

reverse the effects of COLLECT, the function which gathers multiple cases into one case. They are described after 

the discussion of COLLECT and in the summary ending this chapter. 

8.10 Splitting a Case 

The simplest usage of SPLIT divides each case into a designated number of cases. This file has two cases and 

each case has two variables: 

Test1 Test2 

16 12 

17 11 

Suppose you wanted to split each case into 2 cases. A command such as: 

LIST X [ SPLIT INTO 2 ] $ 

receives only the newly created cases. When SPLIT INTO 2 is encountered, each case in the file is converted into 

two cases: 

Test1 

16 

12 

17 

11 

There is an error message if the number of variables being split is not a multiple of the SPLIT argument. For example, 

you cannot do a simple SPLIT INTO 3 when there are 5 variables, but you can when there are 3, 6, 9, 12, 

etc.


Either: 

SPLIT INTO N or SPLIT N 

may be used; the word INTO is optional. N, the number of new cases, must be an integer. It can be an integer 

constant, a permanent scratch variable (##n), or (in a macro) a temporary scratch variable (#n). Thus, when the 

input case has 40 variables, SPLIT INTO 4 yields ten variables in each of the four new cases. The first ten variable 

names are used. Case one has values 1 to 10, case two has values 11 to 20, and so on. There is an error message 

if the variables are split into cases such that a numeric and a character variable would be combined into a single 

variable (would be in the same column). 
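The basic division performed by SPLIT INTO N can be modeled in a few lines of Python. This is a sketch of the behavior described above, not P-STAT's implementation; `split_case` is a hypothetical name, and the check for mixing numeric and character variables in one column is left out.

```python
def split_case(names, values, n):
    """Split one case into n cases; the first len(names)//n names are kept."""
    if len(names) % n != 0:
        raise ValueError("number of variables is not a multiple of the SPLIT argument")
    width = len(names) // n
    new_names = names[:width]
    new_cases = [values[i * width:(i + 1) * width] for i in range(n)]
    return new_names, new_cases


print(split_case(["Test1", "Test2"], [16, 12], 2))
```

With 40 variables and n equal to 4, the same function returns the first ten names and four cases holding values 1 to 10, 11 to 20, 21 to 30, and 31 to 40.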

The variables present in the new cases are also determined by additional options used with the SPLIT function. 

These options can occur in any order, as often as needed. Their order determines the order of the variables 

in the new cases. SPLIT itself must precede any options. 

8.11 CARRYing Identifying Variables 

CARRY is an optional instruction that specifies one or more variables to be carried in every case formed by the 

SPLIT. CARRY requires one or more variables as its argument: 

CARRY Name, or CARRY Name Age Sex, 

Figure 8.9 illustrates the results of a SPLIT where CARRY is used to position the variables Name, Age and 

Sex in each of the new cases. Only the variables not mentioned in the CARRY instruction are split. (STUB may 

be used with LIST to highlight the hierarchical relationship between the carried variables and the split variables.) 

__________________________________________________________________________ 

Figure 8.9 Using CARRY in the SPLIT Function 


Name Age Sex Test1 Test2 

Smith, Jason 11 1 16 12 

Wilson, Ann 14 2 17 11 

LIST Students [ SPLIT INTO 2, CARRY Name Age Sex ] $ 

Name          Age  Sex  Test1 

Smith, Jason   11    1     16 

Smith, Jason   11    1     12 

Wilson, Ann    14    2     17 

Wilson, Ann    14    2     11 


LIST Students [ SPLIT INTO 2, CARRY Name Age Sex ], 

STUB Name Age Sex $ 

Name          Age  Sex  Test1 

Smith, Jason   11    1     16 

                           12 

Wilson, Ann    14    2     17 

                           11 

_________________________________________________________________________


8.12 Selecting Variables To USE 

The variables to be used in the SPLIT may first be selected by using KEEP ahead of the SPLIT, 

[ KEEP Test1 Test2 ; 

SPLIT INTO 2 ] 

or they can be specified as part of the SPLIT function with the USE option: 

SPLIT INTO 2, USE Test1 Test2 ; 

Figure 8.10 shows the results of a USE selection. 

__________________________________________________________________________ 

Figure 8.10 Selecting Variables for SPLIT with USE 

Name          Age  Sex  Test1  Test2 

Smith, Jason   11    1     16     12 

Wilson, Ann    14    2     17     11 

LIST Students [ SPLIT INTO 2, USE Test1 Test2 ] $ 

Test1 

16 

12 

17 

11 

__________________________________________________________________________ 

USE takes one or more variable names as its argument. The number of variables must be a multiple 

of the SPLIT argument. If SPLIT INTO 6 is used and USE specifies 18 variables, the 18 variables are split into 

six output cases with three variables each. The variable names are those of the first three variables specified after 

USE. The USE variables can include ranges: 

USE Test1 TO Test9 Test99 

The USE option may be used solely to reorder the variables that are in the output cases: 

SPLIT INTO 2, USE Test2 Test1 ; 

When all variables are to be used, USE is not necessary — these two instructions are equivalent: 

SPLIT INTO 2; 

SPLIT INTO 2, USE V(1) .ON. ; 

USE is also not necessary when other options, such as CARRY, are present and the number of variables not being 

carried is a multiple of the SPLIT argument. 

8.13 Defining New Variables with CREATE 

When the cases in a file are split, the names of the variables are those of the variables present in the first new case. 

Test1 and Test2 may be appropriate names before the SPLIT, when the variables are in one case. However, when 

the case is SPLIT, a variable name such as Test.Score may be more appropriate for all the Test? variables. The 

CREATE option gives an output variable a new name and also specifies just which variable values are to be used 

for that variable. CREATE takes the place of USE.


The first argument for CREATE is the new variable name. The subsequent arguments are the existing variables 

whose values will be those of the new variable: 

CREATE Test.Score Test1 Test2 

The new variable created is “Test.Score”. The first new case output from SPLIT gets the value of the variable 

Test1, the first variable in the current input case to be used, for the new variable Test.Score. The second new case 

gets the value of Test2, the second variable in the current input case, for the same new variable, and so on. Figure 

8.11 shows the effect of CREATE. Note that the variables produced by the SPLIT are in the order in which they 

are mentioned. 

__________________________________________________________________________ 

Figure 8.11 Naming the New Variables with CREATE 

Name          Age  Sex  Test1  Test2 

Smith, Jason   11    1     16     12 

Wilson, Ann    14    2     17     11 

LIST Students 

[ SPLIT 2, 

CREATE Test.Score Test1 Test2 , 

CARRY Name ] $ 

 Test 

Score   Name 

   16   Smith, Jason 

   12   Smith, Jason 

   17   Wilson, Ann 

   11   Wilson, Ann 

__________________________________________________________________________ 

The number of variables listed after CREATE and the name of the created variable must equal 

the number of new cases being produced. Several CREATE instructions may follow a SPLIT. However, the number 

of variables in each CREATE list must equal the number of new cases being produced. Figure 8.12 shows 

three new cases produced from each existing case. Thus, three variables are in each CREATE list. Two new variables 

are defined. Two new variables, each using three existing variables, equal six variables, which is the number 

of variables in the original case to be split. 

When CREATE is used, any variables not cited in the CARRY or CREATE instructions are omitted from the 

SPLIT unless USE is also included. When USE is included without a variable list, all the remaining variables are 

included in the SPLIT. The variable names for these additional variables are those of the variables in the first case 

of the output file. 

8.14 Wildcard Notation and Masks 

Often cases which contain the type of data that is appropriate for splitting have variable names in which part of 

the name is a prefix and the rest is a counter or additional text to distinguish the values. When this situation exists, 

the ? wildcard notation can be used.


The ? either follows a prefix to indicate all variables starting with that prefix, or it precedes a suffix to indicate 

all variables ending with that suffix. Both of the following produce the same result: 

SPLIT 2, CREATE Test.Score Test1 Test2 ; 

SPLIT 2, CREATE Test.Score Test? ; 

Another way to select certain variables is to use a mask after a range: 

USE Test1 TO Test8 (MASK 1001), 

is the same as saying: 

USE Test1 Test4 Test5 Test8, 

given of course that Test1 through Test8 are consecutive variables in the file. 
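The effect of the mask can be reproduced with a short Python sketch. It assumes, consistent with the equivalence shown above, that the mask pattern is repeated along the range until the variables are exhausted; `apply_mask` is a hypothetical helper, not a P-STAT function.

```python
from itertools import cycle


def apply_mask(names, mask):
    """Keep the names whose position lines up with a 1 in the repeated mask."""
    return [name for name, bit in zip(names, cycle(mask)) if bit == "1"]


tests = [f"Test{i}" for i in range(1, 9)]
print(apply_mask(tests, "1001"))
```

Applied to Test1 through Test8, the pattern 1001 is used twice, so the first and fourth variable of each group of four are kept.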

__________________________________________________________________________ 

Figure 8.12 Multiple CREATE Lists 

File Field121: 

Crop Date Y1 Y2 Y3 Y4 Y5 Y6 

Alfalfa 8/24/83 181 179 182 195 192 198 

Alfalfa 8/30/82 179 177 176 192 190 199 

LIST Field121 

[ SPLIT INTO 3, CARRY Crop Date, 

CREATE Plot.1 Y1 TO Y3, CREATE Plot.2 Y4 TO Y6 ], 

STUB Crop Date $ 

Crop      Date      Plot.1   Plot.2 

Alfalfa   8/24/83      181      195 

                       179      192 

                       182      198 

          8/30/82      179      192 

                       177      190 

                       176      199 

__________________________________________________________________________ 
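The way CARRY and several CREATE lists combine in Figure 8.12 can be modeled in Python. The sketch below is an illustration only: `split_with_create` is a hypothetical name, and it always places carried variables first, whereas in PPL the order of the options determines the order of the variables.

```python
def split_with_create(case, n, carry, creates):
    """Split one case (a dict) into n cases.

    carry   -- names copied unchanged into every new case
    creates -- {new_name: [existing names]}, each list holding n names
    """
    for new_name, sources in creates.items():
        if len(sources) != n:
            raise ValueError(f"{new_name}: CREATE list must name {n} variables")
    new_cases = []
    for i in range(n):
        new_case = {name: case[name] for name in carry}
        for new_name, sources in creates.items():
            new_case[new_name] = case[sources[i]]   # i-th source feeds the i-th case
        new_cases.append(new_case)
    return new_cases


field = {"Crop": "Alfalfa", "Date": "8/24/83",
         "Y1": 181, "Y2": 179, "Y3": 182, "Y4": 195, "Y5": 192, "Y6": 198}
for row in split_with_create(field, 3, ["Crop", "Date"],
                             {"Plot.1": ["Y1", "Y2", "Y3"],
                              "Plot.2": ["Y4", "Y5", "Y6"]}):
    print(row)
```

Each CREATE list supplies one value per new case, which is why every list must name exactly as many variables as there are new cases.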

8.15 INDEXing Cases 

The INDEX option sequences the cases created by SPLIT. Several different indices may be built at the same time. 

Multiple indices are most useful when SPLIT is used to reorganize data for analysis of variance. 

INDEX requires a name for the new variable: 

SPLIT 2, INDEX Treatment ; 

A new variable named “Treatment” is created. It has the value 1 in the first case created by SPLIT and the value 

2 in the second case created by SPLIT. Figure 8.13 illustrates the use of INDEX. 

Multiple indices may also be created: 

INDEX Plot 2 Subplot 3, 

This creates two new variables named “Plot” and “Subplot”. Plot has the values 1 and 2. Subplot has the values 

1, 2, and 3. The first index moves more slowly than the second index, so that Plot remains 1 as Subplot is successively 

1, 2, and 3. Then Plot becomes 2, and Subplot is successively 1, 2, and 3. This means that there must be 

six cases created by the SPLIT. 

When the right-most index value is omitted, the appropriate value is assumed. INDEX A 2 B is equivalent to 

INDEX A 2 B 3 when SPLIT INTO 6 has been used, because the product of the INDEX values equals the SPLIT 

argument. 
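The index values described above can be generated with a short Python sketch. It is a model of the rule, not PPL; `index_values` is a hypothetical name, and an omitted right-most size is written as `None`.

```python
from itertools import product
from math import prod


def index_values(n_cases, sizes):
    """sizes: list of (name, size); the last size may be None and is inferred."""
    names = [name for name, _ in sizes]
    counts = [size for _, size in sizes]
    if counts[-1] is None:
        known = prod(counts[:-1])
        if n_cases % known != 0:
            raise ValueError("SPLIT argument is not a multiple of the given index sizes")
        counts[-1] = n_cases // known
    if prod(counts) != n_cases:
        raise ValueError("product of index sizes must equal the SPLIT argument")
    # product() varies the right-most index fastest, as described above.
    return names, list(product(*(range(1, c + 1) for c in counts)))


print(index_values(6, [("Plot", 2), ("Subplot", None)]))
```

A single index with no size, as in INDEX Treatment after SPLIT 2, is the special case where the only size is inferred from the SPLIT argument.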

__________________________________________________________________________ 

Figure 8.13 Producing an Index Variable 

Name          Age  Sex  Test1  Test2 

Smith, Jason   11    1     16     12 

Wilson, Ann    14    2     17     11 

LIST Students 

( SPLIT 2, CARRY Age Sex, 

INDEX Seq, CREATE Test.Score Test? ) $ 

                   Test 

Age   Sex   Seq   Score 

 11     1     1      16 

 11     1     2      12 

 14     2     1      17 

 14     2     2      11 

__________________________________________________________________________ 

8.16 Ordering Variables with STEP and CYCLE 

The order of the variables in a file is sometimes not the desired one. Variables may be rearranged by using the 

PPL instruction KEEP with variable selection and possibly a MASK, or within a SPLIT, by using CREATE and 

USE with lists of variables. In addition, if the variables are arranged in a regular pattern, they may be ordered 

using the STEP and CYCLE options, which permit more concise specification when there are many variables. 

The STEP option selects every second variable when its argument is two, every third variable when its argument 

is three, and so on. For example, given a file with 26 variables named A to Z, this: 

SPLIT 13, USE ( A TO Z) STEP 2; 

selects every second variable between A and Z, beginning with A. 

STEP moves through the list of variables selecting the first (A), advancing the step size (2), selecting the designated 

variable (C), and so on, until the list is exhausted. The number of variables selected by the STEP 

procedure must be a multiple of the number of variables required by the SPLIT function. In the prior example, 13 

variables are selected from the 26 variables in the USE list, and these are divided into 13 cases. There is one variable 

per case. The USE list should be specified “B TO Z” if every other variable beginning with B is to be selected. 

When STEP and CYCLE are used, the variable list following USE or CREATE must be enclosed in parentheses. 

CYCLE works in a similar manner, except that when the variable list is exhausted, CYCLE goes back to the beginning 

of the list and begins selecting from the unused variables. (STEP does not return to the start of the list.)


Because the initial starting place in the list changes when CYCLE is used, different variables are selected in each 

iteration. The number of iterations depends on the CYCLE argument and the number of variables in the USE list: 

SPLIT 6, USE ( V(1) TO V(12) ) CYCLE 3 ; 

The CYCLE instruction selects variables 1, 4, 7, 10; 2, 5, 8, 11; and 3, 6, 9, 12; in that order. 

The selection order is a result of the initial variable in the USE list, the number of variables in the USE list, 

and the CYCLE argument. Ultimately, all the variables in the USE list are selected. Thus, CYCLE differs from 

STEP, where only a fraction of the variables in the USE or CREATE list are selected. Note that the number of 

variables in the variable list (12) must be a multiple of the number of cases into which the current case is being 

SPLIT (6). This is true for both the STEP and CYCLE procedures. Also, both STEP and CYCLE must follow 

either USE or CREATE and must not have a comma preceding them. 
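The selection orders produced by STEP and CYCLE can be reproduced with Python slices. The two functions below are hypothetical helpers that model the orders described above; they are not P-STAT code and do not check the multiple-of-SPLIT requirement.

```python
def step_select(names, k):
    """STEP k: take the first name, then every k-th name after it."""
    return names[::k]


def cycle_select(names, k):
    """CYCLE k: like STEP k, but restart at each unused offset until all are taken."""
    return [name for start in range(k) for name in names[start::k]]


letters = [chr(c) for c in range(ord("A"), ord("Z") + 1)]
print(step_select(letters, 2))                 # 13 names: A, C, E, ...
print(cycle_select(list(range(1, 13)), 3))     # positions 1 to 12 with CYCLE 3
```

STEP returns only a fraction of the list, while CYCLE returns every name exactly once in a rearranged order.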

Figure 8.14 shows the differing results that depend on whether STEP or CYCLE is used. STEP moves 

through the entire USE list, beginning with the first variable (Q2) and selecting every other variable. Four variables 

are selected and that meets the requirement of this SPLIT that a multiple of 4 be chosen. Four variables split 

into four cases yield one variable per case. 

__________________________________________________________________________ 

Figure 8.14 Using STEP and CYCLE 

File F: 

Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 

1 2 3 4 5 6 7 8 9 

11 12 13 14 15 16 17 18 19 

LIST F 

[ SPLIT 4, INDEX A 2 B, INDEX C, CARRY Q1, 

USE ( Q2 TO Q9 ) xxxx 2 ] $ 

produces:              if xxxx = STEP      if xxxx = CYCLE 

A   B   C   Q1             Q2                 Q2   Q4 

1   1   1    1              2                  2    4 

1   2   2    1              4                  6    8 

2   1   3    1              6                  3    5 

2   2   4    1              8                  7    9 

1   1   1   11             12                 12   14 

1   2   2   11             14                 16   18 

2   1   3   11             16                 13   15 

2   2   4   11             18                 17   19 

__________________________________________________________________________ 

CYCLE moves through the entire USE list, beginning with the first variable and selecting every other variable 

also. However, since eight variables are cited in the USE list, CYCLE returns to the first unused variable 

(Q3) in the USE list, and begins selecting again. It keeps cycling until all of the variables in the USE list are selected. 

Eight variables are chosen, meeting the requirement of this SPLIT that a multiple of 4 variables be chosen. 

These are split into four cases, yielding two variables per case. 

SPLIT 1 and CYCLE can be used to rearrange the variables in a file so that the variables in the second half of 

the file are interleaved with the variables in the first half of the file.


A1 A2 A3 A4 B1 B2 B3 B4 becomes 

A1 B1 A2 B2 A3 B3 A4 B4 with the following PPL: 

LIST AB [ SPLIT 1, USE ( V(1) .ON. ) CYCLE 4 ] $ 

8.17 How SPLIT Interacts With Other PPL 

There can be only one SPLIT function per command. Normal PPL can precede or follow SPLIT. Using PPL first 

allows selection and modification of cases in the usual manner before SPLIT is used. 

The interaction of PPL with SPLIT is as follows: 

1. The first case passes from the first PPL phrase to the next, and it is modified and retained or deleted 

in the usual manner. If retained, it reaches the SPLIT instruction. 

2. When SPLIT receives the case, the current number of variables and their names change as the original 

case is split into a number of new cases. 

3. The first of these new cases passes to subsequent PPL phrases, one after another, until it is deleted 

or retained and received by the command in use. The second new case then passes to the PPL phrases 

following the SPLIT instruction, and so on, until all of the new cases resulting from the split of the 

first original case have passed through. 

4. The second original case is now processed. It passes to the PPL preceding the SPLIT, and then it 

passes to the SPLIT instruction. It is split into multiple new cases, which each pass in turn to the 

PPL following the SPLIT. When all the new cases resulting from splitting the second case have been 

processed, the process begins again with the third original case. 

__________________________________________________________________________ 

Figure 8.15 A Simple COLLECT 

File MyFile: 

Id Age Sex 

1 29 M 

2 26 F 

3 42 F 

4 - M 

LIST MyFile [ COLLECT 2 ] $ 

        Age   Sex          Age   Sex 

Id.1     .1    .1   Id.2    .2    .2 

   1     29     M      2    26     F 

   3     42     F      4     -     M 

__________________________________________________________________________ 

8.18 THE COLLECT FUNCTION 

COLLECT is used to gather two or more adjacent cases into a single larger case. This larger case can be used 

with PPL to modify related variables or to generate across-case statistics. After the PPL, SPLIT can be used to


break the collected case back up into its original cases with any new variables appended. Additional PPL can precede 

or follow the COLLECT function. 

COLLECT is always followed by an integer argument which indicates how many cases to collect: 

COLLECT 4 

This integer is the “COLLECT counter”. It specifies the maximum number of cases to collect. It can be an integer 

constant, a permanent scratch variable (#nn), or (in a macro) a temporary scratch variable (#n). 

Figure 8.15 illustrates a simple COLLECT. As the input file is processed, every two cases are collected into 

a single case. Because variable names in a P-STAT file must be unique, the variable names in the collected case 

have a suffix added to the original variable name. The maximum suffix value is equal to the COLLECT counter. 

Variables in the first case get a suffix of .1, variables in the second case get a suffix of .2, and so on. When the 

number of cases in the file is not a multiple of the COLLECT counter, missing data are generated to fill the remaining 

variables in the final case. 
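The naming and padding rules just described can be sketched in Python. This is a model of the behavior, not P-STAT code: the function name collect is hypothetical, None stands in for missing data, and only the first three cases of MyFile from Figure 8.15 are used so that the final collected case has to be padded.

```python
def collect(cases, counter):
    """Gather every `counter` adjacent cases into one wide case."""
    names = list(cases[0]) if cases else []
    collected = []
    for start in range(0, len(cases), counter):
        group = cases[start:start + counter]
        wide = {}
        for position in range(1, counter + 1):
            # Past the end of the file there is no source case: pad with missing.
            source = group[position - 1] if position <= len(group) else {}
            for name in names:
                wide[f"{name}.{position}"] = source.get(name)
        collected.append(wide)
    return collected


my_file = [
    {"Id": 1, "Age": 29, "Sex": "M"},
    {"Id": 2, "Age": 26, "Sex": "F"},
    {"Id": 3, "Age": 42, "Sex": "F"},
]
wide_cases = collect(my_file, 2)
```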

8.19 Collecting BY Groups 

Usually, in a COLLECT situation, the number of cases to be collected is not a constant. Not all households are 

the same size. The BY option may be used with COLLECT to specify the variable or variables indicating group 

membership. When BY is used, the COLLECT counter indicates a maximum number of cases to collect, rather 

than an absolute number of cases. A maximum of 999 cases may be collected at once. However, if there are a 

great many variables in the file, the actual maximum will be less due to memory size limitations. 

Figure 8.16 illustrates using COLLECT with BY. When the value of the variable House.Id changes, the end 

of a group is signaled and the current COLLECT is considered complete. When COLLECT 4 is specified and a 

household has only three members, missing data are generated in that collected case for the variables with the 

suffix .4. When a household has only two members, missing data are generated for all the .3 and .4 variables. A 

household with more than four members causes an error because the COLLECT counter is a maximum value. Notice 

that the variable defining group membership, House.Id, is carried only once in each collected case. 

__________________________________________________________________________ 

Figure 8.16 Collecting BY Group Membership 

File Caseload: 

House 

Id Sex Age 

1001 M 43 

1001 F 44 

1001 M 19 

1002 F 23 

1002 M 29 

LIST Caseload ( COLLECT 4, BY House.Id ) $ 

House Sex Age Sex Age Sex Age Sex Age 

Id .1 .1 .2 .2 .3 .3 .4 .4 

1001 M 43 F 44 M 19 - - 

1002 F 23 M 29 - - - - 

__________________________________________________________________________
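The grouping behavior shown in Figure 8.16 can be modeled in Python. This is a sketch, not P-STAT code: collect_by is a hypothetical name, None stands in for missing data, and a ValueError stands in for the error P-STAT reports when a group exceeds the COLLECT counter.

```python
from itertools import groupby


def collect_by(cases, counter, by):
    """Collect adjacent cases that share the same `by` value into one wide case."""
    others = [name for name in cases[0] if name != by]
    collected = []
    for key, members in groupby(cases, key=lambda case: case[by]):
        members = list(members)
        if len(members) > counter:
            # The counter is a maximum, so an oversized group is an error.
            raise ValueError(f"group {key!r} has more than {counter} cases")
        wide = {by: key}                      # BY variable is carried once
        for position in range(1, counter + 1):
            source = members[position - 1] if position <= len(members) else {}
            for name in others:
                wide[f"{name}.{position}"] = source.get(name)
        collected.append(wide)
    return collected


caseload = [
    {"House.Id": 1001, "Sex": "M", "Age": 43},
    {"House.Id": 1001, "Sex": "F", "Age": 44},
    {"House.Id": 1001, "Sex": "M", "Age": 19},
    {"House.Id": 1002, "Sex": "F", "Age": 23},
    {"House.Id": 1002, "Sex": "M", "Age": 29},
]
households = collect_by(caseload, 4, "House.Id")
```

Like COLLECT with BY, the sketch relies on the input already being grouped: groupby only joins adjacent cases.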


8.20 CARRYing Common Information 

There may be several variables in a group of related cases that do not define group membership, but that are usually 

the same for all the cases in the group. For example, in the file Caseload, each case might have Address as a 

variable. This would normally be collected as Address.1, Address.2, and so on. This is reasonable when Address 

is expected to be different for each case. However, when Address has the same value for each case in a household 

group, it is more reasonable to have Address as a single variable in the collected case. Using CARRY followed 

by one or more variables: 

CARRY Address 

causes these variables to be placed in the collected case only once. 

A CARRY variable may be missing for some of the cases. However, if it is not missing, it must be the same 

for the entire group of collected cases unless either FIRST or LAST is also used as a collect option: 

CARRY Address, FIRST 

FIRST requests that the first non-missing value of the CARRY variable be used. LAST requests that the last non-missing 

value be used. (Note that FIRST and LAST are not the previously described logical functions.) 
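The CARRY rules can be summarized with a small Python model. It is not P-STAT code: carry_value is a hypothetical name, None stands in for a missing value, ValueError stands in for the P-STAT error, and the addresses are invented for illustration.

```python
def carry_value(values, option=None):
    """Resolve one CARRY variable for a group of collected cases."""
    present = [value for value in values if value is not None]
    if not present:
        return None                    # missing in every case of the group
    if option == "FIRST":
        return present[0]              # first non-missing value
    if option == "LAST":
        return present[-1]             # last non-missing value
    if len(set(present)) > 1:
        raise ValueError("CARRY values differ within the group")
    return present[0]


same = carry_value(["12 Elm St", None, "12 Elm St"])
first = carry_value(["12 Elm St", "9 Oak Ave"], "FIRST")
last = carry_value(["12 Elm St", "9 Oak Ave"], "LAST")
```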

8.21 Ordering Cases with INDEX and SORT 

INDEX and SORT are COLLECT options which permit the individual cases to be placed in the collected case in 

a different order. INDEX may only be used when the BY option is also used. 

The use of INDEX is illustrated in Figure 8.17. The variable Visit indicates the order that each case should 

have in the collected case. The first patient has three visits in 1, 3, 2 order. When they are collected with Visit as 

the INDEX variable, the second case is placed in the .3 position because the value of Visit is 3. The third case for 

that patient is placed in the .2 position because the value of Visit is 2. 

__________________________________________________________________________ 

Figure 8.17 Collecting Cases in a Specified Order 

File Patients: 

Id Visit WBC 

1354 1 98 

1354 3 72 

1354 2 70 

4211 2 83 

4211 3 85 

LIST Patients 

[ COLLECT 3, BY Id, INDEX Visit ] $ 

Visit WBC Visit WBC Visit WBC 

Id .1 .1 .2 .2 .3 .3 

1354 1 98 2 70 3 72 

4211 - - 2 83 3 85 

__________________________________________________________________________ 

An INDEX value may not be missing, but the set of index values for a given collect need not be complete. 

For example, in Figure 8.17, the patient with Id 4211 has values 2 and 3 for the INDEX variable Visit but no value 

1. INDEX values are assumed to be both: 1) within the range of the COLLECT counter, and 2) unique integers.


If INDEX values are out of range or repeated, one of the options WARN, IGNORE, FIRST, or LAST may 

be used to prevent an error message and indicate what to do. When WARN is used, a warning message is printed. 

When IGNORE is used, out of range or repeated INDEX values are ignored. When FIRST is used, the first of the 

cases with the repeated index is collected. When LAST is used, the last of those cases is collected. The other 

cases are ignored. WARN and IGNORE may be used only with BY and INDEX. 
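The placement rules for INDEX can be modeled in Python. This sketch is not P-STAT code, and it makes two assumptions that the text does not state: with WARN the offending case is skipped after the warning, and with FIRST or LAST an out-of-range case is skipped. The name place_by_index and the repeated-index data are hypothetical; the patient data come from Figure 8.17.

```python
def place_by_index(cases, counter, index, option=None):
    """Return (slots, warnings); slot k-1 holds the case whose index value is k."""
    slots = [None] * counter
    warnings = []
    for case in cases:
        position = case[index]
        out_of_range = not (1 <= position <= counter)
        repeated = not out_of_range and slots[position - 1] is not None
        if not (out_of_range or repeated):
            slots[position - 1] = case
            continue
        if option is None:
            raise ValueError(f"INDEX value {position} repeated or out of range")
        if option == "WARN":
            warnings.append(f"INDEX value {position} repeated or out of range")
        if option == "LAST" and repeated:
            slots[position - 1] = case   # the later case replaces the earlier one
        # IGNORE, WARN and FIRST: the offending case is skipped
    return slots, warnings


patient_1354 = [{"Visit": 1, "WBC": 98}, {"Visit": 3, "WBC": 72}, {"Visit": 2, "WBC": 70}]
patient_4211 = [{"Visit": 2, "WBC": 83}, {"Visit": 3, "WBC": 85}]
slots_1354, _ = place_by_index(patient_1354, 3, "Visit")
slots_4211, _ = place_by_index(patient_4211, 3, "Visit")

# Hypothetical group in which the index value 1 is repeated.
repeated_visits = [{"Visit": 1, "WBC": 10}, {"Visit": 1, "WBC": 20}]
kept_first, _ = place_by_index(repeated_visits, 3, "Visit", "FIRST")
kept_last, _ = place_by_index(repeated_visits, 3, "Visit", "LAST")
```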

SORT is another way of rearranging the cases in the collected case. SORT is followed by one or more variables 

giving the sort order in which the collected cases should be arranged: 

SORT WBC 

The sort direction can be controlled: 

SORT WBC (D) 

by specifying a direction. An upwards (U) or downwards (D) sort may be specified. When a direction is not specified, 

an upwards sort is assumed. Figure 8.18 shows the results produced by SORT. 

SORT variables may not be BY or CARRY variables. The use of both INDEX and SORT is redundant — indexing 

is done before sorting. Thus, sorting may “undo” indexing. 

__________________________________________________________________________ 

Figure 8.18 Sorting the Collected Case 


File Patients: 

Id Visit WBC 

1354 1 98 

1354 3 72 

1354 2 70 

4211 2 83 

4211 3 85 

LIST Patients [ COLLECT 3, BY Id, SORT WBC ] $ 

Visit WBC Visit WBC Visit WBC 

Id .1 .1 .2 .2 .3 .3 

1354 2 70 3 72 1 98 

4211 2 83 3 85 - - 

__________________________________________________________________________ 
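The ordering in Figure 8.18 can be checked with a short Python model. It is not P-STAT code: sort_group is a hypothetical name, and the sketch assumes that no sort value is missing.

```python
def sort_group(cases, variable, direction="U"):
    """Order a group's cases by one variable: "U" is upward, "D" is downward."""
    return sorted(cases, key=lambda case: case[variable], reverse=(direction == "D"))


patient_1354 = [{"Visit": 1, "WBC": 98}, {"Visit": 3, "WBC": 72}, {"Visit": 2, "WBC": 70}]
upward = [case["Visit"] for case in sort_group(patient_1354, "WBC")]
downward = [case["Visit"] for case in sort_group(patient_1354, "WBC", "D")]
```

With the default upward sort, the visits land in the .1, .2 and .3 positions in the order 2, 3, 1, as in the figure.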

25.21 Complex Modification Using COLLECT 

Usually when a case is collected, there are additional PPL instructions to calculate summary statistics or to do 

cross-case comparisons or aggregations. Because the collected case has variables with the same prefix followed 

by .1, .2, and so on, the use of wildcards and DO loops is helpful in specifying the PPL instructions. 

Figure 8.19 illustrates a complex modification problem — locating all the salesmen in a department who earn 

more than their manager. It illustrates COLLECT, additional PPL, and finally a SPLIT to break the collected case 

back up into individual cases. 

In Figure 8.19, a new variable, Total.Pay, and a scratch variable, #Mgr.Total.Pay, are generated. #Mgr.Total.Pay 

is initialized. The COLLECT counter is set to 20, and cases are collected by Department. The five 

variables, Name, Position, Salary, Commission and Total.Pay, are each represented 20 times in the collected case 

as variables of the same names but with suffixes .1 to .20. The BY variable, Department, is present only once.


When a complete department is collected into one case, there are 101 variables (1, plus 5 times 20) in the collected 

case even though a given department may have fewer than 20 members. 

When the manager’s total pay is located, the scratch variable #Mgr.Total.Pay is set equal to it. The value of 

#Mgr.Total.Pay is compared with Total.Pay for each salesman. 

__________________________________________________________________________ 

Figure 8.19 A Complex Modification Problem 

FILE Staff: 

Department Name Position Salary Commission 

Furniture Adams Manager 20540 2875.25 

Furniture Brown Sales 17000 7230.80 

Hardware Mason Sales 16000 952.65 

Hardware Smith Manager 20300 862.95 

Hardware Green Sales 17000 4495.50 

LIST Staff 

[ GENERATE Total.Pay = Salary + Commission; 

GENERATE #Mgr.Total.Pay = 0 ; 

COLLECT 20, BY Department ] 

[ DO #J = 1, 20; 

IF Position?(#J) EQ 'Manager', 

SET #Mgr.Total.Pay = Total.Pay?(#J); 

ENDDO; 

DO #J = 1, 20; 

IF Total.Pay?(#J) LE #Mgr.Total.Pay, 

SET Total.Pay?(#J) = 0; 

ENDDO ] 

[ SPLIT ; 

IF Total.Pay GT 0, RETAIN ], 

COMMAS, MIN.PLACES 2 $ 

Total 

Department Name Position Salary Commission Pay 

Furniture Brown Sales 17,000 7,230.80 24,230.80 

Hardware Green Sales 17,000 4,495.50 21,495.50 

__________________________________________________________________________ 

The first DO loop goes from 1 to 20 to match the maximum possible size of the COLLECT. Note: instead of 

the constant 20 we could use the system variable .COLLECTSIZE., which is the number of cases found in the 

most recent collect. A powerful attribute of wildcard notation is illustrated: 

[ DO #J = 1, 20 ; 

IF Position?(#J) EQ 'Manager', 

SET #Mgr.Total.Pay = Total.Pay?(#J); 

ENDDO ]


One PPL instruction using wildcard notation takes the place of twenty instructions without it. Using a wildcard 

creates a vector of all the variables which begin with the wildcard prefix or suffix. The DO loop scratch 

variable #J accesses specified locations in this vector, just as it accesses locations in the V and P vectors. 

In Figure 8.19, each of the 20 variables which begin with “Position” is tested in turn to find the one with the 

value “Manager”. If “Manager” is found in the fourteenth Position? variable, then #Mgr.Total.Pay is set equal to 

the fourteenth Total.Pay? variable. 

The second DO loop examines the Total.Pay of each salesman. Again the wildcard notation and loop scratch 

variable simplify the procedure: 

DO #J = 1, 20 ; 

IF Total.Pay?(#J) LE #Mgr.Total.Pay, 

SET Total.Pay?(#J) = 0 ; ENDDO; 

Each value in the vector of Total.Pay variables is tested, and any value that is less than or equal to #Mgr.Total.Pay 

is set to zero. 

The instruction: 

SPLIT ; 

is a special form of the SPLIT function that restores collected cases and variable names to their original form. It 

uses the .1, .2 suffixes to ascertain the original number of cases and variable names. The order of the split cases 

may be somewhat different if SORT or INDEX is used in the COLLECT or if new variables are generated. The 

instruction: 

SPLIT *; 

produces all possible cases resulting from a COLLECT, even if no such cases existed before the COLLECT. 

(Some cases may have all missing values of the suffixed variables when SPLIT * is used, whereas when SPLIT is 

used, only cases with at least one non-missing value of a suffixed variable are produced.) 
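The difference between SPLIT and SPLIT * can be modeled in Python. This is a sketch rather than P-STAT code: uncollect is a hypothetical name, None stands in for missing data, and the collected case is household 1002 from Figure 8.16.

```python
def uncollect(wide, names, counter, by, star=False):
    """Break a collected case back into cases; star=True models SPLIT *."""
    cases = []
    for position in range(1, counter + 1):
        values = {name: wide.get(f"{name}.{position}") for name in names}
        # Plain SPLIT drops positions whose suffixed variables are all missing.
        if star or any(value is not None for value in values.values()):
            cases.append({by: wide[by], **values})
    return cases


household_1002 = {
    "House.Id": 1002,
    "Sex.1": "F", "Age.1": 23,
    "Sex.2": "M", "Age.2": 29,
    "Sex.3": None, "Age.3": None,
    "Sex.4": None, "Age.4": None,
}
split_plain = uncollect(household_1002, ["Sex", "Age"], 4, "House.Id")
split_star = uncollect(household_1002, ["Sex", "Age"], 4, "House.Id", star=True)
```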

In Figure 8.19, when the collected case is split back up, each Department splits back into its original cases. 

The final PPL instruction: 

IF Total.Pay GT 0, RETAIN ] 

only retains cases where Total.Pay is greater than zero. 
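For readers who want to check the logic of Figure 8.19 outside P-STAT, the same collect, compare and split sequence can be modeled in Python. This is a sketch, not a translation of the PPL: the function name earn_more_than_manager is hypothetical, and a list of grouped rows plays the role of the collected case.

```python
from itertools import groupby

staff = [
    ("Furniture", "Adams", "Manager", 20540, 2875.25),
    ("Furniture", "Brown", "Sales", 17000, 7230.80),
    ("Hardware", "Mason", "Sales", 16000, 952.65),
    ("Hardware", "Smith", "Manager", 20300, 862.95),
    ("Hardware", "Green", "Sales", 17000, 4495.50),
]


def earn_more_than_manager(rows):
    """Return (name, total pay) for staff who earn more than their manager."""
    kept = []
    for _, members in groupby(rows, key=lambda row: row[0]):
        members = list(members)                    # one "collected case"
        totals = [salary + commission for _, _, _, salary, commission in members]
        manager_pay = 0
        for row, total in zip(members, totals):    # first DO loop
            if row[2] == "Manager":
                manager_pay = total
        for row, total in zip(members, totals):    # second DO loop, SPLIT, RETAIN
            if total > manager_pay:
                kept.append((row[1], round(total, 2)))
    return kept


result = earn_more_than_manager(staff)
```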

In PPL where COLLECT has been used, a DO loop with a scratch variable and the wildcard notation are very 

convenient for referring to the collected variables. This is the case in the prior example in Figure 8.19. It is important, 

however, that the variable name prefixes (the part preceding the ?) be unique. This example gives 

unexpected results: 

[ KEEP ID Policy.No Agent Amount Age Class; 

COLLECT 4, BY ID; 

DO #J = 1, 4; 

IF Age?(#J) LT 18, SET Class?(#J) = 0; 

ENDDO; 

After the COLLECT takes place, all the variables (except the BY variable) have names such as Policy.No.1, 

Policy.No.2, Policy.No.3, Policy.No.4, Agent.1, Agent.2, and so on. The notation “Class?(#J)” in the prior example 

refers to the first #J variables (the first four since #J takes on the values 1 TO 4) beginning with “Class”. Thus, 

when #J = 1, if the result of the IF test is true, the variable Class.1 is set to 0. When #J = 2, if the IF test is true, 

Class.2 is set to 0, and so on. 

Similarly, the notation “Age?(#J)” refers to the first #J variables beginning with “Age”. These are Agent.1, 

Agent.2, Agent.3 and Agent.4, and not the intended variables Age.1, Age.2, Age.3 and Age.4. This is because the 

Agent variables precede the Age variables. “Age?(#J)” is not specific enough to refer to just the Age variables;


“Age.?(#J)” (with the dot) is unique. Remember, we want to wildcard against variables Age.1, Age.2, Age.3 and 

Age.4 . 
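The prefix problem can be reproduced with a small Python model of wildcard lookup. It is a sketch, not P-STAT code: wildcard is a hypothetical name, and the variable order (all Policy.No variables, then all Agent variables, and so on) is an assumption taken from the description above.

```python
def wildcard(names, prefix, position):
    """Return the position-th variable (1-based) whose name begins with prefix."""
    matches = [name for name in names if name.startswith(prefix)]
    return matches[position - 1]


kept = ["Policy.No", "Agent", "Amount", "Age", "Class"]
names = ["ID"] + [f"{variable}.{k}" for variable in kept for k in range(1, 5)]

loose = [wildcard(names, "Age", j) for j in range(1, 5)]    # matches Agent first
exact = [wildcard(names, "Age.", j) for j in range(1, 5)]   # dot makes it unique
```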

__________________________________________________________________________ 

Figure 8.20 A Second Complex Problem 


Last First 

ID Name Name Date Diagnosis Description Charges 

12425 Adams John 831105 - Room Fee 35.95 

12425 Adams John 831104 - Lab Tests 182.45 

12425 Adams John 831106 Ulcer Diagnosis - 

15743 Blair Sally 831221 - Blood Tests 36.00 

15743 Blair Sally 831222 Kidney Stones Diagnosis - 

15743 Blair Sally 831221 - Room Fee 35.00 

15743 Blair Sally 831222 - Surgery 745.25 



15743 Blair Sally 831223 - Blood Tests 45.00 

12269 Knox Tom 840304 - Lab Tests 69.50 

12269 Knox Tom 840304 - Room Fee 35.00 

12269 Knox Tom 840305 - Cat Scan 545.00 


12269 Knox Tom 840306 Brain Tumor Diagnosis - 


LIST Patients 

[ COLLECT 10, BY ID, 

CARRY Last.Name First.Name, SORT Date ; 

GENERATE Diagnosis:C32 = FIRST.GOOD (Diagnosis?) ; 

GENERATE Total.Charges = SUM.GOOD (Charges?) ; 

GENERATE Admit.Date = Date.1 ; 

GENERATE Discharge.Date = LAST.GOOD (Date?) ; 

KEEP Last.Name First.Name .NEW. ] $ 

Resulting Listing: 

Last First Total Admit Discharge 

Name Name Diagnosis Charges Date Date 

Adams John Ulcer 218.40 831104 831106 

Blair Sally Kidney Stones 931.25 831221 831223 

Knox Tom Brain Tumor 719.50 840304 840306 

__________________________________________________________________________


Another example using COLLECT is illustrated in Figure 8.20. Whereas the problem in Figure 8.19 could 

be solved in other, perhaps simpler, ways, the report produced in Figure 8.20 would be extremely difficult to do 

without a function such as COLLECT, and it would require several steps using the SORT and COLLATE commands. 

With COLLECT, often only a single command is needed to produce a complex report. 

Figure 8.21 shows the variables and values for a single patient immediately after the COLLECT step: 

COLLECT 10, BY ID, CARRY Last.Name First.Name, SORT Date ; 

Each input case has seven variables. A collected case has 43 variables: the BY variable, 2 CARRY variables, and 10 

times the 4 remaining variables. Because the COLLECT is done using the SORT option, a patient’s cases are 

rearranged by Date so that the case with earliest date will be in the .1 position. The case with the next earliest date 

will be in the .2 position, and so on. The case with the last date can be located by looking for the last non-missing 

value for a Date? variable. 

__________________________________________________________________________ 

Figure 8.21 Before and After COLLECT 

John Adams' Records Before and After COLLECT: 

BEFORE: There are 7 variables in each case. 

Last First 

ID Name Name Date Diagnosis Description Charges 

12425 Adams John 831105 - Room Fee 35.95 

12425 Adams John 831104 - Lab Tests 182.45 

12425 Adams John 831106 Ulcer Diagnosis - 

AFTER: There are 43 variables in the collected case. 

Id Last.Name First.Name 

12425 Adams John 

Date.1 Diagnosis.1 Description.1 Charges.1 

831104 - Lab Tests 182.45 

Date.2 Diagnosis.2 Description.2 Charges.2 

831105 - Room Fee 35.95 

Date.3 Diagnosis.3 Description.3 Charges.3 

831106 Ulcer Diagnosis - 


- - - - 

- - - - 

- - - - 


__________________________________________________________________________


Because a suffix is appended onto each of the collected variables, Diagnosis is now Diagnosis.1 to Diagnosis.10. 

Therefore, this instruction (in Figure 8.20): 

GENERATE Diagnosis:C32 = FIRST.GOOD (Diagnosis?) ; 

does not cause a variable name conflict. The new variable Diagnosis is given the value of the first non-missing 

value of any variable beginning with “Diagnosis”. For John Adams, Diagnosis.1 and Diagnosis.2 are missing, but 

Diagnosis.3 is not missing. Its value, “Ulcer”, is used as the value of the newly generated variable Diagnosis. 

Creation of the other three new variables is similar. Total.Charges is the sum of all the non-missing values of any 

variable which begins with the prefix “Charges”: 

GENERATE Total.Charges = SUM.GOOD (Charges?) ; 

Because of the sort order, Admit.Date is Date.1 : 

GENERATE Admit.Date = Date.1 ; 

Even though it is not known how many cases were collected, it is easy to locate Discharge.Date with the 

LAST.GOOD function: 

GENERATE Discharge.Date = LAST.GOOD (Date?) ; 

The last good (non-missing) value of any variable which begins with “Date” becomes the Discharge.Date. For 

John Adams, this is 831106, the value of Date.3. 

The final PPL in Figure 8.20 is a KEEP to select the variables Last.Name, First.Name and all the new (.NEW.) 

variables generated by the PPL. 
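The three summary functions used in Figure 8.20 can be modeled in Python to check the values reported for John Adams. This is a sketch, not P-STAT code: first_good, last_good and sum_good are hypothetical stand-ins for FIRST.GOOD, LAST.GOOD and SUM.GOOD, and None stands in for missing data.

```python
def first_good(values):
    """First non-missing value, or None if every value is missing."""
    return next((value for value in values if value is not None), None)


def last_good(values):
    """Last non-missing value, or None if every value is missing."""
    return first_good(reversed(values))


def sum_good(values):
    """Sum of the non-missing values."""
    return sum(value for value in values if value is not None)


# John Adams after COLLECT 10 with SORT Date: three real cases, seven padded.
dates = [831104, 831105, 831106] + [None] * 7
diagnoses = [None, None, "Ulcer"] + [None] * 7
charges = [182.45, 35.95, None] + [None] * 7

report = {
    "Diagnosis": first_good(diagnoses),
    "Total.Charges": round(sum_good(charges), 2),
    "Admit.Date": dates[0],
    "Discharge.Date": last_good(dates),
}
```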

8.22 COLLECT System Variables 

COLLECT sets 5 system variables. 

1. .COLLECTSIZE. The number of cases in the most recent collect 

2. .COLLECTMIN. The size of the smallest collected group so far 

3. .COLLECTMAX. The size of the largest collected group so far 

4. .COLLECTIONS. The total number of collects that occurred 

5. .COLLECTSUM. The number of cases that have been collected. 

For example, if a file is collected by household number: 

1. .COLLECTSIZE. The size of the household most recently collected (the current household) 

2. .COLLECTMIN. The smallest household 

3. .COLLECTMAX. The largest household 

4. .COLLECTIONS. The total number of households 

5. .COLLECTSUM. The number of people in all collected households 

These variables are reset as each new COLLECT occurs. Thus, if a file has 312 households, the 5 variables are 

reset 312 times. As a result, .COLLECTSIZE., for example, can be used in the PPL following a collect: 

LIST House 

( COLLECT 10, BY household ) 

( DO #j = 1, .COLLECTSIZE. ) etc.... 

The final settings remain until some later COLLECT begins reading cases anew.
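How the five values evolve from one collect to the next can be modeled in Python. This is a sketch of the bookkeeping, not P-STAT code: track_collects is a hypothetical name, and the group sizes 3 and 2 are the two households of Figure 8.16.

```python
def track_collects(group_sizes):
    """Return the five counters as they stand after each collected group."""
    size_min = size_max = None
    collections = total = 0
    snapshots = []
    for size in group_sizes:
        size_min = size if size_min is None else min(size_min, size)
        size_max = size if size_max is None else max(size_max, size)
        collections += 1
        total += size
        snapshots.append({
            ".COLLECTSIZE.": size,
            ".COLLECTMIN.": size_min,
            ".COLLECTMAX.": size_max,
            ".COLLECTIONS.": collections,
            ".COLLECTSUM.": total,
        })
    return snapshots


snapshots = track_collects([3, 2])
```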



SUMMARY 

Across-case modification and aggregation are facilitated by: 

• Scratch Variables, 

• the Permanent Vector, 

• user-defined multi-dimensional arrays, and 

• the programming language functions FIRST, LAST, SPLIT and COLLECT. 

Scratch Variables have no position in a file. They are created using GENERATE followed by a name 

starting with one or two pound signs (#). The values of scratch variables created with one pound sign 

remain only for the duration of a command or macro. The values of scratch variables created with two 

pound signs remain for the duration of the run. They are explicitly changed with SET. 

The Permanent Vector is similar in behavior to scratch variables except that it has the name P assigned 

to it, and individual positions in P are located by subscript. The subscripts can be calculated. Permanent 

variables are set with SET. The P vector may only contain numeric values but these values may be passed 

not only across cases of a file, but between commands. 

The wildcard character ? may be used to reference the suffixed variables created by COLLECT (as well 

as any other variables with a common prefix or suffix): 

[ COLLECT 20, BY Department; 

DO #J = 1, 20 ; 

IF Position?(#J) EQ 'Manager', 

SET #Mgr.Total.Pay = Total.Pay?(#J) ; 

ENDDO ] 

“Position?(#J)” refers in turn to the first #J variables (20 in the above example) that begin with “Position”. 

ARRAY Commands 

An array can have up to 7 dimensions, and can be character or numeric. Array names have two characters, 

the second being the same as the first, like XX or cc or Zz. Case doesn’t matter. There can be up to 

26 active arrays. 

DEFINE.ARRAY cc ( n,n,...) 

defines the array and (optionally) initializes it. Character arrays are declared by adding the desired character 

length immediately following the array name and a colon, i.e. AA:20 . 

DEFINE.ARRAY AA (5,8) TO 0 $ 

DEFINE.ARRAY CC:20 (13,3) to ' ' $ 

SHOW.ARRAYS 

reports on the status of all the defined arrays 

DROP.ARRAY aa zz 

requests that the listed arrays be dropped so that the space can be reused. 

nn=number list=variable list vn=variable name


DROP.P.VECTOR 

releases the space normally used for the P vector and makes it available for use in arrays. 

PPL Functions: Across-Case 

FIRST (vn or .FILE.) 

is evaluated as true if this is the first case in the subgroup specified in the expression, and false if it is not 

the first case. The required expression is a variable name (vn) or a list of variables, or the system value 

.FILE. (meaning the entire file): 

IF FIRST (District, Department), SET P(1) = 0; 

Changing values of the variable or variables in an ordered file define different subgroups. 

LAST (vn or .FILE.) 

is evaluated as true if this is the last case in the subgroup specified in the expression, and false if it is not 

the last case. The required expression is a variable name (vn) or a list of variables, or the system value 

.FILE. (meaning the entire file): 

IF LAST (.FILE.), RETAIN; 

Changing values of the variable or variables define different subgroups. 

COLLECT nn 

specifies the number of adjacent cases to collect into one case. Additional PPL may precede or follow 

the COLLECT function. PPL which follows COLLECT operates on the new longer case. A common 

usage is to COLLECT cases, do modifications, and then SPLIT the long case back into the original number 

of cases. Using (SPLIT) or (SPLIT *) “undoes” COLLECT. A number of additional options may be 

used. They must follow the COLLECT: 

LIST Patients [ COLLECT 4, BY Id, INDEX Visit ] $ 

The options in the following list specify the cases to be collected and the variables in the new case. 

1. BY vn or list 

specifies one or more character and/or numeric variables that identify the cases that belong 

to a subgroup. The input file should be grouped or sorted by these variables. 

Those cases with the same values of the BY variables, that is, members of the same 

subgroup, are collected into one case. Values of missing1, missing2 and missing3 also 

define membership in different subgroups. 

When BY is used, the number of cases to be collected must still be specified. That 

number defines the maximum number of members of a subgroup. BY variables appear 

(are carried) only once in the new case. 

2. CARRY vn or list 

implies that the values of the specified variables are the same for all members of a subgroup, 

and that those variables should appear only once in the new case produced by 

COLLECT. If a value is missing, the first non-missing value for a CARRY variable is 

used. If the values differ, an error occurs unless FIRST or LAST is used. 

3. FIRST 

specifies that the first case be selected for collection if values of the INDEX variable 

are repeated or if values of the CARRY variable differ. 



4. IGNORE 

specifies that any case with a value of the index variable that is repeated or out of range 

should be ignored. IGNORE can only be used with BY and INDEX. 

5. INDEX vn 

specifies a numeric variable whose values determine the order that the cases in a subgroup 

take in the collected case. INDEX values may not be missing or exceed the 

COLLECT counter (the number of cases to be collected), without the use of IGNORE, 

WARN, FIRST, or LAST as well. INDEX may not be used without BY. 

6. LAST 

specifies that the last case be selected for collection if values of the INDEX variable 

are repeated or if values of the CARRY variable differ. 

7. SORT vn or list 

requests that the collected cases be sorted by the specified variables before being 

placed in the new, long case. SORT variables may not be BY or CARRY variables. 

The use of both INDEX and SORT is redundant, and since sorting is done after indexing, 

SORT “undoes” INDEX. 

An upward sort (U) or a downward sort (D) may be specified: 

[ COLLECT 5, BY Household, 

CARRY Last.Name, SORT Age (D) ] 

An upward sort is assumed when sort order is not explicitly specified. 

8. WARN 

requests that a warning be printed if a case has a value on the index variable that is repeated 

or out of range. WARN may not be used without BY and INDEX. 

COLLECT System Variables 

1. .COLLECTSIZE. The number of cases in the most recent collect 

2. .COLLECTMIN. The size of the smallest collected group so far 

3. .COLLECTMAX. The size of the largest collected group so far 

4. .COLLECTIONS. The total number of collects that occurred 

5. .COLLECTSUM. The number of cases that have been collected. 

SPLIT INTO nn 

requests that each current case be split into the specified number of cases. (The word INTO is optional.) 

Additional PPL may precede or follow the SPLIT function, but there may be only one SPLIT per command. 

A number of options may be used. They must follow the SPLIT: 

LIST Filename 

[ SPLIT INTO 2, CARRY Name, INDEX Term 2, 

CREATE Grade (Grade.2 TO Grade.4) STEP 2 ] $ 

The following options control the order in which the variables in SPLIT cases are placed into the output 

cases and the naming of the variables which are created: 

1. CARRY vn or list 

specifies one or more variables whose values are to be carried in every case created by 

SPLIT. 

2. CYCLE nn 

specifies the size of steps to be taken in selecting variables to be used in the SPLIT. 

CYCLE follows a USE or CREATE variable list without a comma preceding it. The 

first variable is used, the variable “nn” away from the first is used next, and so on. Multiple 

passes or cycles are made through the variable list until all of the variables in the 

list are used. 

3. CREATE new.vn vn or new.vn list 

provides a new variable name and gives the current variables whose values are to be 

used in the split cases. They will be the values of the new variable. The number of 

variables to be used must be the same as “nn” (the number of cases into which the current 

case is to be SPLIT). 

4. INDEX new.vn nn 

specifies that a new variable be present in each case created by SPLIT. That variable 

is an index with values going from 1 to “nn”. Multiple indices may be created, but the 

product of the index values (the “nn’s”) must be equal to the number of cases created 

by SPLIT. 

5. STEP nn 

specifies the size of steps to be taken in selecting variables to be used in the SPLIT. 

STEP follows a USE or CREATE list without a comma preceding it. The first variable 

is used, the variable “nn” away from the first is used next, and so on. Only one pass 

through the variable list is made. 

6. USE vn or list 

specifies the variables to be used in the split case. They must be a multiple of “nn” (the 

number of cases into which the current case is to be SPLIT). 

SPLIT and SPLIT * 

These are special versions of SPLIT that “uncollect” a case created by COLLECT: 

[ SPLIT ] or [ SPLIT * ] 

SPLIT produces only those cases that have at least one non-missing value of a suffixed variable, whereas 

SPLIT * produces all possible cases from a COLLECT, even if no such cases existed before the COLLECT. 

For example, if COLLECT 10 has been used, SPLIT * results in ten cases. SPLIT, on the other 

hand, produces 10 cases only if there are some non-missing values of the suffixed variables (Test.10, 

Age.10 and so on). 


9 


Modification of Character Variables 

Character variables may be modified in many of the same ways that numeric variables are modified. However, 

since character and numeric variables have different properties, there are several operators and a number of functions 

that are specific to character variables. 

This chapter briefly discusses basic character procedures — the recoding of existing character variables, the 

generation of new character variables and the logical testing of character values. The major portion of the chapter 

deals with special character operators and functions that: 

• Test character variables; 

• Trim and pad character strings; 

• Left and right justify or center strings; 

• Extract substrings and access words within character strings; 

• Change character strings into numeric values and vice-versa; 

• Concatenate character strings. 

9.1 BASIC CHARACTER PROCEDURES 

Data may be entered in a P-STAT system file as either character strings — mixtures of letters, digits and other 

characters — or as numbers. Generally, it is clear which way a variable’s value should be entered. A person’s 

name is entered as a character variable, whereas his or her age is entered as a numeric variable. One can find a 

substring of Name and the mean Age, but the substring of Age and the mean Name do not make sense. 

Often there is not a totally clear-cut line between character and numeric data. There are situations in which 

a variable is coded with a character string when it really has some numeric attributes. The variable Sex, for example, 

may be coded with the numbers 1 and 2, or with the character strings “M” and “F”, or “Male” and 

“Female”. If such a variable is to be used in a listing, the character representation is preferred. If the variable is 

to be given to a correlation program, the numeric representation is necessary. In these situations, functions are 

used to convert character representations into numeric values, and numeric values into character strings. 

P-STAT distinguishes between character and numeric values by using single or double quotes to enclose character 

strings. Numeric values are not enclosed in quotes. Thus, 'Sam Davis' and '924' are character strings, 

whereas 924 is a numeric value. 

9.2 Generating New Character Variables 

Character variables are generated much as numeric ones are. However, when a character variable is created, it is 

necessary to specify that it is a character variable and, if it is other than 40 characters long, to specify its size. This 

instruction: 

[ GENERATE New.Name:C32 = Name ] 

creates a new variable named “New.Name”, that is 32 characters long (defined size 32) and equal to the value of 

the existing character variable named “Name”. If the variable Name has a length less than 32, the new variable 

New.Name will be padded with blanks on the right end until it is 32 characters long. If Name is longer than 32, 

characters will be truncated from the right end until only 32 characters remain.
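The padding and truncation rule can be expressed in a line of Python. This is a model of the behavior, not P-STAT code: assign_character is a hypothetical name, and the default of 40 reflects the default character size mentioned above.

```python
def assign_character(value, size=40):
    """Fit a string to a defined size: truncate on the right, then pad with blanks."""
    return value[:size].ljust(size)


padded = assign_character("Sam Davis", 32)      # 9 characters plus 23 blanks
truncated = assign_character("Philadelphia", 8)  # cut down to 8 characters
default_size = assign_character("Sam Davis")     # no size given, so 40
```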


Character variables may be generated equal to a specific value: 

[ GENERATE City:C = 'Houston' ] 

The character variable named City is created and set equal to the string “Houston”, followed 

by 33 blanks. The size of City is 40, since a size was not specified. Character variables may be generated with 

default names: 

[ DO #J = 1, 6; GEN ?:C = CHARACTER ( V(#J) ); ENDDO ] 

A character variable may be up to 50,000 characters long. 

9.3 Modifying Existing Character Variables 

Character variables are modified in much the same manner as numeric variables. The modification may be the 

result of some logical test or may be an instruction by itself. A variable set equal to a character string must be a 

character variable. This instruction sets the variable State to the value “Iowa”: 

[ SET State = 'Iowa' ] 

This instruction sets the variable State to the value of the variable State.Name: 

[ SET State = State.Name ] 

Modification may occur as the result of a logical test: 

[ IF .N. EQ 10, SET Name = 'John Jones' ] 

The system variable .N., the case number, is tested as each case is processed. On the tenth case, the variable Name 

is set to the character string “John Jones”. 

9.4 Logical Selection of Character Variables 

The two logical operators that operate in an identical fashion for both numeric and character data are equal (EQ) 

and not equal (NE). The other logical operators work in a somewhat different manner. A character string being 

tested must be enclosed in quotes: 

[ IF Name EQ 'Jones', DELETE ] 

The concepts of less than and greater than are different. In P-STAT, these operators are honored for character 

data as a function of the sort sequence of each computing environment. The results may be different on different 

machines if the sort sequence of the characters is different. 

On computers using the ASCII character set (such as PC and SUN), the low to high order is numbers, uppercase 

letters, and lowercase letters. Most of the special characters, in particular, blank, are “lower” than either 

letters or numbers. 

P-STAT, in its character comparisons, treats uppercase and lowercase letters as identical characters, that is, 

“A” = “a”, unless exact comparisons are specified. Such case-respecting comparisons are specified by prefacing 

logical operators with “X” for eXact character comparisons: 

[ IF Grade XEQ 'f', SET Grade = 'I' ] 

The logical operators that may be prefaced with “X” when they are applied to character variables and values are: 

EQ, NE, LT, LE, GT, GE, AMONG and NOTAMONG. 

Logical operators that have a list as their argument may be used with character strings in the same way that 

they are used with numeric values: 

[ IF Name AMONG ( 'Jones' 'Smith' 'Wills' ), RETAIN ]


Where the operation of the function depends on the concept of less than or greater than, the result will depend on 

the sort sequence on the individual computing environment. This example will continue all the cases that fall in 

the sort sequence between AAAA and AZZZ, that is, all the A’s: 

IF Name AMONG ( 'AAAA' TO 'AZZZ' ), RETAIN; 

The character operators that parallel numeric operators and that may be used in logical selection are: 

EQ NE LT LE GT GE 

XEQ XNE XLT XLE XGT XGE 

AMONG NOTAMONG GOOD MISSING 

XAMONG XNOTAMONG 

These operators are discussed in the first PPL chapter. XEQ, the most useful of the exact character operators,

is further explained later in this chapter in the section on character operators.

In addition to these operators, the operators CONTAINS and MATCHES, which are specifically for character 

data, may be used in logical selection. They test if a character value contains a specific character string and if a 

character value matches a character string/wildcard combination. CONTAINS and MATCHES are explained in 

the section on character operators. 

9.5 Locating Non-Missing Character Data 

The functions COUNT.GOOD, FIRST.GOOD and LAST.GOOD count or locate non-missing data values. These 

functions are used with character data the same way that they are used with numeric data. The arguments for these 

functions are lists of variable names or positions. 

The COUNT.GOOD function yields the number of non-missing (“good”) values of the variables specified in 

the list: 

[ GENERATE Count = COUNT.GOOD ( Midterm Final ) ] 

The variable Count is generated and set equal to the number of non-missing test scores of the variables Midterm 

and Final. Count is a numeric variable, even though Midterm and Final may be character variables. For each case 

in this example, the maximum possible value of Count is 2. 

The FIRST.GOOD and LAST.GOOD functions yield the first or last non-missing value of the variables specified 

in the list: 

[ GENERATE Name:C = 

FIRST.GOOD ( Last.Name First.Name Middle.Name ) ] 

The character variable Name is generated and set equal to the first non-missing value of the character variables 

Last.Name, First.Name and Middle.Name. The variables referenced in the function list can be referenced by name 

or position. The word TO, meaning all the variables from the first mentioned variable through the last mentioned 

variable, or the system variable .ON., meaning all the variables from the one mentioned through the last variable 

in the file, may be used in the function list. Either of these instructions: 

[ SET Last.Guess = LAST.GOOD ( V(1) TO V(9) ) ] 

[ SET Last.Guess = LAST.GOOD ( V(1) .ON. ) ] 

sets the variable Last.Guess to the value of the last non-missing variable in the list. 
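
The three functions can be modelled in a few lines of Python (not PPL). This is a sketch under assumptions: a missing value is represented by `None`, the function names are hypothetical, and returning a missing value when every variable in the list is missing is an assumption rather than something stated here.

```python
from typing import Optional, Sequence


def count_good(values: Sequence[Optional[str]]) -> int:
    """Number of non-missing values in the list."""
    return sum(v is not None for v in values)


def first_good(values: Sequence[Optional[str]]) -> Optional[str]:
    """First non-missing value, scanning left to right."""
    return next((v for v in values if v is not None), None)


def last_good(values: Sequence[Optional[str]]) -> Optional[str]:
    """Last non-missing value, scanning right to left."""
    return next((v for v in reversed(values) if v is not None), None)


case = [None, "Ann", None, "Lee", None]
print(count_good(case), first_good(case), last_good(case))
```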

9.6 CHARACTER OPERATORS 

Character operators and character functions modify character expressions, variables and strings. Operators generally 

have two operands, one before and one after the operator. Functions have single or multiple arguments that 

follow the function and are contained in parentheses.


9.7 The CONTAINS and XCONTAINS Operators 

The operator CONTAINS tests if a character string is contained within the value of a character variable: 

[ IF Address CONTAINS 'NJ', RETAIN ] 

If the string “NJ” is contained anywhere within the variable Address, the case is continued. 

CONTAINS tests for the presence or absence of a string; the location of the string may be anywhere within 

the specified variable. To test for the absence of a string, preface the consequence with “F.” to indicate that it is 

done only when the IF test is false — that is, only when the string is not contained in the variable: 

[ IF Address CONTAINS 'NJ', F.RETAIN ] 

Alternatively, DELETE could be used instead of F.RETAIN in this situation. 

The XCONTAINS operator specifies case-respecting tests. The argument string, exactly as specified, must 

be contained within the value of the character variable: 

[ IF Comment XCONTAINS 'STAT', SET Code = 1 ] 

CONTAINS and XCONTAINS are useful in locating cases with certain value strings when you do not know the 

complete string or when the remainder of the string differs from case-to-case. 
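
The difference between the two operators, and the effect of F.RETAIN, can be illustrated with a small Python model (not PPL). The function names and the sample addresses are hypothetical.

```python
def contains(value: str, text: str) -> bool:
    """Model of CONTAINS: the test ignores case."""
    return text.upper() in value.upper()


def xcontains(value: str, text: str) -> bool:
    """Model of XCONTAINS: the text must appear exactly as given."""
    return text in value


addresses = ["12 Elm St, Trenton NJ", "9 Oak Ave, Newark nj", "4 Main St, Albany NY"]

retained = [a for a in addresses if contains(a, "NJ")]        # RETAIN
not_nj = [a for a in addresses if not contains(a, "NJ")]      # F.RETAIN
exact = [a for a in addresses if xcontains(a, "NJ")]

print(retained, not_nj, exact, sep="\n")
```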

9.8 The Concatenate Operator 

Character strings can be joined using the concatenate operator //. This operator abuts the value of one variable to 

that of another: 

[ GENERATE Name:C32 = First.Name // Last.Name ] 

If First.Name is 16 characters and Last.Name is 16 characters — for example: 

First Name        Last Name

Abe               Adams

Millicent         Murphy

Sharon Elizabeth  Johnson-Mayfield

the variable Name, created by the concatenation of the two strings, produces the following results: 

Name

Abe             Adams

Millicent       Murphy

Sharon ElizabethJohnson-Mayfield

The concatenate operator joins the input strings in their entirety. The shorter first names may incorporate 

more than the desired number of blanks, and the longer names may have no intervening blanks. A blank could be 

included in the concatenation: 

[ GEN Name:C32 = First.Name // ' ' // Last.Name ] 

This instruction joins together the three strings, First.Name, “ ” (a blank), and Last.Name. There will be at least 

one blank between the first and last names. The following results would be obtained: 

Name

Abe              Adams

Millicent        Murphy

Sharon Elizabeth Johnson-Mayfiel


Note the truncation that resulted because the variable Name has a defined size of 32: the final letter on the right

is missing. (The trim concatenate operator, discussed next, is more appropriate for an operation of this type.)

Any number of strings may be concatenated. If the total length of the concatenated strings exceeds that of the

target variable, the output is truncated on the right. The things which may be joined include character variables, 

literals in quotes and character expressions. Character variables are items such as Name and Telephone. Literals 

are character strings such as “ ” (a blank), “Susan” and “945-5600”. Character expressions are the results of functions 

such as LAST.GOOD ( V(1) .ON. ). For example, this instruction: 

[ GENERATE Telephone:C11 = 

'1' // Area.Code // FIRST.GOOD ( Phone, Alt.Phone ) ] 

illustrates a literal concatenated with a variable concatenated with an expression. 

A character expression on the right of the equal sign, however simple or complex, produces a result whose 

width can range from 0 (a null string) to 50,000 characters. Only when the result is moved across the equal sign 

into the target variable does a Procrustean stretching (with blanks) or truncation take place. 

9.9 The Trim Concatenate Operator 

The trim concatenate operator /// joins strings by trimming out all leading and trailing blanks in each of the strings 

and inserting a single blank between the strings. The concatenation of first and last names, illustrated previously 

using the regular concatenate operator, produces different results when the trim concatenate operator is used. This 

instruction,

[ GENERATE Name:C32 = First.Name /// Last.Name ]

yields this result:

Name

Abe Adams

Millicent Murphy

Sharon Elizabeth Johnson-Mayfiel

The leading (there were none) and trailing blanks of each name have been trimmed out, and one blank has

been inserted between the names. The variable Name could be defined as :C with no specified length, which provides

a default size of 40. In other respects, the /// operator works just like the // operator.
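
The results for // and /// follow mechanically from the field sizes, as this Python model (not PPL) shows. It is a sketch with hypothetical helper names: `fit` stands for the move across the equal sign into a variable of a defined size, `concat` for // and `trim_concat` for ///.

```python
def fit(value: str, size: int) -> str:
    """Truncate or blank-pad on the right to the defined size."""
    return value[:size].ljust(size)


def concat(*parts: str) -> str:
    """Model of //: abut the strings exactly as they are."""
    return "".join(parts)


def trim_concat(*parts: str) -> str:
    """Model of ///: strip blanks from each string, join with one blank."""
    return " ".join(p.strip(" ") for p in parts)


people = [("Abe", "Adams"), ("Millicent", "Murphy"),
          ("Sharon Elizabeth", "Johnson-Mayfield")]

for first, last in people:
    f16, l16 = fit(first, 16), fit(last, 16)       # C16 input variables
    print(repr(fit(concat(f16, l16), 32)))         # First.Name // Last.Name
    print(repr(fit(concat(f16, " ", l16), 32)))    # with a blank literal
    print(repr(fit(trim_concat(f16, l16), 32)))    # First.Name /// Last.Name
```

With a target size of 40 instead of 32, the trimmed form of the longest name fits without truncation.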

9.10 Exactly Equal Operator 

The XEQ operator tests whether two character strings are exactly equal. The strings must be identical in case, as 

well as in specific characters. Normally, comparisons in P-STAT are case-independent — “BILL” equals “Bill” 

or “biLl”. This is useful in most situations: 

[ IF Last EQ 'Smithey' AND First EQ 'Bill', 

SET Dependents = 1 ] 

Occasionally, however, a comparison that respects case is required. The XEQ operator is used in those situations. 

It is functionally similar to IVAL, described in a subsequent section. 

Figure 9.1 illustrates using the XEQ operator. Character strings, containing information about logon and logoff 

activity on a mainframe computer, existed in a P-STAT file. Separate counts of logons and logoffs were 

desired. However, the logon and logoff instructions were typically abbreviated, and the abbreviations were differentiated 

only by case. The XEQ operator specifies a test of exact equality — one that respects the case of the 

character string. The operators XNE, XLT, XLE, XGT and XGE function similarly.


__________________________________________________________________________ 

Figure 9.1 The XEQ Operator for Tests that Respect Case 

FILE Filelog: 

Text 

L Fred Smith 

L Will Roys 

l 

disc 

L William 

l 

L Penelope Rt 

PROCESS Filelog 

[ IF FIRST (.FILE.), GEN #LogOn = 0, GEN #LogOff = 0 ; 

IF TOKEN (Text) XEQ 'L', INC #LogOn ; 

IF TOKEN (Text) XEQ 'l', INC #LogOff; 

IF LAST (.FILE.), 

PUT #LogOn ,> 

#LogOff > ] $ 

There were 4 logons and 2 logoffs. 

__________________________________________________________________________ 
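
The counts in Figure 9.1 can be checked with a short Python model (not PPL). It assumes that the first blank-delimited word of each record is what TOKEN returns; the helper name `first_token` is hypothetical.

```python
from typing import Optional

records = ["L Fred Smith", "L Will Roys", "l", "disc",
           "L William", "l", "L Penelope Rt"]


def first_token(text: str) -> Optional[str]:
    """First blank-delimited word, or None when the text is blank."""
    parts = text.split()
    return parts[0] if parts else None


# Case-respecting tests, as with XEQ.
logons = sum(first_token(r) == "L" for r in records)
logoffs = sum(first_token(r) == "l" for r in records)
# A test that ignores case, as with EQ, cannot tell the two apart.
either = sum((first_token(r) or "").upper() == "L" for r in records)

print(f"There were {logons} logons and {logoffs} logoffs.")
```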

9.11 CHARACTER FUNCTIONS 

There are a number of functions that are used only with character values. These functions, in alphabetical order, 

and the tasks they perform, are: 

1. BLANK Blank out specified characters within a string. 

2. XBLANK Like BLANK, but case respecting. 

3. CAPS Capitalize the first character of each token. 

4. CENTER Center a character string. 

5. CHANGE Correct a substring within a string. 

6. CLAG Perform a lag on a character argument. 

7. XCHANGE Like CHANGE, but case respecting. 

8. CHAREX Create a character value from digits in a number. 

9. CHARACTER Convert a number to a character string. 

10. COMPRESS Squeeze out specified characters. 

11. CVAL Give character equivalent of specified number. 

12. IVAL Give number equivalent of specified character. 

13. LEFT Left justify a character string. 

14. LENGTH Locate the last non-blank character in a string. 

15. LOWER Convert characters to lowercase equivalents. 

16. NUMBER Convert a character string to a number. 

17. PAD Pad a character string with specified characters. 

18. POSITION Give the position of one string within another.


19. XPOSITION Like POSITION, but case respecting. 

20. RIGHT Right justify a character string. 

21. SIZE Determine the defined size of a character variable. 

22. SUBSTRING Extract substrings from character strings. 

23. TOKEN Access “words” within character strings. 

24. TRIM Trim specified characters from strings. 

25. UPPER Convert characters to uppercase equivalents. 

26. VARNAME Convert a variable name to a character value. 

27. VERIFY Test for unexpected characters in a string. 

The function name is followed by parentheses containing one or more expressions or constants. All expressions 

may be complex and can consist of variable names, literals and other functions. Expressions may be nested 

within other expressions. The mode and number of the arguments permitted depend on the individual function. 

Any character expression on the right of the equal sign, no matter how simple or complex, produces a result 

whose width can range from 0 (a null string) to 50,000 characters. Only when the result is moved across the equal 

sign into the target variable does a padding (with blanks) or truncation take place, if necessary. 

9.12 Centering and Justifying Strings 

The functions CENTER, LEFT and RIGHT affect the position of a character string within its defined field. The 

CENTER function centers the string. The LEFT and RIGHT functions, respectively, left and right justify the 

strings within their fields: 

LEFT ( ' ABC' ) = 'ABC ' 

CENTER ( 'XYZ  ' ) = ' XYZ ' 

RIGHT ( 'SPQR ' ) = ' SPQR' 

9.13 Changing the Case of Strings 

UPPER ( 'abc' ) = 'ABC' 

LOWER ( 'ABC' ) = 'abc' 

CAPS ( 'ann smith' ) = 'Ann Smith' 

UPPER, LOWER and CAPS are the three functions which change the case of a value: 

SET Name = UPPER ( Name ) 

LOWER converts a value to all lowercase characters. It is possible to nest character functions. This permits conversion 

of all of a name to lowercase except the first letter: 

SET Name = SUBSTRING ( Name, 1, 1 ) // 

LOWER ( SUBSTRING ( Name, 2 ) ) 

(The SUBSTRING function is discussed subsequently.) 

CAPS capitalizes the initial letters of words in character variables: 

[ GEN Name:C = CAPS ( 'JOHN paul JoNeS' ) ] 

More exactly, CAPS puts all initial letters in upper case and all other letters in lower case. A blank is the assumed 

delimiter between tokens (words). The output appears like this: 

John Paul Jones 

Optionally, a second argument giving a replacement delimiter for the blank or an additional delimiter may be specified. 

This instruction: 

[ SET V(1) = CAPS ( 'abc,def,ghi', ',' ) ] 

produces:


Abc,Def,Ghi 

The comma is specified as the token delimiter. It is enclosed in single or double quotes. 
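
The rule that CAPS follows can be written out as a short Python model (not PPL). This is a sketch: the function name `caps` is hypothetical, and it assumes that a delimiter given as the second argument is honored in addition to the blank.

```python
def caps(value: str, delimiter: str = "") -> str:
    """Upper-case the first letter of each token, lower-case the rest."""
    delimiters = " " + delimiter
    out = []
    start_of_token = True
    for ch in value:
        if ch in delimiters:
            out.append(ch)
            start_of_token = True
        else:
            out.append(ch.upper() if start_of_token else ch.lower())
            start_of_token = False
    return "".join(out)


print(caps("JOHN paul JoNeS"))
print(caps("abc,def,ghi", ","))
```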

9.14 Length and Size of Strings 

LENGTH ( '  abc   ' ) = 5 

SIZE ( '  abc   ' ) = 8 

The functions LENGTH and SIZE yield information about the actual length and the defined size of character values. 

LENGTH gives the location of the right-most non-blank character: 

[ GENERATE Count = LENGTH ( Name ) ] 

The SIZE function yields a numeric value giving the defined size of a character value: 

[ GEN Width = SIZE ( Name ) ] 

The variable Width is generated and set equal to the longest possible length of the variable Name. This length is 

the defined size of Name or the size resulting after various character function procedures or operations. 
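
The distinction between the two functions can be stated compactly in Python (not PPL). The function names are hypothetical, and returning 0 for an all-blank value is an assumption.

```python
def length(value: str) -> int:
    """1-based location of the right-most non-blank character."""
    return len(value.rstrip(" "))


def size(value: str) -> int:
    """Total width of the value, blanks included."""
    return len(value)


value = "  abc   "   # two leading blanks, three trailing blanks
print(length(value), size(value))
```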

9.15 Locating Strings Within Variables 

POSITION ( 'ABC', 'B' ) = 2 

POSITION ( 'ABC', 'X' ) = 0 

XPOSITION ( 'ABab', 'ab' ) = 3 

VERIFY ( 'ABCDE', 'AEIOU' ) = 2 

The POSITION, XPOSITION and VERIFY functions yield a numeric value which is the location of a string within 

a character value. The simpler usage of POSITION has an expression and a character string as arguments: 

[ GEN Blank.Location = POSITION ( Name, ' ' ) ] 

The numeric variable Blank.Location is generated and set equal to the location of the first occurrence of a blank 

in the variable Name. The second argument may be a character variable whose value is the string to be located: 

[ GEN Locale = POSITION ( Address, Zip ) ] 

The variable Locale is the location of the value of Zip (the zip code string) within the variable Address. If the 

string is not located, the result is zero. Values match regardless of whether they are uppercase or lowercase. 

The more complex usage of POSITION permits searches for multiple strings. The left-most position of any 

successfully located string is given as the function result. The arguments for POSITION are the expression and 

the character strings to be located: 

[ GEN XX = POSITION ( Name, 'Jr.', 'Sr.', 'Esq.' ) ] 

The variable XX is the location of the left-most occurrence of any of the specified strings. 

An optional argument for length may be provided. It should be right-most in the argument list: 

[ SET Extra = POSITION ( Phone, ' ()-/.', 1 ) ] 

The contents of the character strings, whose positions are being sought, are divided into strings of the specified 

length. Thus, portions of strings are treated as separate arguments. In the preceding example, the variable Extra 

is set equal to the position of the first occurrence of any of the characters in the search string. The “1” specifies 

that each single character in the search string is itself a search string. The length argument must be an integer 

between 1 and 50,000. The number of characters in the search string must be evenly divisible by the length. 

XPOSITION is just like POSITION, except that the case (upper, lower or mixed) of the character string whose 

position is being sought is respected: 

[ GEN Fatal = XPOSITION ( Symptom, 'D' ) ]


The variable Fatal is generated and set equal to the position of upper-case “D”; lower-case “d” is ignored.

XPOSITION may be used with the same types of arguments as POSITION. 
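
The search rules for POSITION and XPOSITION can be modelled in Python (not PPL). This is a sketch with hypothetical function names; the optional length, which PPL takes as the right-most argument, is passed here as the keyword `length`.

```python
from typing import Optional


def xposition(value: str, *strings: str, length: Optional[int] = None) -> int:
    """Left-most 1-based position of any search string; 0 when none is found."""
    targets = []
    for s in strings:
        if length is None:
            targets.append(s)
        else:
            if len(s) % length != 0:
                raise ValueError("search string not evenly divisible by length")
            # Cut the search string into pieces of the given length.
            targets.extend(s[i:i + length] for i in range(0, len(s), length))
    hits = [value.find(t) for t in targets]
    hits = [h for h in hits if h >= 0]
    return min(hits) + 1 if hits else 0


def position(value: str, *strings: str, length: Optional[int] = None) -> int:
    """Same search, ignoring case."""
    return xposition(value.upper(), *(s.upper() for s in strings), length=length)


print(position("ABC", "B"), position("ABC", "X"), xposition("ABab", "ab"))
print(position("John Smith Jr.", "Jr.", "Sr.", "Esq."))
print(position("(609) 466-9200", " ()-/.", length=1))
```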

The VERIFY function returns the location of the first character in the initial argument that is not in any of

the remaining arguments:

[ GEN BAD = VERIFY ( 'ABCDE', 'EA', 'B' ) ] 

BAD is set to 3, since the third character of the first argument, “C”, is not in any of the remaining arguments. Thus, the presence of only

specified characters may be verified. Multiple character string arguments are permitted, although each character

is considered as a separate string. 
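
A Python model (not PPL) makes the scan explicit. The function name `verify` is hypothetical; returning 0 when every character is permitted, and comparing characters exactly as written, are assumptions.

```python
def verify(value: str, *allowed: str) -> int:
    """1-based location of the first character not found in `allowed`."""
    permitted = set("".join(allowed))   # each character counts on its own
    for location, ch in enumerate(value, start=1):
        if ch not in permitted:
            return location
    return 0


print(verify("ABCDE", "EA", "B"))
print(verify("ABCDE", "AEIOU"))
print(verify("19104", "0123456789"))
```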

9.16 Extracting Substrings and Words 

SUBSTRING ( 'ABCDE', 3, 2 ) = 'CD' 

TOKEN ( 'Ann Smith' ) = 'Ann' 

The SUBSTRING and TOKEN functions access portions of character strings. SUBSTRING accesses a string 

starting at a specified location and of a given length. TOKEN accesses “words” within a character string — that 

is, strings delimited by blanks or another specified character. 

SUBSTRING requires an expression, a start location and a length as arguments: 

[ GEN Initial:C1 = SUBSTRING ( First.Name, 1, 1 ) ] 

The variable Initial is generated as a character variable with a defined size of 1. It is set equal to the substring of 

First.Name, beginning at the character in position 1 and having a length of 1. The third argument giving the length 

is optional. When it is omitted, the assumption is that the rest of the string is needed. If the third argument is 

omitted when the second argument is 1, the entire input string is the substring. 

An expression may be used as either the location or length argument in SUBSTRING: 

[ GEN #Len = LENGTH ( Phone.No ) ] 

[ GEN Code:C2 = SUBSTRING ( Phone.No, #Len-1, 2 ) ] 

The variable Code is set equal to the two right-most digits in the telephone number. (The scratch variable #Len 

is the length of the phone number. #Len-1 identifies the start location as the next-to-last digit, and 2 is the length 

of the substring.) 
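
Because SUBSTRING counts characters from 1, the arithmetic is easy to get wrong by one. A Python model (not PPL) of the same calls, with a hypothetical function name and a hypothetical phone number, is sketched here; start locations below 1 are not modelled.

```python
from typing import Optional


def substring(value: str, start: int, length: Optional[int] = None) -> str:
    """Characters from 1-based `start`; the rest of the string if no length."""
    begin = start - 1
    if length is None:
        return value[begin:]
    return value[begin:begin + length]


phone = "609-466-9200"
last_non_blank = len(phone.rstrip(" "))            # what LENGTH would give
code = substring(phone, last_non_blank - 1, 2)     # two right-most digits

print(substring("ABCDE", 3, 2), substring("ABCDE", 1), code)
```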

The TOKEN function accesses words or strings of characters within a character variable. The strings are typically 

separated by blanks: 

[ GEN First.Name:C = TOKEN ( Name ) ] 

The variable First.Name is the first word in the variable Name. 

Optional arguments for the TOKEN function access specific words and specify the delimiter between words. 

These instructions, 

[ GEN First.Name:C = TOKEN ( Name, 1 ) ; 

GEN Middle.Name:C = TOKEN ( Name, 2 ) ; 

GEN Last.Name:C = TOKEN ( Name, 3 ) ; 

IF Last.Name MISSING, 

SET Last.Name = Middle.Name, 

SET Middle.Name = ' ' ] 

access the second and third tokens in Name, as well as the first. When the second argument is omitted, the first 

token is accessed. Note that accessing a token that is not present (the third token in a name that has only two tokens) 

yields a missing value. 

A delimiter other than the blank may be specified: 

[ GEN Year.of.Birth:C = TOKEN ( Birthdate, 3, '/' ) ]


In this example, the slash is specified as the token delimiter. Assuming Birthdate has values such as 11/19/68, the 

value of the third token, 68, will be used as the value of Year.of.Birth. 

TOKEN accesses tokens counting from the left. TOKEN is a synonym for LTOKEN. RTOKEN accesses

tokens counting from the right. In all other respects, it is the same as LTOKEN. 

The NTOKEN function yields a count of tokens within a character string. The result is a numeric value, not 

a character string. This instruction: 

[ GEN #Number = NTOKEN ( Address ) ] 

generates a scratch variable #Number that equals the number of words in the variable Address. The delimiter is 

assumed to be a blank unless a second argument specifies one or more alternate delimiters: 

[ GEN Number.Read = NTOKEN ( Magazines, ', ' ) ] 

A comma and a blank are specified as the token delimiters. The variable Number.Read is the number of strings 

separated by commas and/or blanks in the variable Magazines. 
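
The token functions can be modelled in Python (not PPL). This is a sketch with hypothetical function names: a missing result is represented by `None`, and treating a run of adjacent delimiters as a single separator is an assumption.

```python
from typing import List, Optional


def split_tokens(value: str, delimiters: str = " ") -> List[str]:
    """Split on any of the delimiter characters, dropping empty pieces."""
    tokens, current = [], ""
    for ch in value:
        if ch in delimiters:
            if current:
                tokens.append(current)
            current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


def token(value: str, n: int = 1, delimiters: str = " ") -> Optional[str]:
    """n-th token counting from the left; None when it is not present."""
    tokens = split_tokens(value, delimiters)
    return tokens[n - 1] if 1 <= n <= len(tokens) else None


def rtoken(value: str, n: int = 1, delimiters: str = " ") -> Optional[str]:
    """n-th token counting from the right; None when it is not present."""
    tokens = split_tokens(value, delimiters)
    return tokens[-n] if 1 <= n <= len(tokens) else None


def ntoken(value: str, delimiters: str = " ") -> int:
    """Number of tokens in the value."""
    return len(split_tokens(value, delimiters))


print(token("Ann Smith"), token("Ann Smith", 3), token("11/19/68", 3, "/"))
print(rtoken("Ann Marie Smith"), ntoken("Time, Life, Wired", ", "))
```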

9.17 Blanking Out and Changing Strings 

BLANK ( 'abcde', 'bc' ) = 'a  de' 

BLANK ( 'abcde', 2, 2 ) = 'a  de' 

CHANGE ( 'abcde', 'bc', '999' ) = 'a999de' 

CHANGE ( 'abcde', 2, 3, '2222' ) = 'a2222e' 

The BLANK, XBLANK, CHANGE and XCHANGE functions alter character variables by replacing portions 

with either blanks or specified new strings. This provides the ability to delete or replace substrings. BLANK and 

CHANGE ignore the case of character strings; XBLANK and XCHANGE respect the eXact case of character 

strings. 

The BLANK function has two usage modes. The simpler usage specifies an expression, a starting location 

for the blank string, and the length of the string. This instruction: 

[ GEN Birthday:C4 = BLANK ( Date.of.Birth, 5, 2 ) ] 

replaces the character string beginning with the fifth character and of length 2 with two blanks. For a Date.of.Birth of

'121545', BLANK yields '1215  ' and, because Birthday has a defined size of 4, Birthday becomes '1215'. 

The third argument, giving the length of the string being blanked out, may be omitted. The characters from 

the start location through the end will be replaced by blanks: 

LIST Vocab.Test 

[ KEEP Vocab.Words Definitions ; 

SET Vocab.Words = BLANK ( Vocab.Words, 2 ) ] $ 

This listing will have only the initial letter of each vocabulary word and the definitions. 

The alternate usage of the BLANK function specifies a particular character string to be replaced by blanks, 

rather than a location and length. XBLANK may be used with this type of argument. The first occurrence of the 

string is blanked out: 

BLANK ( 'abcde', 'CD' ) produces 'ab  e' 

XBLANK ( 'abcde', 'CD' ) produces 'abcde' 

Multiple occurrences of a character string may be blanked out. An optional third argument to the BLANK 

function specifies the maximum number of occurrences of the string to be replaced: 

[ SET Comments = BLANK ( Comments, 'damn', 10 ) ] 

Up to ten occurrences of the word “damn” in the variable Comments will be replaced with an equivalent number 

of blanks. The size of the resultant variable always remains the same when the BLANK function is used.


The CHANGE and XCHANGE functions have two usage modes comparable to those of the BLANK function. 

The most common usage of CHANGE specifies an expression, an old string and a new string. The first 

occurrence of the old string is replaced with the new string: 

[ SET State = CHANGE ( State, 'TX', 'Texas' ) ] 

XCHANGE is the same as CHANGE, but the case of the old string must be exactly as specified or it is not replaced 

by the new string. 

The old and new strings may be specified with expressions, which are variable names, literals and other 

functions: 

[ IF New.Area.Code GOOD, SET Phone.No = 

CHANGE ( Phone.No, Area.Code, New.Area.Code ) ] 

The area code in the phone number is changed to the new area code, unless the new one is missing. The value of 

Area.Code is the old string and a good (non-missing) value of New.Area.Code is the new string. If the value of 

Area.Code is not found in Phone.No, no change is made. 

Multiple changes may be specified. An optional fourth argument gives the maximum number of changes: 

[ SET Title = CHANGE ( Title, ' ', '.', 999 ) ] 

All blanks are changed to periods, including leading or trailing blanks. 

If a new character string is not specified, the old substring is removed, making the length of the result smaller. 

This instruction removes occurrences of the word “damn”: 

[ SET Comments = CHANGE ( Comments, 'damn', 10 ) ] 

The alternate usage of the CHANGE function specifies an expression, a starting location of a string, the length 

of the string, and a new string. The new string may or may not be the same length as the old one: 

[ IF SUBSTRING ( Telephone, 4, 3 ) = '897', 

SET Telephone = CHANGE ( Telephone, 4, 3, '807' ) ] 

The CHANGE and XCHANGE functions may make a value longer or shorter. This does not matter until the 

final resulting value is moved across the equal sign. At that time, truncation or blank padding will occur as needed. 
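
Both usage modes of BLANK and CHANGE can be modelled in Python (not PPL). This is a sketch with hypothetical function names: the location-based forms are given separate names (`blank_at`, `change_at`), and the maximum number of occurrences is passed as the keyword `count` rather than as a trailing positional argument.

```python
import re
from typing import Optional


def change(value: str, old: str, new: str = "", count: int = 1) -> str:
    """Replace up to `count` occurrences of `old`, ignoring case."""
    return re.sub(re.escape(old), lambda m: new, value,
                  count=count, flags=re.IGNORECASE)


def xchange(value: str, old: str, new: str = "", count: int = 1) -> str:
    """Same, but `old` must match exactly, case included."""
    return value.replace(old, new, count)


def blank(value: str, old: str, count: int = 1) -> str:
    """Overwrite occurrences of `old` with blanks; the width is unchanged."""
    return change(value, old, " " * len(old), count)


def xblank(value: str, old: str, count: int = 1) -> str:
    return xchange(value, old, " " * len(old), count)


def change_at(value: str, start: int, length: int, new: str) -> str:
    """Replace `length` characters from 1-based `start` with `new`."""
    return value[:start - 1] + new + value[start - 1 + length:]


def blank_at(value: str, start: int, length: Optional[int] = None) -> str:
    """Blank out `length` characters from `start`, or through the end."""
    if length is None:
        length = len(value) - start + 1
    return change_at(value, start, length, " " * length)


print(repr(blank("abcde", "CD")), repr(xblank("abcde", "CD")))
print(repr(change("abcde", "bc", "999")), repr(change_at("abcde", 2, 3, "2222")))
print(repr(change("a b c", " ", ".", count=999)))
```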

9.18 Squeezing Out Specified Characters 

COMPRESS ( '12/25/95', '/' ) = '122595' 

COMPRESS ( '..AB...CD..', 1, '.' ) = '.AB.CD.' 

The COMPRESS function squeezes out either blanks or specified characters. This instruction: 

[ SET SS.Number = COMPRESS ( SS.Number ) ] 

squeezes out all leading, trailing and embedded blanks contained in the variable SS.Number. Only non-blank 

characters remain. 

An expanded usage mode of the COMPRESS function permits specification of the number of delimiters that 

may remain between tokens (strings or words), and the delimiter character or characters that separate tokens. This 

instruction will leave only a single blank between words wherever one or more blanks are found: 

[ SET Sentence = COMPRESS ( Sentence, 1 ) ] 

This next instruction will generate a numeric variable from a character one, after all the specified characters are 

squeezed out: 

[ GEN Income = NUMBER ( COMPRESS ( Income, ',$' ) ) ] 

The token delimiters are specified as the comma and the currency sign. Since the second argument, the number 

of delimiters that may remain, is missing, zero is assumed. All specified delimiters are removed, leaving only


numbers (it is hoped). The NUMBER function (discussed subsequently) converts the resultant string into a numeric 

value. 
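
The squeezing rule can be modelled in Python (not PPL). This is a sketch: the function name `compress` is hypothetical, and keeping the first characters of a run that mixes different delimiters is an assumption.

```python
def compress(value: str, keep: int = 0, delimiters: str = " ") -> str:
    """Reduce every run of delimiter characters to at most `keep` characters."""
    out = []
    run = 0
    for ch in value:
        if ch in delimiters:
            run += 1
            if run <= keep:
                out.append(ch)
        else:
            run = 0
            out.append(ch)
    return "".join(out)


print(compress(" 123 45 6789 "))
print(compress("a   b  c", 1))
print(compress("..AB...CD..", 1, "."))
print(int(compress("$1,250,000", delimiters=",$")))
```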

9.19 Trimming Strings 

TRIM ( ' abcd ' ) = ' abcd' 

LTRIM ( ' abcd ' ) = 'abcd ' 

LRTRIM ( ' abcd ' ) = 'abcd' 

TRIM ( 'abc ***', '*' ) = 'abc ' 

TRIM ( 'abc ***', 2, '*' ) = 'abc *' 

The TRIM functions remove either blanks or one or more specified characters from one or both ends of a 

string. TRIM is a synonym for RTRIM — it trims blanks or characters from the right end of a string. LTRIM 

trims from the left end and LRTRIM trims from both ends. 

The TRIM functions have a character expression and a single character or a string of characters as arguments 

and an optional number that limits the number of characters to be trimmed. The string of trim characters is optional 

and, when it is not present, blank trim characters are assumed. In this example, blank characters are trimmed 

from the right end of the variable First.Name before concatenating it with a blank and variable Last.Name: 

[ SET Name = TRIM ( First.Name ) // ' ' // Last.Name ] 

Multiple trim characters may be specified: 

[ GEN Text:C = TRIM ( TRIM (Var1), '.,-' ) ] 

Any of the three specified punctuation marks that occur on the right end of values of Var1 will be trimmed off, 

yielding values of the new variable Text. Values of “hippo-”, “closing,” and “also...” will become “hippo”, “closing” 

and “also”. The resultant value may have a shorter length than it did prior to trimming. Notice that a simple 

TRIM is used to remove excess blanks first, so that the punctuation is right-most and therefore able to be trimmed. 

Note that: 

TRIM ( VAR1, '., -' ) 

which adds a blank to the trim characters, is slightly different; it not only trims initial blanks on the right, it also

trims blanks after other trim characters have been found. For example, 'ab. - , ' would become 'ab' instead of

the 'ab. - ' which results when the simple TRIM of blanks is done before the TRIM of the punctuation

characters. 
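
The difference between the nested and the single TRIM can be checked with a Python model (not PPL). The function names are hypothetical, and applying the optional limit to each end separately in `lrtrim` is an assumption.

```python
from typing import Optional


def rtrim(value: str, limit: Optional[int] = None, chars: str = " ") -> str:
    """Remove trim characters from the right end, at most `limit` of them."""
    end = len(value)
    removed = 0
    while end > 0 and value[end - 1] in chars and (limit is None or removed < limit):
        end -= 1
        removed += 1
    return value[:end]


def ltrim(value: str, limit: Optional[int] = None, chars: str = " ") -> str:
    """Remove trim characters from the left end."""
    return rtrim(value[::-1], limit, chars)[::-1]


def lrtrim(value: str, limit: Optional[int] = None, chars: str = " ") -> str:
    """Remove trim characters from both ends."""
    return ltrim(rtrim(value, limit, chars), limit, chars)


nested = rtrim(rtrim("ab. - , "), chars=".,-")   # TRIM ( TRIM (Var1), '.,-' )
single = rtrim("ab. - , ", chars="., -")         # TRIM ( Var1, '., -' )

print(repr(nested), repr(single))
print(repr(rtrim("abc ***", 2, "*")))
```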

9.20 Padding Strings 

PAD ( 'abcd', 6 ) = 'abcd  ' 

LPAD ( 'abcd', 7, '-' ) = '---abcd' 

PAD ( 'abcd', 3 ) = 'abcd' 

LRPAD ( 'abcd', 9, '-' ) = '--abcd---' 

The PAD functions add blanks or a specified character to the right (PAD or RPAD), to the left (LPAD), or to both 

ends of a string (LRPAD). 

The PAD functions have a character expression, a minimum length, and a fill character as arguments. Only 

the character expression is required. When the third argument, the fill character, is omitted in any of the PAD 

functions, a blank character is assumed. 

[ GEN ABC:C = PAD ( TRIM ( V(5) ), 10, '-' ) ] 

The trimmed form of variable 5 is padded with dashes to a width of 10. If the trimmed form is already 10 or more,

no dashes are added. Then, when the result is moved across the equal sign into variable ABC, it will be further

padded with blanks if its length is less than 40, the default size of ABC. 

When the second argument, the length, is omitted, a length of 1 is assumed:


[ SET Heading = PAD ( TRIM (Heading), '.' ) ] 

Blanks are trimmed from the end of variable Heading. If necessary, the resultant value is padded with a dot to 

bring it up to a length of 1. Therefore, only values of Heading that are completely blank, and thus become null 

strings when they are trimmed, are padded. This usage of PAD may be useful in locating blank values or in avoiding 

production of null strings, which may interfere with other procedures. 

PAD is often used with TRIM. Consider: 

[ GEN aa:c16 = 'cow'; 

GEN bb:c16 = LRPAD ( aa, 16, '-' ) ] 

This LRPAD has no effect, because variable aa, being C16, literally contains 'cow             '. Variable

bb will contain the same thing because the input to LRPAD already has 16 characters. However:

[ GEN aa:c16 = 'cow'; 

GEN bb:c16 = LRPAD ( LRTRIM ( aa ), 16, '-' ) ] 

gives just 'cow' to the LRPAD function, so it will set bb to '------cow-------'.

In this vein, if LRPAD ( LRTRIM ( aa ), 40, '-' ) were used, LRPAD would cheerfully produce 18 dashes,

'cow', and 19 dashes. Then, because bb is character 16, it is truncated to the first 16 of those 40 characters, i.e., a

series of 16 dashes. 
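
The interplay of padding, trimming and the final truncation can be reproduced with a Python model (not PPL). The function names are hypothetical; placing the odd extra fill character on the right in `lrpad` is inferred from the examples in this section.

```python
def rpad(value: str, length: int = 1, fill: str = " ") -> str:
    """Pad on the right up to `length`; longer values are left alone."""
    return value.ljust(length, fill)


def lpad(value: str, length: int = 1, fill: str = " ") -> str:
    """Pad on the left up to `length`."""
    return value.rjust(length, fill)


def lrpad(value: str, length: int = 1, fill: str = " ") -> str:
    """Pad both ends up to `length`; an odd extra character goes on the right."""
    missing = max(length - len(value), 0)
    left = missing // 2
    return fill * left + value + fill * (missing - left)


aa = "cow".ljust(16)                        # what a C16 variable holds
untrimmed = lrpad(aa, 16, "-")              # already 16 wide: nothing added
trimmed = lrpad(aa.strip(" "), 16, "-")     # LRTRIM first, then pad
too_wide = lrpad(aa.strip(" "), 40, "-")[:16]   # truncated on the move into C16

print(repr(untrimmed), repr(trimmed), repr(too_wide), sep="\n")
```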

9.21 Converting Numbers to Characters and Vice Versa 

NUMBER ( '12' // '3' ) = 123 

CHARACTER ( 123 ) = '123' 

CHAREX ( 122596, '00XX00' ) = '25' 

The NUMBER function converts character strings containing digits into numbers. The CHARACTER and CHAREX

functions convert numeric values into character representations. However, CHARACTER converts an entire

numeric variable, whereas CHAREX extracts and converts only specified digits. 

The NUMBER function converts a character value containing digits into numeric form. NUMBER requires 

a character expression as its argument: 

[ SET Year = NUMBER ( SUBSTRING ( Date, 7, 2 ) ) ] 

This instruction takes the seventh and eighth characters from the character variable Date and converts them into 

numeric form. For example, when Date has the value “09/15/29”, Year has the value 29. 

If the character value of the argument is all blank, the number is set to missing type 1. If the result of the 

character expression is not numeric, the number is set to missing type 2. If the input character value is missing, 

the number is set to missing type 3. 

There are three forms of the NUMBER function. They differ only when an invalid (missing 2) result occurs.

NUMBER does not print any warnings when invalid values are found. NUMBER.W prints a diagnostic warning. 

NUMBER.E produces an error message, which ends the command at that moment. 
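The three outcomes and the three forms of NUMBER can be summarized in a small model. This is a Python sketch under stated assumptions, not P-STAT code: the strings "M1", "M2" and "M3" stand in for the three missing types, None stands in for a missing character value, the suffix argument stands in for the .W and .E forms, and the exact set of numeric spellings that P-STAT accepts is assumed (optional sign, digits, optional decimal point).

```python
import re
import sys

MISSING_1, MISSING_2, MISSING_3 = "M1", "M2", "M3"    # stand-ins for the missing types
_NUMERIC = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")    # assumed definition of "numeric"


def number(value, suffix=""):
    """Model of NUMBER (suffix ''), NUMBER.W (suffix 'W') and NUMBER.E (suffix 'E')."""
    if value is None:                    # missing character value
        return MISSING_3
    text = value.strip(" ")
    if text == "":                       # all blank
        return MISSING_1
    if _NUMERIC.fullmatch(text):
        return float(text)
    if suffix == "E":                    # NUMBER.E: stop with an error
        raise ValueError(f"NUMBER.E: invalid numeric text {value!r}")
    if suffix == "W":                    # NUMBER.W: warn, then continue
        print(f"warning: invalid numeric text {value!r}", file=sys.stderr)
    return MISSING_2                     # plain NUMBER: silent


print(number("09/15/29"[6:8]))                      # SUBSTRING ( Date, 7, 2 ) -> 29.0
print(number("11/19/68".replace("/", ""), "W"))     # slashes removed first -> 111968.0
print(number("11-19-68", "W"))                      # warns on stderr, returns 'M2'
```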

A numeric variable for a date, suitable as input to the DAYS function, may be generated. Assuming Date is 

a character string of the form 11/19/68, this instruction: 

[ GENERATE Date2 = NUMBER.W ( COMPRESS ( Date, '/' ) ) ] 

squeezes out all slashes in the values of Date and converts them to numbers that can be input to the DAYS function 

to compute differences between dates. A warning message is issued if Date contains any invalid characters, such 

as “-” or “ ” (blank). 

The CHARACTER function converts numbers into character strings. It requires a numeric expression as its 

argument. This instruction:


[ GEN ID:C11 = 

CHARACTER ( Class ) // CHARACTER ( SS.Num ) ] 

generates a character variable named ID whose value is the concatenation of the character representations of the 

numeric variables Class and SS.Num. 

It is possible to use CHARACTER followed by a second argument indicating the number of decimal places 

to preserve in the expression. This is particularly useful if you have income values with decimal places carried up 

to four places and you wish to specify only two decimal places. The second argument indicates the number of 

places to carry in the expression: 

LIST Salary [ GEN Income:C = CHARACTER ( Salary, 2 ) ] $ 

CHARACTER may also be followed by a third argument which indicates the maximum number of places to preserve. 

In the following example, all values have a minimum of two decimal places and those with sufficient digits 

in the decimal portion have a maximum of three places: 

LIST Salary [ GEN Income:C = CHARACTER ( Salary, 2, 3 ) ] $ 

The CHAREX function extracts specific digits from a numeric value and yields a character representation of 

those digits. CHAREX operates only on the integer portion of the number — any fractional portion and sign are 

ignored. The two required arguments are a numeric expression and a character string selection mask enclosed in 

quotes: 

[ GEN Month:C2 = CHAREX ( Date, 'XX00' ) ]

The selection mask is composed of X and 0 (zero) characters and may be up to ten characters in length. An 

X retains a digit and a 0 drops a digit. The selection mask is aligned with the right-most digit of the numeric value. 

Thus, the selection mask “X0X” applied to the numeric value 840921 yields the character representation “91”. 

The selection mask “XX00X” applied to “156” yields “006” because lead zeros pad the numeric value until it is 

the length of the mask. The numeric function NUMEX is similar to CHAREX, but it yields a numeric result. 
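The mask rules above (right alignment, lead-zero padding, X keeps, 0 drops) can be checked with a short model. This is a Python sketch rather than PPL; the function name charex is a hypothetical stand-in for the PPL function, and the rejection of other mask characters is an assumption.

```python
def charex(value: float, mask: str) -> str:
    """Model of CHAREX: keep the digits under each X of a right-aligned mask."""
    if any(ch not in "Xx0" for ch in mask):
        raise ValueError("mask may contain only X and 0")
    digits = str(abs(int(value)))        # integer portion only; sign ignored
    digits = digits.zfill(len(mask))     # lead zeros pad the value to the mask length
    tail = digits[-len(mask):]           # align the mask with the right-most digit
    return "".join(d for d, m in zip(tail, mask) if m in "Xx")


print(charex(840921, "X0X"))       # 91
print(charex(156, "XX00X"))        # 006
print(charex(122596, "00XX00"))    # 25
```

A model of NUMEX would presumably convert the same extracted digits to a number, which drops the lead zeros.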

9.22 Character/Integer Translation 

CVAL ( 65 ) = 'A' 

IVAL ( 'A' ) = 65 

The PPL functions, CVAL and IVAL, short for character value and integer value, translate a character to an integer 

and vice versa. This permits non-printing characters to be inserted into text strings output by the PUT instruction, 

the LIST command or the TITLES command. Also, all kinds of character values can be compared precisely by 

referring to their integer codes. 

The CVAL function requires an integer between 0 and 255 as its argument. It returns the character equivalent 

of that integer. The IVAL function requires a character value of any size as its argument. It returns the integer 

equivalent of the first character. 
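For readers who know Python, CVAL and IVAL behave much like chr and ord. The sketch below is an analogy only: cval and ival are hypothetical names, and the glyph produced for codes above 127 depends on the character set of the machine or printer, which this model does not reproduce.

```python
def cval(code: int) -> str:
    """Model of CVAL: the character for an integer code between 0 and 255."""
    if not 0 <= code <= 255:
        raise ValueError("CVAL requires an integer between 0 and 255")
    return chr(code)


def ival(text: str) -> int:
    """Model of IVAL: the integer code of the first character of a value."""
    return ord(text[0])


print(cval(65), ival("A"), ival("Apple"))           # A 65 65
bold = cval(27) + "G" + "Bold On" + cval(27) + "H"  # escape codes embedded in text
print(repr(bold))
```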

Figure 9.2 illustrates using the CVAL function to output non-printing codes with text to a printer. The example 

codes are appropriate for a personal computer and a Lexmark printer. Despite the fact that the codes are

specific for these machines, the example illustrates the general procedure of embedding codes in text strings. The

character equivalent of 27: 

CVAL (27) 

is an escape code. Many printers require that an escape code, a non-printing signal, precede alphanumeric characters 

to specify various printer parameters. An escape code followed by the character “G”: 

(CVAL(27)) 'G' 'Bold On' (CVAL(27)) 'H' 'Bold Off' 

turns on the double-strike print mode; escape H turns it off. All text printed after escape G is double-struck until 

escape H is processed. Other printer instructions require an escape code followed by the character equivalent of 

an integer. This is a form feed:


(CVAL(27)) (CVAL(12)) 

__________________________________________________________________________ 

Figure 9.2 The CVAL Function for Bells and Whistles 

Enter a command: 

>> PPL (PUT 

@PAGE (CVAL(27)) 'G' 'Bold On' (CVAL(27)) 'H' 'Bold Off' 

@SKIP (CVAL(27)) 'M' 'Elite On' (CVAL(27)) 'P' 'Elite Off' 

@SKIP (CVAL(27)) (CVAL(12)) 'Form Feed' 

@SKIP (CVAL(27)) 'R' (CVAL(7)) 'International Character Set On' 

@SKIP (CVAL(221)) 'Hable Ud. Espa' (CVAL(252)) 'ol?' 

@SKIP (CVAL(27)) 'R' (CVAL(7)) 'International Character Set Off' 

@SKIP (CVAL(27)) (CVAL(14)) 'Enlarged Print On' 

@SKIP (CVAL(27)) '@' 'Initialize Printer' 

@SKIP (CVAL(27)) (CVAL(7)) 'Ring Bell' ), 

PR LPT1 $ 

__________________________________________________________________________ 

Still other instructions require an escape code, a character, and the character equivalent of an integer as an 

option. This selects the Spanish character set on the printer: 

(CVAL(27)) 'R' (CVAL(7)) 'International Character Set On' 

(CVAL(221)) 'Hable Ud. Espa' (CVAL(252)) 'ol?' 

(CVAL(27)) 'R' (CVAL(7)) 'International Character Set Off' 

The upside-down question mark and the Spanish “n” print in the subsequent text, and then the default character 

set is restored. This re-initializes the printer to the normal defaults, 

(CVAL(27)) '@' 

so that subsequent users are not unduly surprised. 

Notice that the PPL command is used to process PPL (P-STAT programming language) instructions. An input 

file is not required and an output file is not produced. The PPL command exists solely to process PPL 

instructions, such as the PUT instruction used in this example. PUT places text strings on the output device, which 

is the printer in this example: 

PR LPT1 $ 

(“LPT1” is the default name for the printer on many personal computers.) The TEXT.WRITER or PROCESS 

commands could be used if the output text strings were to incorporate values from a P-STAT system file. Both 

TEXT.WRITER and PROCESS require an input file. Any P-STAT command could be used — the input filename 

is followed directly by PPL clauses containing PPL instructions. The instructions containing “@” control 

text placement. See the chapter on the TEXT.WRITER command for specifics.


__________________________________________________________________________ 

Figure 9.3 Nesting Functions 

File People: 

Name Birthdate 

Susan 07/08/56 

Marc 01/26/52 

David 03/31/59 

>> SORT People 

[ GEN #Day = 

DAYS ( NUMBER ( COMPRESS ( Birthdate, '/' ) ), 'MMDDYY' ) ; 

GEN #Today = DAYS ( .NDATE., 'YYYYMMDD' ) ; 

GEN Age = INT ( ( 1 + #Today - #Day ) / 365.25 ) ], 

BY Age, 

OUT People.By.Age $ 

>> LIST $ 

Name Birthdate Age 

David 03/31/59 53 

Susan 07/08/56 55 

Marc 01/26/52 60 

__________________________________________________________________________ 

9.23 Complex Character Expressions 

The arguments for character functions are expressions, which must be enclosed in parentheses and separated by 

commas. The simplest expression is a variable name or position. Complex expressions are nested functions, numeric 

constants, quoted character constants (literals or strings), or combinations of these. Combining character 

operators and functions in a series of instructions and procedures permits complex manipulation of character 

variables. 

Figure 9.3 illustrates how numeric and character functions can be used together to create a numeric variable, 

Age, given a character variable Birthdate. COMPRESS is used to squeeze out the slashes from Birthdate. The 

result from COMPRESS is the input to the NUMBER function. The result from the NUMBER function is the first 

argument for the DAYS function. 

Scratch variables are used for the intermediate computations. They are not necessary. The entire series of 

nested functions can be placed in a single PPL phrase: 

[ GEN Age = INT ( ( 1 + DAYS ( .NDATE., 'YYYYMMDD' ) - 

DAYS ( NUMBER ( COMPRESS ( Birthdate, '/' )), 'MMDDYY' )) 

/ 365.25 ) ], 

When functions are nested this way, the possibility for an error in logic is greater than it is when the process is 

broken up into several smaller steps.


The example in Figure 9.3 was run on June 22, 2012. It should be noted that the DAYS function for .NDATE. 

uses 'YYYYMMDD'. This is needed because .NDATE. produces a 4-digit year. 
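The age arithmetic in Figure 9.3 can be reproduced independently. The following Python sketch is a cross-check, not PPL: it fixes the run date at June 22, 2012 in place of .NDATE., assumes two-digit years belong to the 1900s, and uses the standard datetime module where the figure uses the DAYS function.

```python
from datetime import date


def parse_mmddyy(text: str, century: int = 1900) -> date:
    """Parse 'MM/DD/YY'; the century is an assumption of this sketch."""
    month, day, year = (int(part) for part in text.split("/"))
    return date(century + year, month, day)


def age_on(birthdate: str, today: date) -> int:
    """INT ( ( 1 + #Today - #Day ) / 365.25 ) with dates counted in days."""
    elapsed = (today - parse_mmddyy(birthdate)).days
    return int((1 + elapsed) / 365.25)


people = [("Susan", "07/08/56"), ("Marc", "01/26/52"), ("David", "03/31/59")]
run_date = date(2012, 6, 22)

by_age = sorted(
    ((name, born, age_on(born, run_date)) for name, born in people),
    key=lambda row: row[2],
)
for name, born, age in by_age:
    print(name, born, age)
```

Sorting on the computed age gives David (53), Susan (55) and Marc (60), the same order as the listing in the figure.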

9.24 Using the Name of a Variable as a Character Value 

The VARNAME function provides the name of a variable: 

[ GEN Last.Missing:C = .M. ; 

DO #L USING Test.1 TO Test.8 ; 

IF V(#L) MISSING, 

SET Last.Missing = VARNAME ( #L ) ; 

ENDDO ] 

The character variable Last.Missing is generated and set equal to missing. The values of Test.1 through Test.8 are 

tested — each one that is missing causes the recoding of Last.Missing to the name of the variable with the missing 

value. Last.Missing would have values of “Test.4”, “Test.7”, “-” (missing), and so on. 
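The loop keeps overwriting Last.Missing, so the final value names the last missing variable in the range. A minimal Python model of that logic follows; the dictionary representation of a case and the use of None for a missing value are assumptions of the sketch.

```python
def last_missing(case: dict, names: list):
    """Return the name of the last variable in `names` whose value is missing."""
    last = None                          # Last.Missing starts out missing
    for name in names:                   # DO #L USING Test.1 TO Test.8
        if case.get(name) is None:       # IF V(#L) MISSING
            last = name                  # SET Last.Missing = VARNAME ( #L )
    return last


tests = [f"Test.{i}" for i in range(1, 9)]
case = {name: 80 for name in tests}
case["Test.4"] = None
case["Test.7"] = None
print(last_missing(case, tests))         # Test.7
```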

Figure 9.4 illustrates a more complex usage of the VARNAME function. Here, a more compact and informative 

listing of the file Patients is desired. Five new variables named d.Heart, d.Liver, and so on, are created, 

each one set to the name of the corresponding variable in the DO loop: 

DO #J USING Heart TO Back; 

GEN ?( 'd.' & ):C = VARNAME (#J) ; 

ENDDO ; 

At the end of this step, the file has twelve variables, Id, Name, Heart through Back, and d.Heart through d.Back. 

The values of the first case are: 

1001 Jones 0 1 0 1 0 Heart Liver Kidney Brain Back 

The next step tests the five 0/1 variables Heart through Back, and, if any are equal to zero, sets the corresponding 

d.variable to missing: 

DO #J USING Heart TO Back ; 

IF V(#J) EQ 0, SET V( #J+5 ) = .M1. ; 

ENDDO ] 

At the end of the second step, the first case contains: 

1001 Jones 0 1 0 1 0 - Liver - Brain - 

Variables Heart through Back are no longer needed, so they are dropped from the file. 

SPLIT is then used to break each patient case into five cases, one for each of the disease situations. After the 

SPLIT, the cases pertaining to patient Jones are: 

Id Name Disease 

1001 Jones - 

1001 Jones Liver 

1001 Jones - 

1001 Jones Brain 

1001 Jones -


__________________________________________________________________________ 

Figure 9.4 Using VARNAME, SPLIT and COLLECT 


Id Name Heart Liver Kidney Brain Back 

1001 Jones 0 1 0 1 0 

1002 Brown 1 0 0 0 0 

1003 Davis 0 1 1 0 1 

1004 Mason 0 1 1 0 0 

1009 Smith 1 0 0 1 0 

LIST Patients 

[ DO #J USING Heart TO Back; 

GEN ?( 'd.' & ):C = VARNAME (#J); 

ENDDO; 

DO #J USING *; 

IF V(#J) EQ 0, SET V( #J+5 ) = .M1. ; 

ENDDO ] 

[ DROP Heart TO Back; 

SPLIT INTO 5, CARRY ( Id Name ), CREATE Disease d.?; 

IF Disease MISSING, DELETE ] 

[ COLLECT 3, BY Id, CARRY Name ] $ 

Disease Disease Disease 

ID Name .1 .2 .3 

1001 Jones Liver Brain - 

1002 Brown Heart - - 

1003 Davis Liver Kidney Back 

1004 Mason Liver Kidney - 

1009 Smith Heart Brain - 

__________________________________________________________________________ 

Next, any case that has a missing value for Disease is deleted, leaving only two cases for Jones and no more than

three cases for any patient (the maximum number of diseases observed for any patient). The final step: 

[ COLLECT 3, BY Id, CARRY Name ] 

collects the maximum of three cases for each patient back into a single case. Since Jones had only two medical 

problems, he has a missing value for Disease.3. 
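The split, delete and collect steps of Figure 9.4 can be traced with ordinary data structures. This Python sketch models the data flow only; split_cases and collect_cases are hypothetical helpers, not the PPL SPLIT and COLLECT instructions, and None stands in for a missing value.

```python
DISEASES = ["Heart", "Liver", "Kidney", "Brain", "Back"]
PATIENTS = [
    (1001, "Jones", 0, 1, 0, 1, 0),
    (1002, "Brown", 1, 0, 0, 0, 0),
    (1003, "Davis", 0, 1, 1, 0, 1),
    (1004, "Mason", 0, 1, 1, 0, 0),
    (1009, "Smith", 1, 0, 0, 1, 0),
]


def split_cases(patients):
    """One case per patient and disease; a 0 flag gives a missing Disease."""
    rows = []
    for pid, name, *flags in patients:
        for disease, flag in zip(DISEASES, flags):
            rows.append((pid, name, disease if flag != 0 else None))
    return rows


def collect_cases(rows, width=3):
    """Delete missing cases, then gather up to `width` diseases per patient."""
    grouped = {}
    for pid, name, disease in rows:
        if disease is None:              # IF Disease MISSING, DELETE
            continue
        grouped.setdefault((pid, name), []).append(disease)
    return [
        (pid, name, *(found + [None] * width)[:width])
        for (pid, name), found in grouped.items()
    ]


for row in collect_cases(split_cases(PATIENTS)):
    print(row)
```

In this model a patient with no diseases at all would have every case deleted and would not appear in the result.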

9.25 The MATCHES and XMATCHES Operators 

The operator MATCHES tests if a pattern matches the value of a character variable. The pattern is composed of 

characters and symbols, such as the wildcard character “*”. The pattern is supplied within single or double quotes: 

[ IF Company MATCHES 'Con* Ed*', RETAIN ]


This instruction selects cases from the file shown in Figure 9.5 with the following values of Company: 

Con Ed coned Consolidated Education, Inc. 

Consulted, Inc. Connie Edward Conway Medics 

__________________________________________________________________________ 

Figure 9.5 File of Character Data for MATCHES and XMATCHES 

Company 

Con Ed 

coned 

Super Coned 

Corn Fed 

Ed Con 

Consolidated Education, Inc. 

Consulted, Inc. 

Con Ed 

Connie Edward 

Conway Medics 

*CON ED* 

Connie E. Dean, Assoc. 

__________________________________________________________________________ 

It does not select: 

Super Coned Corn Fed Ed Con 

Con Ed *CON ED* Connie E. Dean, Assoc. 

This simple, common usage of MATCHES, with the wildcard character “*” in the pattern, illustrates several 

basic rules of matching: 

1. The pattern is anchored — that is, a match of the first character in the pattern is sought in the first 

character of the character value. (This means that lead or left-most blanks count.) 

2. The wildcard “*” matches zero or more occurrences of any character, including blanks. 

3. Spaces or blanks inside the pattern are ignored, unless they are escaped — enclosed in < >. 

4. Case is not considered, unless XMATCHES is used. 

The pattern may be unanchored by using the wildcard “*” as the first character in the pattern: 

[ IF Company MATCHES '*Con* Ed*', RETAIN ] 

This instruction selects: 



Super Coned Con Ed *CON ED* 

Trailing or right-most blanks are ignored. This instruction, without the “*” as the final character in the pattern:

[ IF Company MATCHES '*Con* Ed', RETAIN ]

selects:

Con Ed coned Super Coned Con Ed
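The four basic rules can be tried out with a small model built on Python regular expressions. This is a sketch under stated assumptions, not the MATCHES implementation: it handles only literal characters and the “*” wildcard, the function name matches is hypothetical, and the second “Con Ed” in the file is assumed to carry lead blanks (which the printed listing cannot show).

```python
import re

COMPANIES = [
    "Con Ed", "coned", "Super Coned", "Corn Fed", "Ed Con",
    "Consolidated Education, Inc.", "Consulted, Inc.",
    "  Con Ed",                       # assumed: the value with lead blanks
    "Connie Edward", "Conway Medics", "*CON ED*", "Connie E. Dean, Assoc.",
]


def matches(value: str, pattern: str) -> bool:
    """Model of MATCHES for patterns made of literal characters and '*'."""
    parts = []
    for ch in pattern:
        if ch == " ":
            continue                                  # rule 3: pattern blanks are ignored
        parts.append(".*" if ch == "*" else re.escape(ch))   # rule 2: '*' wildcard
    regex = "".join(parts)
    flags = re.IGNORECASE | re.DOTALL                 # rule 4: case is not considered
    # rule 1: fullmatch anchors the pattern; trailing blanks in the value are ignored
    return re.fullmatch(regex, value.rstrip(" "), flags) is not None


def select(pattern: str) -> list:
    return [company for company in COMPANIES if matches(company, pattern)]


print(select("Con* Ed*"))
print(select("*Con* Ed*"))
print(select("*Con* Ed"))
```

Under these assumptions the three calls reproduce the selections described in the text, including the three extra values picked up when the pattern is unanchored.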

A complete set of pattern symbols (meta-characters) and syntax exists for MATCHES and XMATCHES, 

making possible any arbitrary selections. Additional PPL may be used with these operators to provide

consequences after selections, recode data values and further refine the selection criteria. The next section discusses the

MATCHES meta-characters and syntax, and Figure 9.6 summarizes them. 

9.26 MATCHES: Meta-Characters and Syntax 

The asterisk “*”, which matches zero or more occurrences of any character, is the most useful and general wildcard 

character. Additional wildcard characters further limit the pattern to be matched. The at-sign “@” matches 

zero or more blanks. It is useful if there may be lead blanks that should be ignored. For example, this instruction:

[ IF Company MATCHES '@Con* Ed', RETAIN ]

selects:

Con Ed coned

Con Ed

The question mark “?” matches any single character. This instruction:

[ IF Company MATCHES '?Con* Ed?', RETAIN ]

selects:

*CON ED*

Other wildcards match specific single characters. The crosshatch or number sign “#” matches any single digit, the 

dollar sign “$” matches any single letter, and the underscore “_” matches a single blank. This instruction:

[ IF Company MATCHES 'Con_Ed', RETAIN ]

selects this case:

Con Ed

The underscore matches the single blank in the center. Strings with lead blanks are not selected because the pattern 

is anchored on the left. Trailing blanks are ignored. 

Character strings that contain meta-characters may be matched by escaping the meta-characters. Escaping 

removes the special meaning of a meta-character. The backslash “\” and the angle signs “< >” are escape characters. 

Any character directly after the backslash or enclosed between the angle signs is treated as a literal character:

[ IF Company MATCHES '\* * <*>', RETAIN ]

selects:

*CON ED*

The first and third asterisks in the pattern are literal characters. They match only asterisks. The middle asterisk 

is a meta-character that matches zero or more of any characters. Thus, a string of two or more characters that begins 

and ends with an asterisk is selected. 

The escape characters are also used to match characters that may not print. The characters are referenced by 

their decimal or octal (base 8) integer equivalents. The backslash is used for octal numbers and the angle signs are used

for decimal numbers. This instruction:

[ IF Company MATCHES '* <009> *', RETAIN ]

selects character strings containing a tab character in them. (009 is the decimal equivalent of the tab character in 

the ASCII character codes.) 
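The single-character wildcards and the escape characters can be modeled by translating a pattern into a Python regular expression. This is an illustrative sketch, not the MATCHES implementation: to_regex and matches are hypothetical names, octal escapes after the backslash are not modeled, an all-digit string inside angle signs is assumed to be a decimal character code, and the value with lead blanks in COMPANIES is an assumption about the file in Figure 9.5.

```python
import re

WILDCARDS = {"*": ".*", "@": " *", "?": ".", "#": "[0-9]", "$": "[A-Za-z]", "_": " "}

COMPANIES = [
    "Con Ed", "coned", "Super Coned", "Corn Fed", "Ed Con",
    "Consolidated Education, Inc.", "Consulted, Inc.",
    "  Con Ed",                       # assumed: the value with lead blanks
    "Connie Edward", "Conway Medics", "*CON ED*", "Connie E. Dean, Assoc.",
]


def to_regex(pattern: str) -> str:
    """Translate wildcards and escapes; enclosures and repetitions are not handled."""
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):       # \c : literal character c
            out.append(re.escape(pattern[i + 1]))
            i += 2
        elif ch == "<" and ">" in pattern[i + 1:]:    # <c> literal, <nnn> decimal code
            end = pattern.index(">", i + 1)
            body = pattern[i + 1:end]
            out.append(re.escape(chr(int(body)) if body.isdigit() else body))
            i = end + 1
        elif ch == " ":                               # unescaped blanks are ignored
            i += 1
        else:
            out.append(WILDCARDS.get(ch, re.escape(ch)))
            i += 1
    return "".join(out)


def matches(value: str, pattern: str) -> bool:
    flags = re.IGNORECASE | re.DOTALL
    return re.fullmatch(to_regex(pattern), value.rstrip(" "), flags) is not None


def select(pattern: str) -> list:
    return [company for company in COMPANIES if matches(company, pattern)]


print(select("@Con* Ed"))
print(select("?Con* Ed?"))
print(select("Con_Ed"))
print(select(r"\* * <*>"))
print(matches("left\tright", "* <009> *"))
```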

Parentheses and square brackets are enclosures that specify, respectively, a literal string of characters and a 

single character to match. A literal string of characters may be specified with or without parentheses. These two 

instructions are equivalent:


__________________________________________________________________________ 

Figure 9.6 MATCHES and XMATCHES: Meta-Characters 

In General: 

< > [ ] * @ _ ? # $ 0 1 + 

Within [ ]: 

^ - _ # $ ] 

Within ( ): 

| _ ) 

Escape Characters: 

\ < > 

Pattern Syntax: 

@ zero or more blanks 

* zero or more of any character 

? a single character 

# a single digit 

$ a single letter 

_ a single blank 

<*> a literal character (an asterisk)

\# a literal character (a crosshatch) 

<009> a decimal number

\009 an octal number 

Enclosures: 

( abc ) a literal string of characters 

abc same as ( abc ) 

( abc | xyz ) abc or xyz 

[ abc ] a single character: a or b or c 

[ a-z ] a single letter in the range a through z 

[ $ ] same as [ a-z ] 

[ 0-9 ] a single number in the range 0 through 9 

[ # ] same as [ 0-9 ] 

[ _ ] a single blank 

[ ^$ ] a single character that is NOT a letter 

Repetitions after Enclosures: 

1 1 a single match (the default) 

0 1 zero or one matches 

0 + zero or more matches 

1 + one or more matches 

0 same as 0 1 

+ same as 1 + 

__________________________________________________________________________


[ IF Company MATCHES '(Con) * (Ed) *', DELETE ] 

[ IF Company MATCHES ' Con * Ed *', DELETE ] 

Notice that the blanks in the pattern contribute to its readability. Blanks are ignored unless they are escaped, and 

they may be omitted if desired. 

Parentheses are typically used when one character string or another is to be matched: 

[ IF Company MATCHES '(Con | Corn) * Ed', RETAIN ] 


Con Ed coned Corn Fed 

The vertical bar character “|” means “or” and the parentheses limit the character strings that are to be “or-ed”. Note 

that merely the juxtaposition of character strings means “and” — that is, this pattern: 

[ IF Company MATCHES 'Con Corn * Ed', RETAIN ] 

matches only character values that have “ConCorn” followed by zero or more of any character followed by “Ed”. 

If a blank is sought between “Con” and “Corn”, the pattern should be specified with an underscore: 

[ IF Company MATCHES 'Con_Corn * Ed', RETAIN ] 

Square bracket enclosures specify a single character to match. Typically, that character may be one of several 

in the enclosure that is repeated a specified number of times. For example, this instruction: 

[ IF Company MATCHES 'Con [ _ n s ] * Ed *', RETAIN ] 

selects these values: 

Con Ed Consolidated Education, Inc. 

Consulted, Inc. Connie Edward 

The string “Con” is followed by a blank, an “n” or an “s” and that is followed by zero or more of any characters, 

the string “Ed” and zero or more of any characters. 

A repetition specification may follow either type of enclosure. Possible repetitions are: 1 1, 0 1, 0 + and 1 +.

The repetition 1 1 is the default that is assumed when nothing follows an enclosure — one match. The repetition 

0 1 means zero or one matches, 0 + means zero or more matches and 1 + means one or more matches. This

instruction:

[ IF Company MATCHES 'Co (n)1 + *', RETAIN ]

selects:

Connie E. Dean, Assoc.

This:

[ IF Company MATCHES 'Co (n)0 1 *', RETAIN ]

selects:

Corn Fed Connie E. Dean, Assoc.

Notice that the repetition 1 + matches one or more occurrences of “n” and that 0 1 matches zero or one occurrence. 

Thus, Corn Fed is included in the second group of matches (it has zero occurrences of “n” after the “Co”). The


strings with two occurrences of “n” are also included because of the wildcard “*” that makes any character valid 

after the zero or one “n”. 

It is also possible to specify a range from which a single character is valid as a match and characters that are 

not valid as matches. The square brackets are used. This instruction:

[ IF Company MATCHES '[ ^$ ^# ] *', RETAIN ]

selects:

*CON ED*

Con Ed

The meta-character “^” means not. Thus, the enclosure above specifies a single character that is not a letter and 

not a number as the first character. (The “$” means any letter and the “#” means any number.) This instruction 

means the same thing, but uses ranges in the pattern instead of the meta-characters “$” and “#”: 

[ IF Company MATCHES '[ ^a-z ^A-Z ^0-9 ] *', RETAIN ] 

The hyphen “-” is used in ranges. When the “^” is omitted, then any character in the specified range is a valid 

match. Figure 9.6 summarizes the meta-characters. 
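Readers familiar with regular expressions may find it helpful to see the enclosure and repetition examples beside rough equivalents. The table below is a hand-written Python sketch, not a translation produced by P-STAT: each regular expression is an assumed equivalent of the PPL pattern next to it, matching is anchored and case-insensitive, and the value with lead blanks is an assumption about the file in Figure 9.5.

```python
import re

COMPANIES = [
    "Con Ed", "coned", "Super Coned", "Corn Fed", "Ed Con",
    "Consolidated Education, Inc.", "Consulted, Inc.",
    "  Con Ed",                       # assumed: the value with lead blanks
    "Connie Edward", "Conway Medics", "*CON ED*", "Connie E. Dean, Assoc.",
]

# PPL pattern -> assumed regular-expression equivalent
EQUIVALENTS = {
    "(Con | Corn) * Ed":    r"(con|corn).*ed",      # | means "or"
    "Con [ _ n s ] * Ed *": r"con[ ns].*ed.*",      # one of: blank, n, s
    "Co (n)1 + *":          r"co(n)+.*",            # one or more n
    "Co (n)0 1 *":          r"co(n)?.*",            # zero or one n
    "[ ^$ ^# ] *":          r"[^A-Za-z0-9].*",      # not a letter, not a digit
}


def select(ppl_pattern: str) -> list:
    regex = re.compile(EQUIVALENTS[ppl_pattern], re.IGNORECASE | re.DOTALL)
    return [c for c in COMPANIES if regex.fullmatch(c.rstrip(" "))]


for ppl_pattern in EQUIVALENTS:
    print(ppl_pattern, "->", select(ppl_pattern))
```

The comparison of the two repetition patterns shows the point made in the text: “Corn Fed” is selected only when zero occurrences of “n” are allowed.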

9.27 CLAG: A Lag using a character argument 

CLAG is a function that performs a lag on a character argument, which can be an expression. 

GEN PREVIOUS.TITLE:c30 = CLAG( JOB.TITLE, 12 ) 

This would take the JOB.TITLE value from 12 cases ago and copy it into PREVIOUS.TITLE of the current case. 

The second argument, the lag depth, must be an integer constant from 1 to 500. 
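A lag of this kind can be pictured as a sliding window over the cases. The Python sketch below models the idea only: clag is a hypothetical stand-in for the PPL function, and returning None (missing) for the first cases, before enough history exists, is an assumption that the text does not confirm.

```python
from collections import deque


def clag(values, depth):
    """Return, for each case, the value from `depth` cases earlier."""
    if not 1 <= depth <= 500:
        raise ValueError("the lag depth must be an integer from 1 to 500")
    window = deque(maxlen=depth)         # holds the last `depth` values seen
    lagged = []
    for value in values:
        # assumed: missing until `depth` earlier cases are available
        lagged.append(window[0] if len(window) == depth else None)
        window.append(value)
    return lagged


titles = ["Clerk", "Analyst", "Manager", "Director"]
print(clag(titles, 2))                   # [None, None, 'Clerk', 'Analyst']
```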

9.28 Concatenation of Character Constants

There is a special operator (&&) that permits dynamic concatenation of character constants in a command or in 

PPL. It is most useful in situations such as macros where there is an 80 character limit on record size. 

MAKE Myfile, FILE 

 

&& 

; 

There is no particular limit on the number of pieces and they can be enclosed in either angle brackets or quotation 

marks. 

[ GEN text:c200 = 

 

&& 

"The sentence in the text contains a single >. " 

&& 

'A third piece is needed to complete the variable.' ] 

In command text and in PPL the && structure may be used anywhere that a character constant 

may be used.



SUMMARY 

Character variables are modified by functions and operators. Some are specifically for character variables 

and others may be used with either character or numeric variables. Functions and operators are 

grouped below according to their usages. 

PPL Functions: Character 

The arguments for character functions are expressions. The simplest expression is a variable name or 

position (vnp). Complex expressions are nested functions, numeric constants (nn), quoted character constants 

or strings ('cs'), or combinations of these. 

Abbreviations following the functions indicate the type of argument that should result from the evaluation 

of the expression. 

BLANK (exp, loc, len) 

specifies a character value that is to have blank characters replace existing characters. The second argument 

in the BLANK function gives the start location. The third argument, which is optional, gives the 

length of the area to be made blank. This example: 

[ GEN New.Tel:C8 = BLANK ( Tel, 5, 4 ) ] 

will replace characters five through eight of a telephone number with blanks. When the third argument 

is omitted, the expression is filled with blanks from the start location through the end. 

This example will blank out all but the first letter of Last.Name: 

LIST Diet.Clients 

[ KEEP Last.Name Weight Pounds.Lost ; 

SET Last.Name = BLANK ( Last.Name, 2 ) ] $ 

An alternate usage mode exists for the BLANK function. Its general format is: 

BLANK (exp, old, nn) 

The first argument specifies a character value that may contain a specified substring; if so, it is to be 

blanked. The second argument yields the character string that may be present in the first expression. 

When it is present, it is replaced by blank characters. The optional third argument is the number of times 

to find and replace the character string; one change is assumed. This PPL phrase: 

[ SET Comments = BLANK ( Comments, 'damn', 10 ) ] 

replaces up to ten occurrences of the word “damn” in the variable Comments with an equivalent number 

of blanks. 
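The two usage modes of BLANK differ in how the area to blank is found: by location or by content. The following Python sketch models both; blank_at and blank_text are hypothetical names, and the case-insensitive search in the second mode is inferred from the description of XBLANK rather than stated for BLANK itself.

```python
import re


def blank_at(value: str, loc: int, length=None) -> str:
    """Model of BLANK (exp, loc, len): blank `length` characters from position `loc`."""
    start = loc - 1                                  # PPL positions start at 1
    end = len(value) if length is None else min(len(value), start + length)
    return value[:start] + " " * max(0, end - start) + value[end:]


def blank_text(value: str, old: str, count: int = 1, respect_case: bool = False) -> str:
    """Model of BLANK (exp, old, nn); respect_case=True models XBLANK."""
    flags = 0 if respect_case else re.IGNORECASE
    return re.sub(re.escape(old), lambda m: " " * len(m.group()),
                  value, count=count, flags=flags)


print(repr(blank_at("555-1234", 5, 4)))              # '555-    '
print(repr(blank_at("Johnson", 2)))                  # 'J      '
print(repr(blank_text("Damn it, damn", "damn", 10)))
print(repr(blank_text("Dd", "D", 9, respect_case=True)))
```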

XBLANK (exp, old, nn) 

blanks out specified characters — functions just like BLANK's alternate mode of operation, but respects 

the case (upper, lower or mixed) of the “old” string: 

[ SET Symptom = XBLANK ( Symptom, 'D', 9 ) ] 

Upper-case “D” is blanked out in values of Symptom; lower-case “d” is ignored.

loc=location len=length lim=delimiter exp=expression nn=number cs=char string


CAPS (exp) 

capitalizes the first letter of each token (word) in a character value: 

[ GEN Name:C = CAPS ( 'JOHN paul JoNeS' ) ] 

Letters other than the first are changed to lower case. The default is that tokens are separated by blanks. 

The output appears like this: 

John Paul Jones 

In a more complex usage, an additional or replacement token delimiter is supplied in quotes as a second 

argument: 

[ GEN Name:C = CAPS ( 'ann hayden-jones', '- ' ) ] 
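The effect of the delimiter argument can be seen in a small model. This Python sketch is not PPL: caps is a hypothetical stand-in, and the result shown for the hyphenated name is what the model produces under the assumption that every character in the second argument separates tokens.

```python
def caps(value: str, delimiters: str = " ") -> str:
    """Model of CAPS: upper-case the first letter of each token, lower-case the rest."""
    out = []
    start_of_token = True
    for ch in value:
        if ch in delimiters:
            out.append(ch)               # delimiters are copied unchanged
            start_of_token = True
        else:
            out.append(ch.upper() if start_of_token else ch.lower())
            start_of_token = False
    return "".join(out)


print(caps("JOHN paul JoNeS"))               # John Paul Jones
print(caps("ann hayden-jones", "- "))        # Ann Hayden-Jones
print(caps("ann hayden-jones"))              # Ann Hayden-jones
```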

CENTER (exp) 

centers a character value in its field: 

[ SET Surname = CENTER ( Surname ) ] 

CHANGE (exp, old, new, nn) 

specifies a character value possibly containing an “old” character string that is to be changed to a “new” 

character string. The first argument to the CHANGE function is a character expression. The second argument

is the old string and the third is the new string. An optional fourth argument is the maximum 

number of changes to make per value; one change is assumed. 

In this example: 

[ SET College = CHANGE ( College, 'University', 'Univ', 3 ) ] 

the character variable College will have old values of “University” changed to new values of “Univ”. A 

maximum of three such changes per value of College is specified. The resultant values of College will 

be shorter wherever this change is made, and the listing may be more attractive with the abbreviation. 

CHANGE, without a new argument, removes the old string: 

[ SET College = CHANGE ( College, 'ersity', 3 ) ] 

The old string “ersity” is changed to a null string. This (probably) achieves the same result as the previous 

example. 

An alternate usage of the CHANGE function has the format: 

CHANGE ( exp, loc, len, new ) 

The first argument is a character expression, the second is the start location of the old string, the third is 

the length of the old string, and the fourth is the new character string: 

[ SET Date = CHANGE ( Date, 7, 2, '85' ) ] 

Values of Date in the form 11/08/84 are changed to 11/08/85. 
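The content-based and location-based modes of CHANGE can be modeled in a few lines. This is a Python sketch with hypothetical names (change, change_at); the case-insensitive search is inferred from the description of XCHANGE, and the keyword arguments replace the positional forms used in PPL.

```python
import re


def change(value: str, old: str, new: str = "", count: int = 1,
           respect_case: bool = False) -> str:
    """Model of CHANGE (exp, old, new, nn); respect_case=True models XCHANGE."""
    flags = 0 if respect_case else re.IGNORECASE
    # a function is used as the replacement so that `new` is taken literally
    return re.sub(re.escape(old), lambda m: new, value, count=count, flags=flags)


def change_at(value: str, loc: int, length: int, new: str) -> str:
    """Model of CHANGE (exp, loc, len, new): replace by location."""
    start = loc - 1                      # PPL positions start at 1
    return value[:start] + new + value[start + length:]


print(change("State University", "University", "Univ", 3))   # State Univ
print(change("State University", "ersity", count=3))         # State Univ
print(change_at("11/08/84", 7, 2, "85"))                     # 11/08/85
print(change("Mm", "m", "Ms.", 1, respect_case=True))        # MMs.
```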

XCHANGE (exp, old, new, nn) 

changes character strings — functions just like CHANGE, but respects the case of the “old” string: 

[ SET Sal = XCHANGE ( Sal, 'm', 'Ms.', 1 ) ]

Only lower-case “m” would be changed. 

CHARACTER (exp) 

converts a number into its character equivalent: 

[ GEN Code:C3 = CHARACTER ( Area.Code ) ] 



It is possible to use CHARACTER followed by a second argument indicating the number of decimal 

places to preserve in the expression. This might be useful if you have income values with decimal places 

carried up to four places and you wish to specify only two decimal places. The second argument indicates 

the number of places to carry in the expression: 

LIST Salary [ GEN Income:C = CHARACTER ( Salary, 2 ) ] $ 

CHARACTER may also be followed by a third argument that indicates the maximum number of places 

to print: 

LIST Salary [ GEN Income:C = CHARACTER ( Salary, 2, 3 ) ] $ 

MAKE.CHARACTER, described in the manual “P-STAT: File Management”, can be used to change a

numeric variable into a character variable or to resize a character variable. 

CHAREX (exp, 'XX00') 

extracts specific digits from a numeric value and yields a character representation of those digits.

CHAREX operates only on the integer portion of the number; any fractional portion and sign are ignored.

The two required arguments are a numeric expression and a character string mask enclosed in quotes: 

[ GEN Month:C2 = CHAREX ( Date, 'XX00' ) ] 

The selection mask is composed of X and 0 (zero) characters and may be up to twelve characters in 

length. An X retains a digit and a 0 drops a digit. The selection mask is aligned with the right-most digit 

of the numeric value. The numeric function NUMEX does much the same thing but yields a numeric 

result without the lead zeros. 

CLAG (exp, nn ) 

CLAG is a function that performs a lag on a character argument, which can be an expression.

COMPRESS (exp) 

squeezes all blanks out of a character value: 

[ SET Text = COMPRESS ( Text ) ] 

Leading, trailing and embedded blanks are removed. 

An expanded mode of usage has optional arguments: 

COMPRESS ( exp, nn, lim ) 

The second argument is the number of delimiter characters that should remain between tokens (“words”). 

The third argument is an alternate delimiter character or characters other than the blank — a token separator. 

This example: 

[ SET Text = COMPRESS ( Text, 1 ) ] 

squeezes out all blanks but one from between words. This: 

[ GEN Amount = NUMBER ( COMPRESS ( Money, ', $' ) ) ] 

generates a numeric variable, Amount, equal to the character variable Money after all commas, blanks 

and currency signs are compressed out. 
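The token view of COMPRESS can be modeled as splitting on delimiter characters and rejoining. This Python sketch is an approximation with stated assumptions: compress is a hypothetical stand-in, a supplied delimiter string is assumed to replace the blank rather than add to it, and when delimiters are kept the first delimiter character is assumed to be the one that remains.

```python
def compress(value: str, keep: int = 0, delimiters: str = " ") -> str:
    """Model of COMPRESS: leave `keep` delimiter characters between tokens."""
    tokens, current = [], []
    for ch in value:
        if ch in delimiters:
            if current:                  # a delimiter ends the current token
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return (delimiters[0] * keep).join(tokens)


print(compress("  to   be  or  not "))               # tobeornot
print(compress("  to   be  or  not ", 1))            # to be or not
print(compress("$1,234 ", delimiters=", $"))         # 1234
print(compress("11/19/68", delimiters="/"))          # 111968
```

In PPL the delimiter string is given positionally, as in COMPRESS ( Money, ', $' ); the keyword argument is a convenience of the sketch.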

CVAL (nn) 

gives the character equivalent of the specified decimal number. This is used when unusual characters 

that cannot be entered or printed on the terminal screen are desired. Often these characters can be produced 

on a printer. An example of this is: 

[ PUT (CVAL(27)) 'R' (CVAL(7)) 

(CVAL(221)) 'Hable Ud. Espa' (CVAL(252)) 'ol?' ]



where the CVAL of the number 27 followed by “R” and the CVAL of 7 specifies the Spanish international 

character set. The CVAL of the numbers 221 and 252 yields the upside-down question mark and 

the Spanish “n”, respectively. 

IVAL ('c') 

gives the integer equivalent of the first character of a character value. This is the opposite of CVAL, described 

above. Since the integer equivalents of uppercase and lowercase characters are different, IVAL 

can be used in tests of equality of character values that respect case. (See the XEQ operator also.) 

LEFT (exp) 

left-justifies a character value in its field: 

[ SET Street.Address = LEFT ( Street.Address ) ] 

LENGTH (exp) 

yields a numeric value length, which is the location of the right-most non-blank character: 

[ GEN HS.Length = LENGTH ( High.School ) ] 

(Leading and embedded blanks are included in the count.) 

LOWER (exp) 

converts a character value to lowercase characters: 

[ SET Region = LOWER ( Region ) ] 

NUMBER (exp) 

converts a character value into a number: 

[ GEN Year = NUMBER ( SUBSTRING ( Date, 7, 2 ) ) ] 

If the character value is all blank, the result is set to missing type 1. If the character value contains characters 

other than numbers, the result is set to missing type 2. If the character value is missing, the result 

is set to missing type 3. 

NUMBER may be suffixed with either “.W” or “.E”. NUMBER.W issues a warning and NUMBER.E 

stops the command with an error message when the character value contains characters other than 

numbers. 

If you wish to change the type of a character variable to numeric, you may use the MAKE.NUMERIC

command, which is described in the manual “P-STAT: File Management”.

PAD (exp, len, fill) 

specifies a character value which is to be “padded” on the right side. The first argument to PAD is the 

character expression to be padded, the second is the minimum length, and the third is an optional fill character 

to be used for padding. If only one argument is supplied, a minimum length of 1 and a blank fill 

character are assumed. This example: 

[ GEN Zip:C10 = PAD ( Zipcode, 10, '-' ) ] 

pads the variable Zipcode with dashes on the right side. If Zipcode initially has a length of five, it is padded 

until its length is ten. If Zipcode initially has a length of ten or more characters, it is not padded. 

RPAD is a synonym for PAD. 

LPAD (exp, len, fill) 

specifies a character value which is to be “padded” with blanks or a supplied fill character on the left side. 

This example: 



[ SET Message = LPAD ( LRTRIM ( Message ), 16, '>' ) ] 

pads the trimmed variable Message with the “>” character on the left side. See also PAD and LRPAD. 

LRPAD (exp, len, fill) 

specifies a character value, a minimum length, and an optional fill character to be used for padding the 

character value. Padding will occur evenly on both the left and right sides of the expression. The right 

side will be padded first. If the character expression is already equal to or greater than the specified 

length, no padding will take place. A length of 1 and a blank fill character are assumed when none are 

specified. See also LPAD and PAD. This is often used on an LRTRIM result. 
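The three padding functions can be modeled outside P-STAT. The following is a minimal Python sketch of the behavior described above, not P-STAT code; the function names are hypothetical, and giving the odd fill character to the right side in lrpad is one reading of "the right side will be padded first".

```python
def pad(value: str, length: int = 1, fill: str = " ") -> str:
    """Model of PAD / RPAD: pad on the right up to a minimum length."""
    return value + fill * max(0, length - len(value))


def lpad(value: str, length: int = 1, fill: str = " ") -> str:
    """Model of LPAD: pad on the left up to a minimum length."""
    return fill * max(0, length - len(value)) + value


def lrpad(value: str, length: int = 1, fill: str = " ") -> str:
    """Model of LRPAD: pad both sides evenly, right side first."""
    missing = max(0, length - len(value))
    right = (missing + 1) // 2  # assumption: an odd fill character goes to the right
    left = missing // 2
    return fill * left + value + fill * right


print(pad("08540", 10, "-"))
print(lpad("Hello", 8, ">"))
print(lrpad("ab", 5, "*"))
```

A value that is already at least the minimum length is returned unchanged by all three functions.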

POSITION (exp, 'cs') 

yields a numeric value which is the position of the character string within the character value: 

[ GEN Blank.Location = POSITION ( Name, ' ' ) ] 

Values match regardless of whether they are uppercase or lowercase. If the second value is not found, 

the result is zero. 

A more complex usage of POSITION is also possible. The general form is: 

POSITION ( exp, 'cs', 'cs', 'cs',..., len ) 

Additional optional arguments are multiple character strings whose positions in the character variable are 

sought. Only the left-most position of any successfully located string is given. 

An integer between 1 and 50,000 may be supplied as the right-most argument giving a length. It permits 

the contents of the character strings, whose positions are being sought, to be divided into strings of the 

specified length. Each portion of the divided string is treated as a separate argument and its position is 

located. The character values must be evenly divisible by the length. 

Examples of usages include: 

POSITION ( City.State, ',' , '.' ) 

POSITION ( 'ABCDEF', 'AC', 'BC', 'DE', 'DF' ) 

POSITION ( City.State, 'NYNJ', 2 ) 

POSITION ( 'ABCDEF', 'AEIOU', 1 ) 
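The multi-string form of POSITION can be modeled as follows. This is a Python sketch of the rules stated above, not P-STAT code: the name position is hypothetical, and "left-most position" is taken to mean the smallest position among the strings that are found.

```python
def position(value: str, *args) -> int:
    """Model of POSITION: 1-based, case-insensitive, 0 when nothing is found."""
    chunk = None
    if args and isinstance(args[-1], int):
        # A trailing integer divides each search string into pieces of that length.
        chunk = args[-1]
        args = args[:-1]
    needles = []
    for text in args:
        if chunk is None:
            needles.append(text)
        else:
            if len(text) % chunk != 0:
                raise ValueError("search string is not evenly divisible by the length")
            needles.extend(text[i:i + chunk] for i in range(0, len(text), chunk))
    haystack = value.lower()
    hits = [haystack.find(n.lower()) + 1 for n in needles]
    hits = [h for h in hits if h > 0]
    return min(hits) if hits else 0


print(position("ABCDEF", "AC", "BC", "DE", "DF"))
print(position("ABCDEF", "AEIOU", 1))
print(position("Trenton, NJ", "NYNJ", 2))
```

In the first call only 'BC' and 'DE' are present, and 'BC' is the left-most of the two. In the last call 'NYNJ' is searched as the two strings 'NY' and 'NJ'.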

XPOSITION (exp, 'cs') 

gives the position of the specified character string in the value — works just like POSITION, but respects 

the case of the character string in searching. 

RIGHT (exp) 

right-justifies a character value in its field: 

[ SET Zip = RIGHT ( Zip ) ] 

SIZE (exp) 

yields a numeric value giving the size of the specified character value. The size includes any blanks, embedded 

or otherwise. It is typically either the defined size or the size resulting after various character 

function procedures or operations. 

SUBSTRING (exp, loc, len) 

yields a character value which is the string beginning in the location specified by the second argument 

and of the length specified by the third argument. If the optional starting location is not given, it is assumed 

to be 1. If the optional length is not given, it is assumed to be the remainder of the string. For 

example: 



[ GEN Initial:C1 = SUBSTRING ( LEFT ( Name ), 1, 1 ) ] 

yields the substring of Name beginning at the first character and one character long. 

TOKEN (exp, nn, lim) 

accesses a portion of a longer character string: 

[ GEN First.Name:C = TOKEN ( Name ) ] 

The first token starts with the first non-delimiter on the left and continues until a subsequent delimiter is 

found. TOKEN accesses the first token unless the optional second argument specifies a token in another 

position. The assumed delimiter is a blank unless the optional third argument specifies another delimiter 

or delimiters. LTOKEN is a synonym for TOKEN. 

RTOKEN (exp, nn, lim) 

accesses tokens counting from the right: 

[ GEN Last.Name:C = RTOKEN ( Name ) ; 

IF RTOKEN ( Name ) CONTAINS '.', 

SET Last.Name = RTOKEN ( Name, 2 ) ] 

NTOKEN (exp, lim) 

yields a numeric value that is a count of the tokens in the first character value. If the optional second 

argument is not provided, the delimiter between tokens is assumed to be the blank: 

[ GEN Middle.Name:C ; 

GEN #Number = NTOKEN ( Name ) ; 

IF #Number GT 2, 

SET Middle.Name = TOKEN ( Name, 2 ) ] 
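The token functions can be modeled as follows. This is a Python sketch, not P-STAT code: the names are hypothetical, and returning an empty string for a token position that does not exist is an assumption, since the text does not say what PPL returns in that case.

```python
import re


def _tokens(value: str, delims: str = " ") -> list:
    """Split on runs of any delimiter character, dropping empty pieces."""
    pattern = "[" + re.escape(delims) + "]+"
    return [t for t in re.split(pattern, value) if t]


def token(value: str, n: int = 1, delims: str = " ") -> str:
    """Model of TOKEN / LTOKEN: the n-th token counting from the left."""
    toks = _tokens(value, delims)
    return toks[n - 1] if 1 <= n <= len(toks) else ""  # assumption for a missing token


def rtoken(value: str, n: int = 1, delims: str = " ") -> str:
    """Model of RTOKEN: the n-th token counting from the right."""
    toks = _tokens(value, delims)
    return toks[-n] if 1 <= n <= len(toks) else ""  # assumption for a missing token


def ntoken(value: str, delims: str = " ") -> int:
    """Model of NTOKEN: the number of tokens."""
    return len(_tokens(value, delims))


name = "John Smith Jr."
last = rtoken(name)
if "." in last:
    # Same idea as the RTOKEN example: skip a trailing suffix such as 'Jr.'
    last = rtoken(name, 2)
print(last, ntoken(name))
```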

TRIM (exp, nn, 'cs') 

specifies a character value that may have characters trimmed from the right side, and the characters to 

trim. An optional number limits the number of characters to be trimmed. The resultant character value 

will have a shorter size if the specified characters exist on the right and trimming occurs. If a trim character 

is not specified, the blank character is assumed: 

[ SET Name = TRIM ( Name ) ] 

In this example, the variable Name will be set equal to Name with all blank characters trimmed off from 

the right side. Multiple trim characters may be specified: 

[ GEN Text:C = TRIM ( Var1, '.,-' ) ] 

Any of the specified characters occurring on the right end of the variable Var1 will be removed. RTRIM 

is a synonym for TRIM. See also LTRIM and LRTRIM. 

LTRIM (exp, nn, 'cs') 

specifies a character value that may have characters trimmed from the left side, and the characters to trim. 

All matching characters will be trimmed unless the optional second argument specifies a limit. See also 

RTRIM and LRTRIM. 

LRTRIM (exp, nn, 'cs') 

specifies a character value that may have characters trimmed from both the left and right sides. The characters 

to be trimmed are an optional argument. If no characters are specified, blank characters will be 

trimmed. When trimming takes place, the resultant variable size may be shorter than it was initially. 

LRPAD is often done on an LRTRIM result. The second argument is optional. If it is used, it limits the 

number of characters that are trimmed. 
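The three trim functions can be modeled as follows. This is a Python sketch, not P-STAT code: the names are hypothetical, the limit is passed as a keyword for clarity, and applying the limit separately to each side in lrtrim is an assumption.

```python
def rtrim(value: str, limit=None, chars: str = " ") -> str:
    """Model of TRIM / RTRIM: remove trailing characters found in chars."""
    end = len(value)
    removed = 0
    while end > 0 and value[end - 1] in chars and (limit is None or removed < limit):
        end -= 1
        removed += 1
    return value[:end]


def ltrim(value: str, limit=None, chars: str = " ") -> str:
    """Model of LTRIM: remove leading characters found in chars."""
    start = 0
    while start < len(value) and value[start] in chars and (limit is None or start < limit):
        start += 1
    return value[start:]


def lrtrim(value: str, limit=None, chars: str = " ") -> str:
    """Model of LRTRIM; assumption: the limit applies to each side separately."""
    return ltrim(rtrim(value, limit, chars), limit, chars)


print(repr(rtrim("Smith   ")))
print(repr(rtrim("Total.,-", chars=".,-")))
print(repr(lrtrim("  centered  ")))
```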

UPPER (exp) 

converts a character value to uppercase characters: 

[ SET State = UPPER ( State ) ] 

VARNAME (exp) 

yields a character value that is the name of the variable in the expression: 

[ GEN Primary.Disease:C = .M. ; 

DO #J USING Heart TO Skin ; 

IF V(#J) EQ 1 AND Primary.Disease EQ .M. , 

SET Primary.Disease = VARNAME ( #J ); 

ENDDO ] 

The new character variable, Primary.Disease, will have values of missing, unless any of the variables 

Heart through Skin has a value of 1. Then Primary.Disease will have the name of the first of those variables 

as its value. 

VERIFY (exp, 'cs', 'cs') 

yields a numeric value which is the location of the first character in the initial argument which is NOT 

found in any of the remaining arguments. Thus, the presence of only specified characters may be 

verified: 

[ GEN Error = 

VERIFY ( Char.Income, '0123456789', ' $.,' ) ] 

Arguments 2 and 3 could have been combined into a single argument. 
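The check that VERIFY performs can be modeled as follows. This is a Python sketch, not P-STAT code: the name verify is hypothetical, and returning 0 when every character is acceptable is an assumption, since the text above does not state that case.

```python
def verify(value: str, *allowed: str) -> int:
    """Return the 1-based location of the first character not in any allowed set."""
    permitted = set("".join(allowed))
    for index, ch in enumerate(value, start=1):
        if ch not in permitted:
            return index
    return 0  # assumption: 0 signals that only permitted characters were found


print(verify("$1,250.00", "0123456789", " $.,"))
print(verify("12a4", "0123456789"))
```

The two-set call and a call with the single combined set '0123456789 $.,' give the same result, which is the point of the note about combining arguments 2 and 3.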

PPL Functions: Character and Numeric 

The following functions operate on either character or numeric variable lists. However, numeric and 

character variables may not be combined in one list. The list may reference variables by name or position. 

The functions operate on character variables in the same manner that they operate on numeric 

variables. 

COUNT.GOOD (vnp, vnp) 

gives the number of non-missing values of the variables specified in the list. Only variable names or positions 

may be included in the list. 

FIRST.GOOD (vnp, vnp) 

gives the first good (non-missing) value of the variables specified in the list. Only variable names or positions 

may be included in the list. 


LAST.GOOD (vnp, vnp) 

gives the last good (non-missing) value of the variables specified in the list. Only variable names or positions 

may be included in the list. 


PPL Operators: Character 

Concatenation operators are used to combine character values or expressions. 



MODIFY List 

[ GEN Mail.Name:C36; 

IF Sex EQ 1, SET Mail.Name = 

'Mr. ' // First.Name /// Last.Name ; 

IF Sex EQ 2, SET Mail.Name = 

'Ms. ' // First.Name /// Last.Name ], OUT Mail $ 

// concatenation 

connects the character strings before and after the double slashes: 

[ GEN Name:C32 = First.Name // ' ' // Last.Name ] 

The double slash operator abuts the character strings end-to-end. Blank portions of each field are 

included: 

Jennifer Smith 

/// trim concatenation 

connects the two character strings after trimming leading and trailing blanks: 

[ GEN Name:C32 = First.Name /// Last.Name ] 

The triple slash operator abuts the trimmed character strings end-to-end and then inserts a blank between 

the strings: 

Jennifer Smith 

&& dynamic concatenation of character constants 

character constants can be dynamically concatenated in the command language and PPL by using the && 

operator. 

[ GEN Cvar:c130 = && "bbb" && 'ccc' ] 

PPL Operators: Logical 

In general, the following logical operators evaluate two expressions. The expressions may be variables, 

values and functions, except for: 

• AMONG and NOTAMONG, whose arguments are lists of values and variables, 

• GOOD and MISSING, which do not have arguments, and 

• MATCHES, which has a character string argument. 

All of these logical operators are appropriate for character data. AMONG, NOTAMONG, GOOD and 

MISSING are also appropriate for numeric data. Numeric and character expressions may not be mixed 

in one argument list. Character constants must be enclosed in quotes. 

AMONG (list of values and variables) 

is true, false or missing depending on whether a value is one of the specified values: 

[ IF State AMONG ( 'NJ', 'N.J.', 'New Jersey' ), RETAIN ] 

or in the specified range: 

[ IF Name AMONG ('A' TO 'FZZ' ), RETAIN ] 



Inclusion in the range is based on the sort order of the character strings, which may differ among 

computers. 

XAMONG (list of values and variables) 

specifies case-respecting comparisons — like AMONG in all other aspects. 

CONTAINS 'cs' or exp 

is true, false or missing, depending on whether the character value argument is present: 

[ IF Address CONTAINS '08540' , RETAIN ] 

[ IF Address CONTAINS TRIM( Zip ), RETAIN ] 

In the first example, cases with values of Address containing “08540” are retained; in the second, cases 

in which the Zip characters are also present in Address are retained. 

XCONTAINS 'cs' or exp 

specifies case-respecting evaluations — like CONTAINS in all other aspects. 

XEQ exactly EQ 

tests whether two character expressions are exactly equal in both specific characters and case. 

[ IF Initials XEQ 'JW', SET Name = 'Jim Wolf, Sr.' ] 

[ IF Initials XEQ 'jw', SET Name = 'Jim Wolf, Jr.' ] 

The operators XNE, XLT, XLE, XGT and XGE are similar: case and characters must be identical in 

string comparisons. 

GOOD 

is true or false depending on whether the value is present (good) or missing. GOOD combines = and .G. : 

[ IF Address GOOD , RETAIN ] or 

[ IF Address = .G. , RETAIN ] 

MATCHES 'cs' 

is true, false or missing, depending on whether the character string argument matches the value of a character 

variable. The case of the characters is not significant. The character string argument may include 

meta-characters that define or limit matches: 

[ IF Food MATCHES '*beef*', RETAIN ] 

In this example, the meta-character “*” is a wildcard that matches zero or more occurrences of any character. 

Thus, any cases in which the variable Food contains the string “beef” are retained. Values of 

Food such as the following are considered matches: 

Beef Roast Beef Beefsteak 

Some of the meta-characters that may be used in the character string argument are: 

*       zero or more of any character      @       zero or more blanks
?       a single character                 #       a single digit
_       a single blank                     $       a single letter
\#      a literal character (the #)        \*      a literal character (the *)
(abc)   a literal string                   (ab|bc) ab or bc
abc     same as (abc)                      [abc]   a or b or c
[a-z]   a single letter in this range      [$]     same as [a-z]
[0-9]   a single number in this range      [#]     same as [0-9]
[ _ ]   a single blank                     [^$]    a single character that is not a letter
[#]11   a single match                     [#]01   zero or one matches
[#]0+   zero or more matches               [#]1+   one or more matches
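For readers who know regular expressions, the single-character meta-characters can be related to them with a small translator. This is a Python sketch, not P-STAT code. It covers only *, ?, #, $, _, @ and the backslash escape; the bracket, parenthesis and repeat-count forms are not handled. It also assumes that a pattern must match the whole value, which is suggested by the use of '*beef*' to find "beef" anywhere.

```python
import re

# Hypothetical mapping from PPL meta-characters to regular-expression pieces.
_META = {"*": ".*", "?": ".", "#": "[0-9]", "$": "[A-Za-z]", "_": " ", "@": " *"}


def to_regex(pattern: str) -> str:
    """Translate a simple MATCHES pattern into a regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped character is literal
            i += 2
            continue
        out.append(_META.get(ch, re.escape(ch)))
        i += 1
    return "".join(out)


def matches(value: str, pattern: str, respect_case: bool = False) -> bool:
    """Model of MATCHES; respect_case=True models XMATCHES."""
    flags = 0 if respect_case else re.IGNORECASE
    return re.fullmatch(to_regex(pattern), value, flags) is not None


for food in ("Beef", "Roast Beef", "Beefsteak", "Pork"):
    print(food, matches(food, "*beef*"))
```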

XMATCHES 'cs' 

specifies case-respecting matches, and is like MATCHES in all other aspects. 

MISSING 

is true or false depending on whether the value is missing or present (good). MISSING combines = and 

.M. : 

[ IF Address MISSING , DELETE ] or 

[ IF Address EQ .M. , DELETE ] 

NOTAMONG (list of values and variables) 

is true, false or missing depending on whether a value is not among the specified values: 

[ IF State NOTAMONG 

( 'NJ', 'N.J.', 'New Jersey' ), DELETE ] 

or not in the specified range: 

[ IF Name NOTAMONG ( 'A' TO 'FZZ' ), DELETE ] 

XNOTAMONG (list of values and variables) 

specifies case-respecting comparisons — like NOTAMONG in all other aspects. 


10 

PPL: Date and Time 

Commands and Functions 

The first section of this chapter describes the default format of date values, and describes the extensive set of date 

and time functions such as ADD.DAYS. The second section describes eight commands that may be used to 

change the default ordering and appearance of new date values. The third section describes the six date-related 

logical operators in PPL that compare dates. The final section contains complete details on the FORMAT.DATE 

function which is used to provide templates that describe exactly how a date should appear in the printout. 

10.1 DATE AND TIME FUNCTIONS 

A date value is not a special datatype; it is simply a P-STAT character value that contains a 4-digit year from 1753 

to 2999, a month value and a day from 1 to 31. It may have time in hh:mm:ss or hh:mm form. The seconds may 

have up to 3 places, like 12:13:14.567 . It may also have the day of the week. 

Most date functions read an input date, do something to it, and write a date result, formatted in the same way 

as the input value. By formatting, we mean the ordering of the fields within the date value, such as ‘jan 1 1992’ 

or ‘1992 January 1’ or such. 

A function like CURRENT.DATE has no input to serve as a format for the output, so it uses the default format. 

A P-STAT run begins with the default format looking like 

'Tues Jan 1, 2002 18:52:04' . 

Note the size: these formats can use 30 or more characters. 

This default appearance can be changed by the DATE.ORDER command, and by several other commands 

that control things like a month name appearing as Jan or jan or January, etc. FORMAT.DATE, described in the 

final section, is a general and powerful function for specifying the exact appearance to be used when including 

dates in the printed output. 

10.2 Functions Which Create or Use Dates 

1. DAY.MONTH.YEAR creates date from integer or character argument. 

2. DAY.YEAR.MONTH creates date from integer or character argument. 

3. MONTH.DAY.YEAR creates date from integer or character argument. 

4. MONTH.YEAR.DAY creates date from integer or character argument. 

5. YEAR.DAY.MONTH creates date from integer or character argument. 

6. YEAR.MONTH.DAY creates date from integer or character argument. 

7. MAKE.DATE creates a date from numeric input. 

8. CURRENT.DATE provides today’s date and time. 

9. REFORMAT.DATE changes the format of a date value. 

10. STATUS.DATE shows if a date is valid, if it has time, etc.


11. DAYS returns days since 1/1/1753 for a date. 

12. SECONDS returns seconds since 1/1/1753 for a date. 

13. SECONDS.MIDNIGHT returns seconds since midnight for a date. 

14. UNDO.DAYS reverses the DAYS function. 

15. UNDO.SECONDS reverses the SECONDS function. 

16. FISCAL.YEAR returns the fiscal year of a date. 

17. FISCAL.QUARTER returns the fiscal quarter of a date. 

18. QUARTER returns the calendar quarter of a date. 

19. DAY.WITHIN.WEEK returns 1 to 7, the day within a week. 

20. DAY.WITHIN.YEAR returns 1 to 366, the day within a year. 

21. WEEK.WITHIN.YEAR returns 0 to 53, the week within a year. 

22. ADD.MONTHS add some months to a date. 

23. ADD.DAYS add some days to a date. 

24. ADD.HOURS add some hours to a date. 

25. ADD.MINUTES add some minutes to a date. 

26. ADD.SECONDS add some seconds to a date. 

27. SUBTRACT.YEARS subtract some years from a date. 

28. SUBTRACT.MONTHS subtract some months from a date. 

29. SUBTRACT.DAYS subtract some days from a date. 

30. SUBTRACT.HOURS subtract some hours from a date. 

31. SUBTRACT.MINUTES subtract some minutes from a date. 

32. SUBTRACT.SECONDS subtract some seconds from a date. 

33. EXTRACT.YEARS return numeric years from a date. 

34. EXTRACT.MONTHS return numeric months from a date. 

35. EXTRACT.DAYS return numeric days from a date. 

36. EXTRACT.HOURS return numeric hours from a date. 

37. EXTRACT.MINUTES return numeric minutes from a date. 

38. EXTRACT.SECONDS return numeric seconds from a date. 

39. EXTRACT.CC return 2-digit numeric century from a date. (19 from 1983) 

40. EXTRACT.YY return 2-digit numeric year within century. (83 from 1983) 

41. EXTRACT.DATE return a copy of the input, dropping time. 

42. EXTRACT.TIME return a copy of the input, dropping date.


43. EXTRACT.WEEKDAY return the weekday name. 

44. CHANGE.YEARS change the years field in a date. 

45. CHANGE.MONTHS change the months field in a date. 

46. CHANGE.DAYS change the days field in a date. 

47. CHANGE.HOURS change the hours field in a date. 

48. CHANGE.MINUTES change the minutes field in a date. 

49. CHANGE.SECONDS change the seconds field in a date. 

50. DIF.YEARS difference between 2 dates in years. 

51. DIF.MONTHS difference between 2 dates in months. 

52. DIF.DAYS difference between 2 dates in days. 

53. DIF.HOURS difference between 2 dates in hours. 

54. DIF.MINUTES difference between 2 dates in minutes. 

55. DIF.SECONDS difference between 2 dates in seconds. 

10.3 Six Simple Date Functions 

The 6 simple date functions make a character date value from either numeric input like 12252005 or 

20051225, or from character input like ’12/25/2005’. The order of the three segments should be consistent 

with the function name. In other words, if 12252005 is meant to be month 12, day 25, and year 

2005, the MONTH.DAY.YEAR function should be chosen. 

For example: 

MONTH.DAY.YEAR ( 12252005 ) = ’Sun Dec 25, 2005’ 

MONTH.DAY.YEAR ( ’12--25--2005’ ) = ’Sun Dec 25, 2005’ 

A numeric argument can be an integer of 3 to 8 digits. If 3, 4 or 5 digits, lead zeros are assumed to 

bring it up to 6 total digits, and the yy form of year is assumed. In that case, the default is to assume 

20yy. Thus, YEAR.MONTH.DAY ( 225 ) produces Friday Feb 25, 2000. 

If 7 digits, one lead zero is assumed, and the yyyy form of year is assumed. 

A character argument should contain either one or three integers, like ’12252005’ or ’12/25/2005’. 

If just one integer, it is treated like the numeric input. 

Non-digits are treated as separators. Thus, ’***12abc25///2005 ’ will produce 12, 25 and 2005. 

A second argument may be supplied to specify the century to be used for 2-digit years. Centuries from 

1700 through 2900 are allowed, shown by values of 17 through 29, or by 1700, 1800, 1900, etc. 

PUT ( DAY.MONTH.YEAR ( 10042006 )) $ or 

PUT ( DAY.MONTH.YEAR ( "10/4/2006" )) $ 

produces “Mon April 10, 2006” 

PUT ( DAY.YEAR.MONTH ( 10200604 )) $ 

also produces Mon April 10, 2006. 

PUT ( MONTH.YEAR.DAY ( 4200610 )) $


You can omit the lead zero and the function can figure out what you mean. But note that in the first 2 

examples above the 0 in 04 for the month is necessary. The following 3 examples also produce the same 

result. 

PUT ( MONTH.DAY.YEAR ( 4102006 )) $ 

PUT ( YEAR.MONTH.DAY ( 20060410 )) $ 

PUT ( YEAR.DAY.MONTH ( 20061004 )) $ 
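The digit-count rules for the six simple date functions can be modeled as follows. This is a Python sketch, not P-STAT code: simple_date is a hypothetical name, the order argument ('mdy' for MONTH.DAY.YEAR, 'dym' for DAY.YEAR.MONTH, and so on) stands in for the choice of function, and the result is a Python date rather than a formatted character value.

```python
import re
from datetime import date


def _build(fields: dict, century: int) -> date:
    year = int(fields["y"])
    if len(fields["y"]) <= 2:
        year += century * 100  # 2-digit year: apply the century (default 20)
    return date(year, int(fields["m"]), int(fields["d"]))


def simple_date(order: str, value, century: int = 20) -> date:
    """order is a permutation of 'd', 'm', 'y' naming the segment order."""
    if century >= 100:
        century //= 100  # accept 1900 as well as 19
    if isinstance(value, str):
        parts = re.findall(r"\d+", value)  # non-digits act as separators
        if len(parts) == 3:
            return _build(dict(zip(order, parts)), century)
        if len(parts) != 1:
            raise ValueError("expected one or three integers")
        digits = parts[0]
    else:
        digits = str(value)
    if not 3 <= len(digits) <= 8:
        raise ValueError("expected 3 to 8 digits")
    year_width = 2 if len(digits) <= 6 else 4
    digits = digits.zfill(4 + year_width)  # supply the assumed lead zeros
    fields, pos = {}, 0
    for key in order:
        width = year_width if key == "y" else 2
        fields[key] = digits[pos:pos + width]
        pos += width
    return _build(fields, century)


print(simple_date("mdy", 12252005))
print(simple_date("mdy", "***12abc25///2005 "))
print(simple_date("ymd", 225))
print(simple_date("myd", 4200610))
```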

10.4 DATE and TIME function details. 

1. MAKE.DATE (year, month, day ) >>> date 

MAKE.DATE (year, month, day, hour, minute, second) >>> date 

MAKE.DATE (year, month, day, hms, 'mask' ) >>> date 

MAKE.DATE (ymd, 'mask' ) >>> date 

MAKE.DATE (ymd, 'mask', hour, minute, second) >>> date 

MAKE.DATE (ymd, 'mask', hms, 'mask' ) >>> date 

This makes a character date value from numeric values. The function must have 2 or 3 date arguments, 

and may also have 2 or 3 time arguments. These examples show the input as constants, but 

they can be variables or expressions of any complexity. 

The result is a character date, formatted in the current default format. P-STAT starts a run with the 

default format set to the following template: 

'Tues Jan 1, 2002 18:52:04' . 

The date can be provided using separate arguments for year, month and day. Alternatively, those three 

values can be compressed into one integer, like 19971225, followed by a mask or template in quotes 

to show how to parse the compressed integer. 

Time is provided similarly. However, seconds may have a fractional part of up to three places; in that 

case the three argument form must be used. Some examples: 

MAKE.DATE ( 1997, 12, 25 ) = 'Thurs Dec 25, 1997' 

MAKE.DATE ( 1997, 12, 25, 

23, 59, 59 ) = 'Thurs Dec 25, 1997 23:59:59' 

A mask like ‘yyyymmdd’ is used when year, month and day are combined into one integer, like 

19971225. 

A mask like ‘hhmmss’ is used when hour, minute and second are combined into one integer, like 

235959. 

MAKE.DATE ( 12251997, 'mmddyyyy', 

235959, 'hhmmss' ) 

= 'Thurs Dec 25, 1997 23:59:59' 

If the year mask in a template has yy (rather than yyyy), the yy may be preceded by a 2-digit century 

from 17 to 29; if not, 20 is assumed. For example: 

MAKE.DATE ( 122595, 'mmdd19yy' ) 

MAKE.DATE ( 122595, 'mmddyy' ) 

The first would produce 'Mon Dec 25, 1995'. 

The second would produce 'Sun Dec 25, 2095'. 
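The way a mask parses a compressed integer can be modeled as follows. This is a Python sketch, not P-STAT code: make_date_from_mask is a hypothetical name, only the date mask is handled, and the result is a Python date rather than a formatted character value.

```python
from datetime import date


def make_date_from_mask(number: int, mask: str) -> date:
    """Parse an integer such as 122595 using a mask such as 'mmdd19yy'."""
    mask = mask.lower()
    width = sum(ch in "ymd" for ch in mask)  # number of digit positions in the mask
    digits = str(number).zfill(width)
    if len(digits) != width:
        raise ValueError("number has more digits than the mask allows")
    fields = {"y": "", "m": "", "d": ""}
    century = ""
    pos = 0
    for ch in mask:
        if ch in fields:
            fields[ch] += digits[pos]
            pos += 1
        elif ch.isdigit():
            century += ch  # a literal century such as the 19 in 'mmdd19yy'
    year = int(fields["y"])
    if len(fields["y"]) == 2:
        year += int(century or "20") * 100  # 20 is assumed when no century is given
    return date(year, int(fields["m"]), int(fields["d"]))


print(make_date_from_mask(12251997, "mmddyyyy"))
print(make_date_from_mask(122595, "mmdd19yy"))
print(make_date_from_mask(122595, "mmddyy"))
```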

2. CURRENT.DATE () >>> date 

Make a character date value from the current date and time. The current default format is used to construct 

the result.


CURRENT.DATE () = ‘Sun June 23, 2002 12:26:49’ 

The empty parentheses are needed to persuade the PPL processor that this is indeed a function. 

3. REFORMAT.DATE ( date1 ) >>> date 

REFORMAT.DATE ( date1, date2 ) >>> date 

Take a date input and reformat it. If a second argument is supplied, it will be used as a formatting template. 

If not, the current default format is used. 

REFORMAT.DATE ( 'June 23, 2002' ) = 'Sun June 23, 2002' 

Several points about the above example: 

(1) There was no second argument, therefore the current default format would be used which, unless 

changed earlier in the run, would produce the above result. 

(2) The default has day-of-week, therefore this result gets day-of-week also. 

(3) The default has time, but this input does not have time, so none is put into the result. 

REFORMAT.DATE ( 'JUNE 23, 2002', '1997, dec, 25' ) = 

'2002, June, 23' 

Note in the above that ‘dec’ was used only for the ordering of elements; it did not change the 

naming style. The MONTH.LENGTH and similar commands can be used to change how names are 

written. 

4. STATUS.DATE ( date ) >>> integer 

Take a date input and produce a numeric result, from -3 to 2, which indicates the usability of the input. 

STATUS.DATE ( 'jan 1, 1997 12:13:14' ) = 2 

STATUS.DATE ( 'jan 1, 1997 ' ) = 1 

STATUS.DATE ( 'jan 1 ' ) = 0 

STATUS.DATE ( .M1. ) = -1 



A result of 2 means the input is a valid date value which also contains time. 

A result of 1 means the input is a valid date value but does not contain time. 

A result of 0 means the input is not missing, but is nonetheless not a valid date value (missing year). 

A result of -1, -2 or -3 means the input is missing: -1 is missing 1, etc. 

5. DAYS ( date ) >>> integer 

The DAYS function takes an input date value and produces the number of days in that value since Jan 

1, 1753. Time, if there, is ignored. The result of the DAYS function can be used to sort on the date, 

with no concern about time within date. 

DAYS ( 'jan 1, 1753' ) = 1 

DAYS ( 'jan 1, 2002 12:13:14' ) = 90,946 

There is an older form of DAYS function that has two arguments, a 6 or 8 digit year-month-day integer 

like 19981225, and a mask like ‘yyyymmdd’. This form, which returned days since jan 1,1900, is 

being de-documented but will be supported for some years. 

The new form is recognized by its having just 1 argument.


6. SECONDS ( date ) >>> number 

The SECONDS function takes an input date value and produces the number of seconds since 00:00:00 

on Jan 1, 1753. If the input lacks a time field, the result is missing. The result of the SECONDS function 

can be used to sort on time within date. 

SECONDS ( 'jan 1, 2002 12:13:14' ) = 7,857,691,994 

7. SECONDS.MIDNIGHT ( date ) >>> number 

The SECONDS.MIDNIGHT function takes the time element of an input date value and produces the 

number of seconds since midnight. The date element is ignored. If time is not there, the result is missing. 

The result of the SECONDS.MIDNIGHT function can be used to sort on the time, with no 

concern about the date. 

SECONDS.MIDNIGHT ( 'jan 1, 2002 00:00:00' ) = 0 

SECONDS.MIDNIGHT ( 'jan 1, 2002 12:13:14' ) = 43,994 

SECONDS.MIDNIGHT ( 'jan 1, 2002 23:59:59.12' ) = 86,399.12 

8. UNDO.DAYS ( number ) >>> date 

The UNDO.DAYS function takes the result of the DAYS function and re-creates the date. Note, only 

the date is recovered, since time information is not carried in the DAYS result. 

UNDO.DAYS ( 90946 ) = 'Tues Jan 1, 2002' 

UNDO.DAYS ( DAYS ( 'jan 1 2002 12:13:14' ) ) = 'Tues Jan 1, 2002' 

9. UNDO.SECONDS ( number ) >>> date 

The UNDO.SECONDS function takes the result of the SECONDS function and re-creates the date 

and time. 

UNDO.SECONDS ( 7857691994 ) = 'Tues Jan 1, 2002 12:13:14' 
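The arithmetic behind DAYS, SECONDS, SECONDS.MIDNIGHT and their UNDO counterparts can be checked with the following Python sketch. It is not P-STAT code and the function names are hypothetical. Note that DAYS counts Jan 1, 1753 as day 1, while SECONDS counts elapsed seconds from 00:00:00 on that day.

```python
from datetime import date, datetime, timedelta

EPOCH_DAY = date(1753, 1, 1)
EPOCH_TIME = datetime(1753, 1, 1)


def days(d) -> int:
    """Model of DAYS: Jan 1, 1753 is day 1; any time part is ignored."""
    return (date(d.year, d.month, d.day) - EPOCH_DAY).days + 1


def seconds(dt: datetime) -> int:
    """Model of SECONDS: whole seconds since 00:00:00 on Jan 1, 1753."""
    return int((dt - EPOCH_TIME).total_seconds())


def seconds_midnight(dt: datetime) -> float:
    """Model of SECONDS.MIDNIGHT: seconds since midnight, date ignored."""
    return dt.hour * 3600 + dt.minute * 60 + dt.second + dt.microsecond / 1_000_000


def undo_days(n: int) -> date:
    return EPOCH_DAY + timedelta(days=n - 1)


def undo_seconds(n: int) -> datetime:
    return EPOCH_TIME + timedelta(seconds=n)


stamp = datetime(2002, 1, 1, 12, 13, 14)
print(days(stamp))              # 90946
print(seconds(stamp))           # 7857691994
print(seconds_midnight(stamp))  # 43994.0
print(undo_seconds(seconds(stamp)))
```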

10. FISCAL.YEAR ( date, integer ) >>> integer 

The FISCAL.YEAR function requires a second argument: the ending month of the fiscal year. This 

can be 1 through 12, but is usually 6, 9 or 12: 

6 for fiscal years ending on June 30. 

9 for the Sept 30 fiscal year end (U.S.Government). 

12 for a Dec 31 ending of a calendar year. 

FISCAL.YEAR ( 'sept 15,2001', 9 ) = 2001 

FISCAL.YEAR ( 'oct 15,2001', 9 ) = 2002 

FISCAL.YEAR ( 'dec 31,2001', 6 ) = 2002 

FISCAL.YEAR ( 'dec 31,2001', 12 ) = 2001 

11. FISCAL.QUARTER ( date, integer ) >>> integer 

The FISCAL.QUARTER function requires a second argument: the ending month of the fiscal year. 

This can be 1 through 12, but is usually 6, 9 or 12: 

6 for fiscal years ending on June 30. 

9 for the Sept 30 fiscal year end (U.S.Government). 

12 for a Dec 31 ending of a calendar year. 

FISCAL.QUARTER ( 'jan 10,2001', 6 ) = 3 

FISCAL.QUARTER ( 'jan 10,2001', 9 ) = 2 

FISCAL.QUARTER ( 'jan 10,2001', 12 ) = 1


12. QUARTER ( date ) >>> integer 

The QUARTER function returns the calendar year quarter; it is the same as FISCAL.QUARTER with 

a second argument of twelve. 

QUARTER ( 'jan 10,2001' ) = 1 

QUARTER ( 'dec 10,2001' ) = 4 
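The fiscal calculations reduce to simple month arithmetic. The following Python sketch is not P-STAT code and the function names are hypothetical. It assumes, as the examples above imply, that a fiscal year is named for the calendar year in which it ends.

```python
from datetime import date


def fiscal_year(d: date, end_month: int) -> int:
    """Model of FISCAL.YEAR: months after the ending month belong to the next fiscal year."""
    return d.year if d.month <= end_month else d.year + 1


def fiscal_quarter(d: date, end_month: int) -> int:
    """Model of FISCAL.QUARTER: count months from the first month of the fiscal year."""
    months_into_year = (d.month - end_month - 1) % 12
    return months_into_year // 3 + 1


def quarter(d: date) -> int:
    """Model of QUARTER: the fiscal quarter with a December year end."""
    return fiscal_quarter(d, 12)


print(fiscal_year(date(2001, 10, 15), 9))    # 2002
print(fiscal_quarter(date(2001, 1, 10), 6))  # 3
print(quarter(date(2001, 12, 10)))           # 4
```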

13. DAY.WITHIN.WEEK ( date ) >>> integer 

The DAY.WITHIN.WEEK function returns an integer from 1 to 7. The default is for the week to begin 

on Monday, so that a Monday returns 1, Tuesday 2, and Sunday 7. If a weekday name in quotes 

is given as a second argument, that day will be treated as day 1 in the function. 

Note, Dec 25,2002 is a Wednesday. 

DAY.WITHIN.WEEK ( 'dec 25, 2002' ) = 3 

DAY.WITHIN.WEEK ( 'dec 25, 2002', 'Sunday' ) = 4 

DAY.WITHIN.WEEK ( 'dec 25, 2002', 'sat' ) = 5 
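The effect of the optional starting day can be modeled as follows. This is a Python sketch, not P-STAT code: day_within_week is a hypothetical name, and only the first three letters of the weekday name are examined.

```python
from datetime import date

_DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def day_within_week(d: date, first_day: str = "monday") -> int:
    """Model of DAY.WITHIN.WEEK: 1 for the first day of the week, 7 for the last."""
    start = _DAYS.index(first_day.lower()[:3])
    return (d.weekday() - start) % 7 + 1


wednesday = date(2002, 12, 25)
print(day_within_week(wednesday))            # 3
print(day_within_week(wednesday, "Sunday"))  # 4
print(day_within_week(wednesday, "sat"))     # 5
```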

14. DAY.WITHIN.YEAR ( date ) >>> integer 

The DAY.WITHIN.YEAR function returns an integer from 1 to 366. January 1 is always 1, and December 

31 will return 365 in non-leap years, and 366 in leap years. 

DAY.WITHIN.YEAR ( 'jan 11, 2001' ) = 11 

DAY.WITHIN.YEAR ( 'feb 11, 2001' ) = 42 

DAY.WITHIN.YEAR ( 'dec 25, 2001' ) = 359 

DAY.WITHIN.YEAR ( 'dec 25, 2004' ) = 360 

15. WEEK.WITHIN.YEAR ( date, integer ) >>> integer 

This function returns the week number within the year for the supplied date. The range can be 0 to 53, 

depending on the date and on the calculation method. 

There are two methods for determining what constitutes week one of a given year. 

The first method is simple: the first week goes from Jan 1 through Jan 7. This can be called an ABSOLUTE week. 

The second method makes use of a calendar week, which is defined by ISO 8601 as going from Monday 

through Sunday. 

The first week is the first CALENDAR week that contains a sufficient number of days within the current 

year. Sufficient can be set to 1 through 7; the ISO standard is 4. This function assumes a Mon- 

Sun calendar week; a different calendar week can be given in the function. The arguments are: 

1. A character date value, variable or expression like ’Jan 4,2004’. 

2. An integer constant from 0 to 7. This selects the method to be used to define the first week of the 

year. 

0: This uses the absolute week. The first week is Jan 1 through Jan 7. The result can be from 

1 to 53. The third argument, if provided, is ignored. 

1-7: These use the calendar week. The 1 to 7 specify the minimum number of days needed to 

constitute an acceptable first week. 

For example, suppose the calendar week is Mon-Sun and Jan 4 is a Sunday. Is an initial 4-day 

week sufficient to be used as week 1 ? If this argument is 1 to 4, yes. If insufficient, the partial 

week becomes week 0, and the next calendar week is week 1.


The ISO 8601 standard is 4, which therefore accepts the first calendar week that has a majority 

of its days in the current year. 

Using 7 would cause the first full calendar week to be week 1. 

3. An optional character constant which contains the starting day of the calendar week to be used 

instead of the default Monday to Sunday week. This can be a full name like ’Tuesday’, or an 

abbreviation like ’Wed’. 

*********************************** 

* examples using ABSOLUTE weeks * 

*********************************** 

Week.within.year ( ’jan 3 2004’, 0 ) = 1 



Week.within.year ( ’dec 31 2004’, 0 ) = 53 

************************************ 

* examples using the default * 

* Monday to Sunday calendar week * 

************************************ 

mon tue wed thu fri sat sun
              1   2   3   4
  5   6   7   8   9  10  11
 12  13  14  15  16  17  18






************************************** 

* examples using an alternative * 

* Sunday to Saturday calendar week * 

************************************** 

sun mon tue wed thu fri sat
              1   2   3
  4   5   6   7   8   9  10
 11  12  13  14  15  16  17

Week.within.year ( ’jan 3 2004’, 1, ’sun’ ) = 1 






Week.within.year ( ’jan 4 2004’, 7, ’sun’ ) = 1
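The two week-numbering methods can be modeled as follows. This is a Python sketch, not P-STAT code: week_within_year is a hypothetical name. The sketch assumes that days at the end of December always stay in the current year's numbering, which is consistent with the stated range of 0 to 53 but differs from the ISO rule that can assign them to week 1 of the next year.

```python
from datetime import date

_DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def week_within_year(d: date, method: int, first_day: str = "monday") -> int:
    """method 0 = absolute weeks; 1 to 7 = minimum days in the first calendar week."""
    day_of_year = d.timetuple().tm_yday
    if method == 0:
        return (day_of_year - 1) // 7 + 1  # Jan 1 through Jan 7 is week 1
    if not 1 <= method <= 7:
        raise ValueError("method must be 0 to 7")
    start = _DAYS.index(first_day.lower()[:3])
    jan1 = date(d.year, 1, 1)
    offset = (jan1.weekday() - start) % 7  # days of the first calendar week in the prior year
    days_in_first_week = 7 - offset
    index = (day_of_year - 1 + offset) // 7  # 0 for the first, possibly partial, week
    # A sufficient first week is week 1; otherwise it is week 0.
    return index + 1 if days_in_first_week >= method else index


print(week_within_year(date(2004, 1, 3), 0))
print(week_within_year(date(2004, 12, 31), 0))
print(week_within_year(date(2004, 1, 3), 1, "sun"))
print(week_within_year(date(2004, 1, 4), 7, "sun"))
print(week_within_year(date(2004, 1, 4), 4), week_within_year(date(2004, 1, 4), 5))
```

With the Monday to Sunday week, Jan 1 through Jan 4, 2004 form a 4-day first week. The last line shows that week being accepted as week 1 under the ISO setting of 4, and becoming week 0 when 5 days are required.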


16. ADD.YEARS ( date, 1 to 6 numbers ) >>> date 

17. ADD.MONTHS ( date, 1 to 5 numbers ) >>> date 

18. ADD.DAYS ( date, 1 to 4 numbers ) >>> date 

19. ADD.HOURS ( date, 1 to 3 numbers ) >>> date 

20. ADD.MINUTES ( date, 1 to 2 numbers ) >>> date 

21. ADD.SECONDS ( date, 1 number ) >>> date 

22. SUBTRACT.YEARS ( date, 1 to 6 numbers ) >>> date 

23. SUBTRACT.MONTHS ( date, 1 to 5 numbers ) >>> date 

24. SUBTRACT.DAYS ( date, 1 to 4 numbers ) >>> date 

25. SUBTRACT.HOURS ( date, 1 to 3 numbers ) >>> date 

26. SUBTRACT.MINUTES ( date, 1 to 2 numbers ) >>> date 

27. SUBTRACT.SECONDS ( date, 1 number ) >>> date 

Each of these has a date value as its first argument, followed by one or more date/time amounts to be 

added or subtracted. 

The ADD.YEARS function, for example, treats the required second argument as the number of years 

to be added; months, days, hours, minutes and seconds can also be supplied. For example: 

ADD.YEARS ( 'jan 1, 1991', 3 ) = 'Jan 1, 1994' 

ADD.YEARS ( 'jan 1, 1991', 3, 1 ) = 'Feb 1, 1994' 

ADD.YEARS ( 'jan 1, 1991', 3, 1,10 ) = 'Feb 11, 1994' 

Since the function in the above 3 examples was ADD.YEARS, the initial element (i.e., argument two) 

is years. If yet another argument follows, it is treated as months, and so on. 

ADD.YEARS ( 'jan 1, 1991 10:10:10', 1,2,3,4,5,6) 

= 'March 4, 1992 14:15:16' 

The above adds 1 year, 2 months, 3 days, 4 hours, 5 minutes and 6 seconds to jan 1,1991 at 10:10:10. 

A subtract of the same amount could be done: 

SUBTRACT.YEARS ( 'march 4, 1992 14:15:16', 1,2,3,4,5,6) 

= 'Jan 1, 1991 10:10:10' 

Some additional examples using scratch variable ##d, which is set to ‘jan 1 1991 10:10:10’ for the 

function input: 

ADD.DAYS ( ##d, 100 ) = 'April 11 1991 10:10:10' 

ADD.DAYS ( ##d, 1000 ) = 'Sept 27 1993 10:10:10' 

ADD.DAYS ( ##d, 1000,0,0,1 ) = 'Sept 27 1993 10:10:11' 

ADD.DAYS ( ##d, 1000,0,0,1.5) = 'Sept 27 1993 10:10:11.5' 

ADD.MINUTES ( ##d, 1000 ) = 'Jan 2 1991 02:50:10' 

ADD.MINUTES ( ##d, -20 ) = missing, invalid argument 

ADD.MINUTES ( ##d, 1,2,3 ) = error, too many arguments 

These functions process the years field first, then the months field (which could further change the 

years field), and so on.
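The cascade of amounts can be modelled outside P-STAT. The following Python sketch is not PPL; the helper names add_days and add_minutes are hypothetical, and None stands in for a P-STAT missing value. It assumes only what the examples above show: the first amount belongs to the field named by the function, later amounts belong to the following fields, and a negative amount yields missing.

```python
from datetime import datetime, timedelta


def _add(value, **amounts):
    # The manual shows a negative amount giving a missing result;
    # None stands in for "missing" in this model.
    if any(amount < 0 for amount in amounts.values()):
        return None
    return value + timedelta(**amounts)


def add_days(value, days, hours=0, minutes=0, seconds=0):
    # Model of ADD.DAYS: days first, then hours, minutes, seconds.
    return _add(value, days=days, hours=hours, minutes=minutes, seconds=seconds)


def add_minutes(value, minutes, seconds=0):
    # Model of ADD.MINUTES: minutes first, then seconds.
    return _add(value, minutes=minutes, seconds=seconds)


d = datetime(1991, 1, 1, 10, 10, 10)
print(add_days(d, 100))            # 1991-04-11 10:10:10
print(add_days(d, 1000, 0, 0, 1))  # 1993-09-27 10:10:11
print(add_minutes(d, 1000))        # 1991-01-02 02:50:10
print(add_minutes(d, -20))         # None
```

The printed values agree with the ADD.DAYS and ADD.MINUTES results listed above.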


********************************* 

* limitations in using * 

* ADD.YEARS SUBTRACT.YEARS * 

* ADD.MONTHS SUBTRACT.MONTHS * 

********************************* 

These 4 functions are of limited usefulness because they can quite easily produce an invalid date, 

which causes the function to issue a missing result. 

For example, adding one year to feb 29,1992 would produce feb 29 in 1993, which is invalid because 

1993 was not a leap year. Similarly, adding 1 month to aug 31,2001 produces sept 31,2001, invalid 

because september hath but 30 days. 

These functions first check the date validity after processing year and month; it is checked again after 

any additional elements have been processed. 

The other 8 functions in this group (ADD.DAYS and such) all produce sensible, reversible results. 
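The reason year and month arithmetic can fail is easy to see in a small model. This Python sketch is not PPL; the helper name add_years is hypothetical and None stands in for a missing result. It assumes the behaviour described above: the years and months fields are changed, the day field is left alone, and the resulting date is then checked for validity.

```python
from datetime import date


def add_years(value, years, months=0):
    # Add to the years field first, then the months field
    # (which may carry over into the years field).
    total = (value.year + years) * 12 + (value.month - 1) + months
    year, month_index = divmod(total, 12)
    try:
        # The day field is kept unchanged, so the result may not exist.
        return date(year, month_index + 1, value.day)
    except ValueError:
        return None  # invalid date: the model's "missing"


print(add_years(date(1991, 1, 1), 3, 1))   # 1994-02-01
print(add_years(date(1992, 2, 29), 1))     # None (1993 has no Feb 29)
print(add_years(date(2001, 8, 31), 0, 1))  # None (September has 30 days)
```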

28. EXTRACT.YEARS ( date ) >>> integer, 1753 to 2999 

29. EXTRACT.MONTHS ( date ) >>> integer, 1 to 12 

30. EXTRACT.DAYS ( date ) >>> integer, 1 to 31 

31. EXTRACT.HOURS ( date ) >>> integer, 0 to 23 

32. EXTRACT.MINUTES ( date ) >>> integer, 0 to 59 

33. EXTRACT.SECONDS ( date ) >>> integer, 0 to 59 

34. EXTRACT.CC ( date ) >>> integer, 17 to 29 

35. EXTRACT.YY ( date ) >>> integer, 0 to 99 

36. EXTRACT.DATE ( date ) >>> character date value 

37. EXTRACT.TIME ( date ) >>> character time value 

38. EXTRACT.WEEKDAY ( date ) >>> character weekday name 

EXTRACT.YEARS ( 'jan 5 1991 10:11:12' ) = 1991 

EXTRACT.MONTHS ( 'jan 5 1991 10:11:12' ) = 1 

EXTRACT.DAYS ( 'jan 5 1991 10:11:12' ) = 5 

EXTRACT.HOURS ( 'jan 5 1991 10:11:12' ) = 10 

EXTRACT.MINUTES( 'jan 5 1991 10:11:12' ) = 11 

EXTRACT.SECONDS( 'jan 5 1991 10:11:12' ) = 12 

EXTRACT.CC ( 'jan 5 1991 10:11:12' ) = 19 (century) 

EXTRACT.YY ( 'jan 5 1991 10:11:12' ) = 91 

EXTRACT.DATE ( 'jan 5 1991 10:11:12' ) = 'jan 5 1991' 

EXTRACT.TIME ( 'jan 5 1991 10:11:12' ) = '10:11:12' 

EXTRACT.WEEKDAY( 'jan 5 1991 10:11:12' ) = 'Sat' 

The result of extract.date will contain the day of week only if the input argument has day-of-week.


39. CHANGE.YEARS ( date, 1 to 6 numbers ) >>> date 

40. CHANGE.MONTHS ( date, 1 to 5 numbers ) >>> date 

41. CHANGE.DAYS ( date, 1 to 4 numbers ) >>> date 

42. CHANGE.HOURS ( date, 1 to 3 numbers ) >>> date 

43. CHANGE.MINUTES ( date, 1 to 2 numbers ) >>> date 

44. CHANGE.SECONDS ( date, 1 number ) >>> date 

These six functions are used to change specific elements within a date-time value without affecting 

the other elements of the value. 

Each of these has a date value as its first argument, and then one or more date or time elements as 

subsequent arguments. 

In the CHANGE.MONTHS function, for example, the argument after the input date must be an integer 

from 1 to 12. This provides the changed month element to be placed into the function result. A 

third argument, if given, would then be treated as a days element, and so forth. 

In these examples, we assume that character scratch variable ##d has been set to: 

'jan 1 1991 10:10:10'. 

CHANGE.YEARS ( ##d, 1992 ) = 'Jan 1 1992 10:10:10' 

CHANGE.MONTHS ( ##d, 2 ) = 'Feb 1 1991 10:10:10' 

CHANGE.DAYS ( ##d, 8 ) = 'Jan 8 1991 10:10:10' 

CHANGE.HOURS ( ##d, 11 ) = 'Jan 1 1991 11:10:10' 

CHANGE.MINUTES( ##d, 12 ) = 'Jan 1 1991 10:12:10' 

CHANGE.SECONDS( ##d, 13 ) = 'Jan 1 1991 10:10:13' 

As with functions like ADD.YEARS, additional arguments can be supplied to change several fields 

at once. 

CHANGE.YEARS ( ##d, 1992, 2, 8, 11, 12, 13 ) = 

'Feb 8 1992 11:12:13' 

CHANGE.HOURS (##d, 11, 12, 13) = 'Jan 1 1991 11:12:13'. 

CHANGE.DAYS (##d, 8, 11 ) = 'Jan 8 1991 11:10:10'. 

The above change.hours example has three values after the input date. Since the function is 

change.hours, 

argument 2 (11) is treated as an HOURS change, 

argument 3 (12) is treated as a MINUTES change, and 

argument 4 (13) is treated as a SECONDS change. 

The above change.days has two arguments after the input date; these are treated as days (because of 

the function name) and hours (the next time element after days). 

CHANGE.DAYS ( 'Jan 1 1991', 3, 21, 22, 23 ) = 

'Jan 3 1991 21:22:23'. 

If the function has values for hours, minutes and seconds, they are placed in the result even when the 

input did not have any time fields.
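The way the function name selects the starting field can be sketched as follows. This is a Python model, not PPL; the helper name change and the FIELDS tuple are hypothetical. Python's datetime always carries time fields, so the case of an input with no time fields is only approximated (the time defaults to 00:00:00).

```python
from datetime import datetime

# Field order used by the CHANGE functions, as described above.
FIELDS = ("year", "month", "day", "hour", "minute", "second")


def change(value, start_field, *amounts):
    # start_field plays the role of the function name:
    # "day" models CHANGE.DAYS, "hour" models CHANGE.HOURS, and so on.
    start = FIELDS.index(start_field)
    if not 1 <= len(amounts) <= len(FIELDS) - start:
        raise ValueError("wrong number of arguments")
    # Pair each amount with the starting field and the fields after it.
    updates = dict(zip(FIELDS[start:], amounts))
    return value.replace(**updates)


d = datetime(1991, 1, 1, 10, 10, 10)
print(change(d, "hour", 11, 12, 13))  # 1991-01-01 11:12:13
print(change(d, "day", 8, 11))        # 1991-01-08 11:10:10
```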


45. DIF.YEARS ( date, date ) >>> number 

46. DIF.MONTHS ( date, date ) >>> number 

47. DIF.DAYS ( date, date ) >>> number 

48. DIF.HOURS ( date, date ) >>> number 

49. DIF.MINUTES ( date, date ) >>> number 

50. DIF.SECONDS ( date, date ) >>> number 

The first two arguments are the date values being compared. It does not matter which is the first argument, 

i.e., 

DIF.DAYS( date1, date2 ) = DIF.DAYS( date2, date1 ). 

An optional third argument can be used to limit the calculation; using 2, for example, causes only the 

first two elements, years and months, to be looked at. 

DIF.YEARS ( 'jan 1,1992', 'feb 3,1993' ) = 1.090411 

DIF.YEARS ( 'jan 1,1992', 'feb 3,1993', 1 ) = 1. 

DIF.MONTHS( 'jan 1,1992', 'feb 3,1993' ) = 13.071429 

DIF.MONTHS( 'jan 1,1992', 'feb 3,1993', 2 ) = 13. 

DIF.DAYS ( 'jan 1,1992', 'feb 3,1993' ) = 399. 

DIF.DAYS ( 'jan 1,1992 12:00:00', 

'feb 3,1993 15:00:00' ) = 399.125 

DIF.HOURS ( 'jan 1,1992 12:00:00', 

'feb 3,1993 15:00:00' ) = 9,579. 

DIF.MINUTES('jan 1,1992 12:00:00', 

'feb 3,1993 15:00:00' ) = 574,740. 

DIF.SECONDS('jan 1,1992 12:00:00', 

'feb 3,1993 15:00:00' ) = 34,484,400. 

DIF.SECONDS('jan 1,1992 12:00:00.2', 

'feb 3,1993 15:00:00' ) = 34,484,399.8 

*********************************************** 

* NOTE: DIF.YEARS and DIF.MONTHS are both * 

* counting time elements of varying lengths * 

*********************************************** 

Since years can have differing lengths ( 365 or 366 days), and months are even worse ( 28 or 29 or 30 

or 31 days), the dif.years and dif.months functions produce results which reflect the somewhat arbitrary 

choices on how to compute them. 

DIF.YEARS ( 'feb 4,1992', 'mar 7,1993' ) = 1.0849315 

DIF.MONTHS( 'feb 4,1992', 'mar 7,1993' ) = 13.0967742 

In the above dif.years example, there is one full year from feb 4,1992 to feb 4,1993, and then 31 more 

days to march 7,1993. The fractional year is then 31/365, which is 0.0849315. The 365 is the distance 

from feb 4,1993 to the end of the next full year, feb 4,1994. 

If the earlier date is a feb 29, one day is subtracted from both dates to simplify the calculations. 

In the above dif.months example, there are 13 full months from feb 4,1992 to march 4,1993. The fractional 

part is 3/31, or 0.0967742. The 3 is the distance from march 4 to march 7, and the 31 is the 

distance from march 4 to april 4, the end of the next full month.


If the day of the month of the earlier date is more than 28, from one to three days are subtracted from 

both dates to simplify the calculations. 

These two functions could well be coded in a different manner that gives slightly different results in 

the fractional part. The coding and results of DIF.DAYS, DIF.HOURS, DIF.MINUTES and 

DIF.SECONDS, on the other hand, are straightforward. 
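The rules just described for the fractional part can be written out as a short model. This Python sketch is not the P-STAT implementation; the function names dif_years and dif_months and the helper _shift_months are hypothetical, and time fields are ignored. It follows the text: count full years (or months) from the earlier date, then divide the remaining days by the length of the next full year (or month).

```python
from datetime import date, timedelta


def _shift_months(d, n):
    # Move d forward n whole months; d.day is at most 28 when this is called.
    total = d.year * 12 + (d.month - 1) + n
    return date(total // 12, total % 12 + 1, d.day)


def dif_years(a, b):
    early, late = sorted((a, b))
    if (early.month, early.day) == (2, 29):
        # Earlier date is Feb 29: subtract one day from both dates.
        early -= timedelta(days=1)
        late -= timedelta(days=1)
    full = late.year - early.year
    if early.replace(year=early.year + full) > late:
        full -= 1
    start = early.replace(year=early.year + full)
    end = early.replace(year=early.year + full + 1)
    return full + (late - start).days / (end - start).days


def dif_months(a, b):
    early, late = sorted((a, b))
    # Day of month above 28: subtract one to three days from both dates.
    excess = max(early.day - 28, 0)
    early -= timedelta(days=excess)
    late -= timedelta(days=excess)
    full = (late.year - early.year) * 12 + (late.month - early.month)
    if _shift_months(early, full) > late:
        full -= 1
    start = _shift_months(early, full)
    end = _shift_months(early, full + 1)
    return full + (late - start).days / (end - start).days


print(round(dif_years(date(1992, 2, 4), date(1993, 3, 7)), 7))   # 1.0849315
print(round(dif_months(date(1992, 2, 4), date(1993, 3, 7)), 7))  # 13.0967742
```

Under these assumptions the model reproduces the DIF.YEARS and DIF.MONTHS values shown in the examples of this section.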

******************************************* 

* using the third argument: * 

* doing DIF.YEARS, DIF.DAYS, etc. * 

* on just the initial parts of the date * 

******************************************* 

A third argument may be supplied: it is the extent of the year-month-day-hour-minute-second fields 

that should be used in computing the difference. The fields beyond that level will be ignored. 

DIF.YEARS ( 'feb 4,1992', 'mar 7,1993', 1 ) = 1. 

DIF.YEARS ( 'feb 4,1992', 'mar 7,1993', 2 ) = 1.0833333 

DIF.YEARS ( 'feb 4,1992', 'mar 7,1993', 3 ) = 1.0849315 

The limit of 1 in DIF.YEARS nullifies all but the year field in the two arguments. Therefore, the difference 

between 1992 and 1993 is simply 1. 

The limit of 2 in DIF.YEARS nullifies all but the year and month fields in the two arguments. Therefore, 

the difference is calculated between feb 1992 and mar 1993. The result is 1 1/12 years. 

The limit of 3 in DIF.YEARS nullifies all but the year, month and day fields in the two arguments. As 

it happens, the arguments did not have time fields so using the limit had no effect. 

The result is 1 plus 31/365, the same value obtained without a limit. 

********************************************* 

* doing DIF.YEARS, DIF.MONTHS or DIF.DAYS * 

* while ignoring the time fields * 

********************************************* 

Suppose you are using DIF.YEARS, DIF.MONTHS or DIF.DAYS and have no interest in the time 

fields of the arguments. If the arguments lack time fields anyhow, there is obviously no problem, but 

suppose some do and some don’t? 

The default for a function like DIF.DAYS is to use all available fields, so if one argument has a time 

field and the other does not, the result will be set to missing. You could use 

DIF.DAYS ( extract.date( arg1 ), extract.date( arg2) ). 

However, using a limit of 3 does the same thing. 

DIF.DAYS ( arg1, arg2, 3). 

10.5 DATE AND TIME COMMANDS 

A P-STAT run begins with the date language set to English. Therefore, date values being read are expected to have 

English month and weekday names, and date values being created will be given English names. Also, names being 

written will be capitalized and abbreviated, like Jan or Tues. 

If a function takes an input date value and creates a resulting date value, the result will be ordered in the same way 

as the input. In other words, if the input starts with the monthname, so will the output. 

If there is no input to be used as a format, a default ordering that looks like ‘Wed Aug 7, 2002 10:15:22’ is used. 

The following eight commands may be used to change the default language, ordering and name style of dates.


DATE.LANGUAGE     changes the language of month and weekday names. English and German
                  are supported.

DATE.ORDER        changes the default format ordering.

MONTH.CASE        changes the case of month names. Uppercase, lowercase and capitalized
                  are supported.

WEEKDAY.CASE      same for weekday names.

MONTH.LENGTH      changes the length of month names. Full length or abbreviated are
                  supported.

WEEKDAY.LENGTH    same for weekday names.

MONTH.NAMES       provides month name abbreviations to be used.

WEEKDAY.NAMES     provides weekday name abbreviations to be used.

10.6 The DATE.LANGUAGE Command 

P-STAT carries full and abbreviated month and weekday names in both English and German. The default language 

is English. 

DATE.LANGUAGE GERMAN $ 

would switch the active language to German. 

A function like ADD.DAYS ( ‘Oct 10, 1992’, 1 ) will compare OCT to the full month names of the currently 

active language, and accept a full match or the best partial match. It will use the abbreviated names or, if requested, 

the full names of the current language to construct a result date. 

The default English month name abbreviations are: 

jan feb march april may june july aug sept oct nov dec. 

The default English weekday name abbreviations are: 

mon tues wed thurs fri sat sun. 

The default German month name abbreviations are: 

jan feb marz apr mai juni juli aug sept okt nov dez. 

The default German weekday name abbreviations are: 

mo di mi do fr sa so. 

When reading a date in German, both SAMSTAG and SONNABEND are recognized as Saturday, but what 

of abbreviations like SO, SON or SONN? They are all accepted as SONNTAG (Sunday). 

10.7 The DATE.ORDER Command 

DATE.ORDER '2 june 2002 (sun) 12:12:12' $ 

The DATE.ORDER command changes the default ordering for a date to the order shown in the command. 

Blanks, dashes, commas, slashes and parentheses may be freely used to create a particular date appearance. 

When one of the date functions writes a date value, the components of the value will be written in a certain 

ORDER. The order determines things like where the year should be, if the weekday name should be included, 

and if time should be included. 

Also, the value will be written in a certain style. Style consists of language (English or German), case (MAY 

or may or May), and length (Jan or January). For example, 

’Wed Aug 7, 2002 10:05:26’


is an ordering that consists of weekday, month, day, comma, year and time, with blanks as shown. The names are 

abbreviated and capitalized (first letter uppercase, the rest lowercase). This is, in fact, the default date format. 

The default date order is used only when there is nothing else to use. If a date function has an input date, like 

ADD.DAYS, the result will have the same ordering as the input. However, the naming style of the input can be 

ambiguous: is May an abbreviation or a full month name? Therefore, the default style (case, length and language) 

is used for names. If a date function does not have an input date, like CURRENT.DATE(), default ordering and 

style are used. 

Therefore, given that 

1. The default ordering is: 'Tues Jan 1, 2002 12:34:56'. 

2. The default style is: English, abbreviated, capitalized. 

the command 

PUT (CURRENT.DATE())$ 

would produce something like: 

Sun June 2, 2002 14:01:19. 

The time field can be omitted from dates, as can the weekday name. 

DATE.ORDER 'june 2 (wed) 1999' $ 

Here, since time is not included, functions that do not have a character date input to use as an output template will 

write a date output that does not include time. 

10.8 Changing the Case and Length of names 

The default for both month names and weekday names is capitalized and abbreviated (i.e., Jan or Tues). There 

are 2 commands which affect the case of names as they are written. MONTH.CASE affects month names, 

WEEKDAY.CASE affects weekday names. 

MONTH.CASE upper $ 

WEEKDAY.CASE capitalized $ 

UPPER causes names to be entirely in upper case. LOWER causes names to be entirely in lower case. CAPITALIZED causes names to have the initial letter in upper case, and the rest in lower case. 

The following commands affect the length of names as they are written. FULL causes names to be written 

in their entirety: January. ABBREVIATED causes names to be written in a short form: Jan. 

MONTH.LENGTH FULL $ 

WEEKDAY.LENGTH ABBREVIATED $ 

10.9 Month and Weekday Names 

There are 2 commands which can be used to alter the default abbreviations: MONTH.NAMES and WEEKDAY.NAMES. These commands override the default abbreviations. The new names must, however, themselves be abbreviations of the current full names. MONTH.NAMES requires 12 arguments and WEEKDAY.NAMES requires 7 arguments. 

MONTH.NAMES jan feb mar apr may jun jul aug sep oct nov dec $ 

WEEKDAY.NAMES mon tue wed thu fri sat sun $ 

The names can each be quoted or unquoted, or the entire set of names can be in one quoted string. 

WEEKDAY.NAMES mon 'tue' wed thu fri 'sat' sun $ 

WEEKDAY.NAMES 'mon tue wed thu fri sat sun' $


__________________________________________________________________________ 

Figure 10.1 DATE Logical Operators 

Test Values 

date1 = ’jan 12,1991 12:01:00’ 

date2 = ’may 23,1991 12:08:00’ 

date3 = ’may 23,1991 12:08:00’ 

date4 = ’may 23,1991 22:08:00’ 

date5 = ’may 23,1991 ’ 

Tests using logical operators 

The tests The Result 

[ if date1 DATE.GT date2, false ] 

[ if date1 AFTER date2, false, same as DATE.GT ] 

[ if date3 DATE.GE date2, true ] 

[ if date1 DATE.EQ date2, false ] 

[ if date1 DATE.LE date2, true ] 

[ if date1 DATE.LT date2, true ] 

[ if date1 BEFORE date2, true, same as DATE.LT ] 


[ if date4 DATE.EQ date5, missing ] 

[ if extract.date(date4) DATE.EQ date5, true ] 

__________________________________________________________________________ 

10.10 DATE LOGICAL OPERATORS 

There are 6 logical operators that can be used to compare date values. They are: 

1. DATE.GT (AFTER can also be used) 

2. DATE.GE 

3. DATE.EQ 

4. DATE.NE 

5. DATE.LE 

6. DATE.LT (BEFORE can also be used) 

Each examines two date values, which can be expressions. A date value MUST have a date field (year-month-day) 

and MAY have a time field (hour-minute-second). These date and time fields are treated in the date compares 

as if they were two separate BY variables in a sort. 

If the two date values differ at the year-month-day level, there is no need to look at time, so it doesn’t matter 

if one value has a time field and the other does not. However, if the two year-month-day fields are the same, what 

happens if one value has a time field and the other does not? 

1. If neither has time, the result is equal.


2. If both have time, the times are compared, yielding a result. 

3. If one has time and the other doesn’t, the result is missing. 

The final three examples in Figure 10.1 deal with TIME issues. 


[ if date4 DATE.EQ date5, missing ] 

[ if extract.date(date4) DATE.EQ date5, true ] 

When we compare date1 with date5, the year-month-day values differ, so we can get a FALSE result even though 

one has time and the other does not. Date4 and date5, however, do not differ on year-month-day. If neither had 

time, the result would be equal, but since one has time and the other does not, the result is missing. 

Using EXTRACT.DATE gets rid of the time field in date4, so the compare with timeless date5 produces a 

non-missing result. 
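The three rules above amount to a comparison that can return missing as well as true or false. The following Python sketch models this; it is not PPL. The names date_compare and date_eq are hypothetical, a date value is represented as a (date, time-or-None) pair, and None stands in for a missing result.

```python
from datetime import date, time


def date_compare(a, b):
    # Returns -1, 0 or 1, or None (missing).
    (day_a, time_a), (day_b, time_b) = a, b
    if day_a != day_b:
        # The year-month-day fields decide; time is never looked at.
        return -1 if day_a < day_b else 1
    if time_a is None and time_b is None:
        return 0
    if time_a is None or time_b is None:
        return None  # one value has time, the other does not
    return (time_a > time_b) - (time_a < time_b)


def date_eq(a, b):
    result = date_compare(a, b)
    return None if result is None else result == 0


date1 = (date(1991, 1, 12), time(12, 1, 0))
date4 = (date(1991, 5, 23), time(22, 8, 0))
date5 = (date(1991, 5, 23), None)

print(date_eq(date1, date5))             # False
print(date_eq(date4, date5))             # None
print(date_eq((date4[0], None), date5))  # True
```

Dropping the time from date4 before the comparison plays the role of EXTRACT.DATE in the last line.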

10.11 FORMAT.DATE 

FORMAT.DATE is a date/time function that provides considerable flexibility in formatting a date-time value. 

It has two arguments: the character value to be formatted, and the format to be used for it. A P-STAT date/time value is an ordinary variable, often sized character*40, that holds date-time information. Creating date-time variables was covered in considerable detail in the early parts of this chapter. This section describes how to print this information in the formats that you prefer. 

PUT ( Current.date ( ) )$ results in something like 

Mon Oct 24, 2011 11:21:36 

The current.date function has no arguments. It produces the current date and time in the default form. FORMAT.DATE expects the initial argument to hold a value in a similar format. A format consists of format specifiers (like dd for days) and separator characters (like :). The format determines which date/time elements appear in the result and which separator characters are placed between them. 

The format will often be provided by a character constant within the function. It can, however, be placed in 

a permanent character scratch variable, as in FORMAT.DATE ( ddd, ##someformat ) . Blanks are significant. 

Given aug 28, 2011, 

'yyyymmdd' produces 20110828 . 

'yyyy mm dd' produces 2011 08 28 

The caret (^) will not be placed in the result, and can therefore be used to make a format more readable. 

'yyyy^mm^dd' does the same thing as 'yyyymmdd'. 

Any other character is copied as is, such as the : in hh:mm:ss or the / in mm/dd/yyyy. 

yyyy          year, in 4-digit form, like 2011.
yy            year, in 2-digit form, like 11.
month         month, full name, like september.
mon           month, abbreviated name, like sept.
n.month       month, 1 to 12, i.e., numeric month.
mm            month, 1 to 12 if usage is clear, like yy/mm/dd.
              Same as n.month.
dd            day, 1 to 31.
hh            hour, 0 to 23.
n.minute      minute, 0 to 59, i.e., numeric minute.
mm            minute, 0 to 59 if usage is clear, like hh:mm:ss.
              Same as n.minute.
ss            second, 0 to 59, can have up to 3 places, like 34.178.
ord           ordinal, the day within year, 1 to 366.
jjj           (for julian) does the same.
              Ordinal has become the accepted name.
day.of.week   weekday, full name, like monday.
dow           weekday, abbreviation, like mon.
am            puts hours in 1-12 form, and then uses
              am, pm, noon, midnight to clarify.
              These are placed where the 'am' was found.
a.m.          same thing, but uses a.m. and p.m.
date          causes mm/dd/yyyy to be used.
time          causes hh:mm:ss to be used.

The default is to show hours in 0 to 23 form, which is sometimes called military time. The format specifier 'am' 

causes hours to appear in 1 to 12 form, along with one of am, pm, noon, and midnight. The am (or pm, etc) is 

placed where the specifier was. Using a specifier of a.m. causes a.m. (or p.m.) to be used instead. 

Examples of converting 24-hour to 12-hour mode. 

00:00:00 becomes 12:00:00 midnight. 

00:00:01 becomes 12:00:01 am. 

01:00:00 becomes 01:00:00 am. 

12:00:00 becomes 12:00:00 noon. 

12:00:01 becomes 12:00:01 pm. 

13:00:00 becomes 01:00:00 pm. 
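The conversion rule behind these six examples can be stated compactly. The Python sketch below is a model of the table, not P-STAT code; the function name to_12_hour is hypothetical.

```python
def to_12_hour(hh, mm, ss):
    # Exactly 00:00:00 and 12:00:00 get their own labels;
    # every other time is am before 12:00:00 and pm after it.
    if (hh, mm, ss) == (0, 0, 0):
        suffix = "midnight"
    elif (hh, mm, ss) == (12, 0, 0):
        suffix = "noon"
    elif hh < 12:
        suffix = "am"
    else:
        suffix = "pm"
    hour12 = hh % 12 or 12  # hours 0 and 12 both display as 12
    return f"{hour12:02d}:{mm:02d}:{ss:02d} {suffix}"


print(to_12_hour(0, 0, 0))   # 12:00:00 midnight
print(to_12_hour(13, 0, 0))  # 01:00:00 pm
```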

The case used for names like Monday in the result is controlled by the case of the format word that was used. 

Using day.of.week will get 'monday'. Using Day.of.week will get 'Monday'. Using DAY.OF.WEEK will get 

'MONDAY'. This is done for full and abbreviated month names, full and abbreviated weekday names, and for a.m. 

and am. Leading zeros are printed, except for the day when the month is a name. Consider Jan 2, 1995 5:06:07. 

'date time' gets 01/02/1995 05:06:07 . 

However 

'Month dd, yyyy' gets January 2, 1995. 

__________________________________________________________________________ 

Figure 10.2 FORMAT.DATE 

MAKE work1, VARS year month day hour min sec; 

1995 3 1 10 13 15 

2004 2 9 21 22 23 $ 

LIST work1 [ GENERATE dt1:c40 TO MAKE.DATE 

(year, month, day, hour, min, sec ) ] 

[ GENERATE dt2:c40 TO FORMAT.DATE 

( dt1, 'yyyy-mm-dd time a.m. dow' ) ] 

[ KEEP dt1 dt2 ] $ 

dt1                          dt2

Wed March 1, 1995 10:13:15   1995-03-01 10:13:15 a.m. wed

Mon Feb 9, 2004 21:22:23     2004-02-09 09:22:23 p.m. mon

__________________________________________________________________________


The first step in using P-STAT’s date routines is to store the date in date variable format. 

The second step is to provide one or more date templates to use when the date is printed. Figure 10.3 shows four different date templates, and the resulting character string stored in variable “this.date” for ##FMT1 and ##FMT3. 

__________________________________________________________________________ 

Figure 10.3 FORMAT.DATE Example 

GEN ##DAT1:c = DAY.MONTH.YEAR ( 13042012 ) 

##FMT1:c = 'Month-dd-yyyy'      April-13-2012 

##FMT2:c = 'mon dd yy'          april 13 12 

##FMT3:c = 'dd/n.month/yy'      13/04/12 

##FMT4:c = 'Dow Mon dd yyyy'    Fri April 13 2012 

GEN ##this.date:c = FORMAT.DATE ( ##dat1, ##FMT1 ) $ 

PUT ##this.date $ 

April-13-2012 

GEN ##this.date:c = FORMAT.DATE ( ##dat1, ##FMT3 ) $ 

PUT ##this.date $ 

13/04/12 

__________________________________________________________________________


SUMMARY 

DATE AND TIME FUNCTIONS 

DAY.MONTH.YEAR nn or “cs” 

converts an integer or character argument in day.month.year order to a character date. 

DAY.YEAR.MONTH nn or “cs” 

converts an integer or character argument in day.year.month order to a character date. 

MONTH.DAY.YEAR nn or “cs” 

converts an integer or character argument in month.day.year order to a character date. 

MONTH.YEAR.DAY nn or “cs” 

converts an integer or character argument in month.year.day order to a character date. 

YEAR.DAY.MONTH nn or “cs” 

converts an integer or character argument in year.day.month order to a character date. 

YEAR.MONTH.DAY nn or “cs” 

converts an integer or character argument in year.month.day order to a character date. 

MAKE.DATE 

creates a date from numeric input. 

MAKE.DATE (year, month, day ) >>> date 

MAKE.DATE (year, month, day, hour, minute, second) >>> date 

MAKE.DATE (year, month, day, hms, ‘mask’ ) >>> date 

MAKE.DATE (ymd, ‘mask’ ) >>> date 

MAKE.DATE (ymd, ‘mask’, hour, minute, second) >>> date 

MAKE.DATE (ymd, ‘mask’, hms, ‘mask’ ) >>> date 

CURRENT.DATE 

provides today’s date and time. 

REFORMAT.DATE ( ddd, ddd ) 

changes the format of a date value. If the second argument is supplied it is used as a formatting template. 

REFORMAT.DATE ( date1, date2 ) >>> date 

REFORMAT.DATE ( ‘June 23, 2002’ ) >>> date 

STATUS.DATE ( ddd ) 

shows if a date is valid, if it has time, etc. Produces a number from 2 to -3 which indicates the usability 

of the date. 2 indicates both date and time. 1 indicates date only. 0 indicates invalid date value. -1, -2, 

and -3 indicate missing values. 

DAYS ( ddd ) 

returns days since 1/1/1753 for a date. 

nn=number nopt=optional number ddd=date variable copt=optional char constant


SECONDS ( ddd ) 

returns seconds since 1/1/1753 for a date. 

SECONDS.MIDNIGHT ( ddd ) 

returns seconds since midnight for a date. 

UNDO.DAYS ( nn ) 

reverses the DAYS function. 

UNDO.SECONDS ( nn ) 

reverses the SECONDS function. 

FISCAL.YEAR ( ddd, nn ) 

returns the fiscal year of a date. 

FISCAL.QUARTER ( ddd, nn ) 

returns the fiscal quarter of a date. 

QUARTER ( ddd ) 

returns the calendar quarter of a date. 

DAY.WITHIN.WEEK ( ddd, 'name' ) 

returns an integer from 1 to 7. Name is an optional weekday name such as ‘Sunday’. 

DAY.WITHIN.YEAR ( ddd ) 

returns 1 to 366, the day within a year. 

WEEK.WITHIN.YEAR ( ddd, nn, occ ) 

ADD.YEARS ( ddd, nn, nopt, nopt, nopt, nopt, nopt ) 

add some years to a date. 

ADD.MONTHS ( ddd, nn, nopt, nopt, nopt, nopt ) 

add some months to a date. 

ADD.DAYS ( ddd, nn, nopt, nopt, nopt ) 

add some days to a date. 

ADD.HOURS ( ddd, nn, nopt, nopt ) 

add some hours to a date. 

ADD.MINUTES ( ddd, nn, nopt ) 

add some minutes to a date. 

ADD.SECONDS ( ddd, nn ) 

add some seconds to a date. 

SUBTRACT.YEARS ( ddd, nn, nopt, nopt, nopt, nopt, nopt ) 

subtract some years from a date. 


SUBTRACT.MONTHS ( ddd, nn, nopt, nopt, nopt, nopt ) 

subtract some months from a date. 

SUBTRACT.DAYS ( ddd, nn, nopt, nopt, nopt ) 

subtract some days from a date. 

SUBTRACT.HOURS ( ddd, nn, nopt, nopt ) 

subtract some hours from a date. 

SUBTRACT.MINUTES ( ddd, nn, nopt ) 

subtract some minutes from a date. 

SUBTRACT.SECONDS ( ddd, nn ) 

subtract some seconds from a date. 

EXTRACT.YEARS ( ddd ) 

return numeric years from a date. 

EXTRACT.MONTHS ( ddd ) 

return numeric months from a date. 

EXTRACT.DAYS ( ddd ) 

return numeric days from a date. 

EXTRACT.HOURS ( ddd ) 

return numeric hours from a date. 

EXTRACT.MINUTES ( ddd ) 

return numeric minutes from a date. 

EXTRACT.SECONDS ( ddd ) 

return numeric seconds from a date. 

EXTRACT.CC ( ddd ) 

return 2-digit numeric century from a date. 

EXTRACT.YY ( ddd ) 

return 2-digit numeric year from a date. 

EXTRACT.DATE ( ddd ) 

make a copy of the input, dropping time. 

EXTRACT.TIME ( ddd ) 

make a copy of the input, dropping date. 

EXTRACT.WEEKDAY ( ddd ) 

return the character weekday name. 

CHANGE.YEARS ( ddd, nn, nopt, nopt, nopt, nopt, nopt ) 

change the years field in a date.


CHANGE.MONTHS ( ddd, nn, nopt, nopt, nopt, nopt ) 

change the months field in a date. 

CHANGE.DAYS ( ddd, nn, nopt, nopt, nopt ) 

change the days field in a date. 

CHANGE.HOURS ( ddd, nn, nopt, nopt ) 

change the hours field in a date. 

CHANGE.MINUTES ( ddd, nn, nopt ) 

change the minutes field in a date. 

CHANGE.SECONDS ( ddd, nn ) 

change the seconds field in a date. 

DIF.YEARS ( ddd, ddd, nopt ) 

difference between 2 dates in years. The optional numeric argument can be used to limit the elements of the date that are to be looked at; thus a 2 means use just years and months. 

DIF.MONTHS ( ddd, ddd, nopt ) 

difference between 2 dates in months. The optional numeric argument can be used to limit the elements of the date that are to be looked at. 

DIF.DAYS ( ddd, ddd, nopt ) 

difference between 2 dates in days. The optional numeric argument can be used to limit the elements of the date that are to be looked at. 

DIF.HOURS ( ddd, ddd, nopt ) 

difference between 2 dates in hours. The optional numeric argument can be used to limit the elements of the date that are to be looked at. 

DIF.MINUTES ( ddd, ddd, nopt ) 

difference between 2 dates in minutes. The optional numeric argument can be used to limit the elements of the date that are to be looked at. 

DIF.SECONDS ( ddd, ddd ) 

difference between 2 dates in seconds. 

DATE FORMATTING COMMANDS 

FORMAT.DATE ( ddd, date.format ) $ 

the first argument is a P-STAT date variable. The second argument is a character variable that contains 

the desired format. Almost any arrangement of numeric or character day/month values, dates and years 

can be specified. The following are 3 simple examples. 

‘Month-dd-yyyy’ ‘mon dd yy’ ‘dd/n.month/yy’ 

DATE.LANGUAGE 

DATE.LANGUAGE GERMAN $ 



DATE.LANGUAGE ENGLISH $ 

Select the language for the dates. GERMAN and ENGLISH are supported. 

DATE.ORDER 

DATE.ORDER 'Jan 1, 2002 12:34:56' $ 

DATE.ORDER changes the default format ordering. The supplied date must be a legal date. The default 

order is: 

'Tues Jan 1, 2002 12:34:56' 

The default style is: English, abbreviated, capitalized. 

MONTH.CASE 

MONTH.CASE UPPER $ 

Changes the case of month names. UPPER, LOWER and CAPITALIZED are supported. 

WEEKDAY.CASE 

WEEKDAY.CASE LOWER $ 

Changes the case of weekday names. UPPER, LOWER and CAPITALIZED are supported. 

MONTH.LENGTH 

MONTH.LENGTH FULL $ 

Changes the length of month names. FULL and ABBREVIATED are supported. 

WEEKDAY.LENGTH 

WEEKDAY.LENGTH ABBREVIATED $ 

Changes the length of weekday names. FULL and ABBREVIATED are supported. 

MONTH.NAMES 

MONTH.NAMES jan feb mar apr may june july aug sept oct nov dec $ 

Changes the default month names. These 12 names must be abbreviations of the current full month 

names. 

WEEKDAY.NAMES 

WEEKDAY.NAMES mo tu we th fr sa su $ 

changes the default weekday names. These 7 names must be abbreviations of the current full weekday 

names. 

DATE LOGICAL OPERATORS 

Each date logical operator examines two date values, which can be expressions. A date value MUST have a date 

field (year-month-day) and MAY have a time field (hour-minute-second). These date and time fields are treated 

in the date compares as if they were two separate BY variables in a sort. The 6 operators are: 

1. DATE.EQ 



2. DATE.NE 

3. DATE.LE 

4. DATE.LT (BEFORE can also be used) 

5. DATE.GT (AFTER can also be used) 

6. DATE.GE 


11 

TEXTWRITER: 

Report Writing 

The TEXTWRITER command produces text or reports that summarize the data in a P-STAT system file. The 

text is formatted much the same way as text produced by a word processing software package, with justification, 

paragraphs and pagination. In addition, the reports can include character strings, values from the file, and evaluations 

of complex expressions containing functions and operators. 

TEXTWRITER uses the P-STAT Programming Language (PPL) instructions PUT and PUTL to specify the 

strings, values and expressions. Additional PPL may be included to test values and output appropriate strings. 

Thus, if Sex equals “M”, the string “Mr.” is written, but if Sex equals “F”, “Ms.” is output. Control words format 

the text and position it in specific columns and lines. (The previous eight chapters explain all aspects of PPL. The 

first two PPL chapters cover the basics which, with this chapter, provide sufficient information for using 

TEXTWRITER.) 

11.1 OVERVIEW 

TEXTWRITER requires an input file and instructions specifying the contents of a report. The input file, whose 

data values are typically included or summarized in the report, is named directly after the command: 

TEXTWRITER Accounts 

Here the input file is named “Accounts”. (No comma follows the filename.) The report instructions are PPL 

clauses enclosed in brackets: 

TEXTWRITER Accounts 

[ IF FIRST ( .FILE. ), PUT @SKIP ; 

PUT @JUST Company Tel.No 

First.Name Last.Name ], 

WIDTH 56 $ 

The bulk of a TEXTWRITER command is the set of PPL instructions that follow the input filename. Many 

of these are PUT statements. Identifiers specific to TEXTWRITER may follow the PPL in the usual manner. 

The format of a PUT is: 

1. PUT (or PUTL) 

2. one or more values, character strings and control words (like @20 or @NEXT) 

3. the PUT phrase end character which, depending on the context, is a comma, a semicolon or a right 

bracket. 

In the PUT instructions, character strings are enclosed in quotes or between the directional signs “<< >>”. Variable names are not in quotes or directional signs. Expressions are enclosed in parentheses. Control 

words (beginning with “@”) specify placement and format options. 

The report produced by this TEXTWRITER command might look like:

A REPORT: 

At Smith and Brothers, Inc., telephone: (312) 457-8700, 

the person to contact is Jim Glidden. 

A similar sentence is then output for each case in the input file. 

11.2 Justification 

When justification is specified, the text in the report is aligned at the right edge as well as the left edge. Extra 

blanks are inserted after certain punctuation and between words to achieve justification. Up to a maximum of five 

blanks may come between words, although a smaller number may be specified. Typically, only two blanks appear 

between some of the words. The concluding line of a paragraph, as well as any single line, is not justified. 

To avoid excess blank spaces in the report, trailing blanks are trimmed off character values and character expressions. 

Thus, the variable University occupies only three columns when its value is “MIT”, but nine columns 

when its value is “Princeton”. Similarly, the values of numeric variables occupy only as many columns as necessary 

for a specific value, not the number of columns needed for the largest value. A blank space is automatically 

inserted between successive values of variables and expressions. 

A large print buffer accumulates text. This permits the formatting and justification of large blocks of text. 

Strings that belong together, such as a word and its apostrophe, are kept together even though they may be specified 

in separate instructions. Each text string follows the previous one, until control words such as @PARA (new 

paragraph) or @NEXT (next line) cause the start of a new line. Then the text in the buffer is flushed (emptied 

out) and printed, and accumulation of text for subsequent lines begins. 

11.3 The “No-Break” Character 

The “not” sign, which is generally a caret in the ASCII character set and a bar-like character in the EBCDIC set, 

is the no-break character. It keeps two character strings together on the same line and translates to a single blank 

space when printing takes place. Thus, “Mr.^Lee” prints as “Mr. Lee” and does not break or widen between the 

two words. 
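The effect of the no-break character can be modelled with a small word-wrapping routine. This is a Python sketch of the behavior described above, not the TEXTWRITER implementation; the function name and the sample sentence are illustrative assumptions. Lines are split only at real blanks, and the "^" is translated to a blank when the line is produced.

```python
NO_BREAK = "^"

def wrap_with_no_break(text, width):
    """Greedy word wrap that splits only at blanks, then turns ^ into a blank."""
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= width or not current:
            current = candidate          # the word still fits on this line
        else:
            lines.append(current)        # line is full: start a new one
            current = word
    if current:
        lines.append(current)
    # Translation to a blank happens only at output time.
    return [line.replace(NO_BREAK, " ") for line in lines]

for line in wrap_with_no_break("Please forward this to Mr.^Lee today", 25):
    print(line)
```

Because "Mr.^Lee" contains no blank while the line is being filled, it moves to the next line as a unit.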

11.4 PPL INSTRUCTIONS PUT AND PUTL 

PUT and PUTL, two PPL instructions, specify character strings, values of variables and scratch variables, and expressions 

to position in the text. These instructions produce the actual report. PUT places only the values of 

variables in the text, whereas PUTL places the names of the variables as well as the values in the text. 

PUT can be used in the same way that, for example, SET is used, either starting a new PPL instruction or as 

a consequent of an IF. The PUT is followed by character strings, variable names and expressions. Many PUT 

items (strings and variable names) may follow one PUT. Control words, such as @NEXT, may be used as needed: 

[ PUT .DATE. @SKIP; 

Here a character string, a system variable, and a control word follow one PUT instruction. The @SKIP, described 

later, causes the current line and then a blank line to be written. 

In addition, any other PPL instructions, functions and operators may be included in the PPL clauses. This is 

typically the case when the choice of which character string to place in the report depends on testing and evaluating 

values. For example, this instruction tests the value of the scratch variable “#Recent”: 

IF #Recent EQ 'no', INCREASE #Count, PUT Hospital 

> Date.Last.Call ; 

The scratch variable “#Count” is increased and an appropriate character string is put in the output line of text when 

the result of the IF test is true.

11.5 Character Strings 

Any set of arbitrary characters, enclosed in single or double quotes or between the directional signs “<< >>”, is a character string. The string may contain letters, numbers, punctuation and blanks, and it may be from 

1 to 50,000 characters long. The string should not contain the names of variables, scratch variables and expressions, 

because these items will be printed literally — substitution of their appropriate values will not take place. 

The instruction: 

PUT <<The client is: >> First Last <<.>> ; 

yields a line such as this in the report: 

The client is: Mary Roberts. 

By contrast, the instruction: 

PUT <<The client is: First Last.>> ; 

yields: 

The client is: First Last. 

Character strings should contain only the exact text desired in the report. Some TEXTWRITER applications have 

hundreds of lines of PPL. Using <<string>> instead of 'string' or "string" helps you see the strings more easily. 

Also, omitting the closing >> is more quickly flagged than omitting a string-terminating ' or ". 

11.6 Values of Variables 

The current values of variables, scratch variables, system variables and positions in the V vector (case vector) or 

P vector (permanent vector) may be placed in reports. None of these values is enclosed in quotes or between directional 

signs. Complex expressions must be enclosed in parentheses. This instruction includes a variable, a 

scratch variable and a system variable, as well as four quoted strings: 

[ PUT 'The balance for account number ' Acct.Number 

' is $' #Balance " on " .DATE. '.' ] 

This is the same instruction using directional signs instead of quotes: 

[ PUT <<The balance for account number >> Acct.Number 

<< is $>> #Balance << on >> .DATE. <<.>> ] 

Quotes or directional signs enclose only the character strings. Given a width of 50 and this data, the previous instruction 

yields: 

The balance for account number 1268004 is $752.35 

on Apr 22, 1986. 

Note that a sentence such as this is produced for each case in the input file. The value of the variable 

Acct.Number is likely to change as each case is processed. The value of the scratch variable #Balance will not 

change unless it is reset for each case. This instruction would reset #Balance: 

[ SET #Balance = Balance + Interest ] 

The SET should precede the PUT instruction that places the value of #Balance in the report. The value of the 

system variable .DATE. will not change unless the TEXTWRITER command is run again on another day. 

A blank is automatically inserted after a variable or expression value if the next value is another variable or 

expression, and the final character of the current value is not a blank or a “not” sign.
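The blank-insertion rule just stated is easy to misread, so here is a small Python model of it. It is a sketch under stated assumptions, not P-STAT code: the function name and the "string"/"value" tags are invented for the illustration, and only the rule in the paragraph above is modelled.

```python
def assemble(items):
    """Join PUT items; items is a list of (kind, text) pairs.

    kind is "string" for a quoted string and "value" for the value of a
    variable or expression. A blank is added after a value only when the
    next item is also a value and the current text does not already end
    in a blank or the "not" sign (^).
    """
    out = []
    for i, (kind, text) in enumerate(items):
        out.append(text)
        nxt = items[i + 1] if i + 1 < len(items) else None
        if (kind == "value" and nxt is not None and nxt[0] == "value"
                and not text.endswith((" ", "^"))):
            out.append(" ")
    return "".join(out)

line = assemble([("string", "The client is: "),
                 ("value", "Mary"), ("value", "Roberts"),
                 ("string", ".")])
print(line)
```

Note that no blank is supplied between a string and a value, which is why the string in the example carries its own trailing blank.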


11.7 Expressions and Functions 

Complex expressions containing functions and operators, as well as variables and values, may be included in PUT 

instructions. Expressions must be enclosed in parentheses and many nested levels of parentheses may be used. 

The expressions are evaluated and the result is placed in the output line. 

The ability to use expressions makes it possible to use the full power of PPL in report writing. Complex numeric 

items and trigonometry functions may be computed, character strings may be padded and concatenated, and 

values in cases may be tested and recoded, all within the PPL instructions that comprise the bulk of 

TEXTWRITER. 

A sampling of expressions that may be included in PUT instructions are: 

(CAPS (Name) ) (12 ** 3) 

(Salary + Commission) (LOG10 (Value1 / Value2) ) 

(MEAN (Test.?) ) (CHAREX (Date, 'XX00') ) 

(V(4) - P(Area + 2) ) (SUBSTRING (LEFT (Name), 1, 1 ) ) 

__________________________________________________________________________ 

Figure 11.1 Producing a Report: The Input Files 

File Hospital.lab 

Hospital (1) Mercy Hospital (2) Children's (3) Eye and Ear 

(4) Cranston Memorial (5) Willis (6) St Agnes / 

File Sales 

            Date        Date        Amt
            Last        Last        Last      Sales
Hospital    Call        Order       Order     No       Salesman

   4        86-03-17    86-04-15    318.00    2        Will Moore
   2        85-09-20    85-06-25    112.60    4        Ted Ryan
   3        86-01-12       -           -      4        Ted Ryan
   6        86-05-15    85-06-11    430.99    4        Ted Ryan
   5        86-02-07    86-02-10    775.25    6        Liz Brown
   1        86-04-12    86-06-01    450.67    6        Liz Brown

__________________________________________________________________________ 

11.8 A Sample Report 

Figures 11.1, 11.2 and 11.3 illustrate producing a report using PUT instructions, quoted strings, values and expressions. 

Figure 11.1 shows the input files. The file Hospital.lab contains value labels for the Hospital variable 

in file Sales. A preliminary SORT by Sales.No and Date.Last.Call has grouped together each salesperson’s customers 

and ordered them by the date of their last sales call. (The report has paragraphs for each salesperson, with 

the sentences describing the status of each account.) 

Figure 11.2 shows the TEXTWRITER command and the PUT instructions. (The numbers at the left are not 

part of the commands, but merely correspond to the subsequent explanation.) Notice the general format of the 

entire command — TEXTWRITER is followed by the input filename, and the filename is followed directly by 

PPL clauses of PUT (and other) instructions. Quoted strings, variables and expressions are included in the PUTs. 

Control words (beginning with "@") specify text placement. Notice also that the command identifiers (LABELS, 

STREAM, JUSTIFY and WIDTH) follow the PPL and are themselves preceded by commas. (STREAM mode 

groups information from several customers into a single paragraph.)


__________________________________________________________________________ 

Figure 11.2 Producing a Report: The TEXTWRITER Command 

TEXTWRITER Sales 

1. [ IF FIRST (.FILE.), PUT @PAGE 

.DATE. @SKIP ] 

2. [ IF FIRST (Sales.No), GEN #Count = 0, PUT @PARA ] 

3. [ GEN #Recent:C = 'no'; 

IF Date.Last.Call GT '86-05-00', SET #Recent = 'yes' ] 

4. [ IF #Recent EQ 'no' THEN; 

INCREASE #Count, PUT Hospital 

> Date.Last.Call ; 

5. IF Date.Last.Order GOOD, PUT 

> Date.Last.Order 

@PLACES=0 Amt.Last.Order ; 

6. IF Date.Last.Order MISSING, 

PUT > ; 

ENDIF ] 

7. [ IF #Count GT 0 AND LAST (Sales.No), 

PUT Salesman > ] , 

LABELS 'Hospital.lab', STREAM, JUSTIFY, WIDTH 61 $ 

__________________________________________________________________________ 

The general procedure of the instructions in Figure 11.2 is: 

1. Supply a heading for the report. This is done only once, when the first case or customer in the file 

is processed. 

2. Generate a scratch variable #Count to keep track of the number of customers a salesperson has. A 

salesperson is in the report only if he has some customers without recent sales calls. 

3. Generate a scratch variable #Recent and reset it if a customer has had a recent sales call. Only customers 

without recent calls are to be in the report. Note: The two digit year will be a problem after 

1999. 

4. Specify the text strings to go in the report for customers without recent calls. Also, put the date of 

their call in the report. 

5. If the customer has placed an order as a result of a prior sales call, put an appropriate text string and 

the date of that last order in the report. Also, put the amount of that order in the report. 

6. If there has not been an order, put a text string saying so in the report. 

7. When the last of a salesperson’s customers is processed and at least one has not had a recent sales 

call, place the salesperson’s name in the report. 
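For readers who find it easier to follow the logic in a general-purpose language, the seven steps above can be modelled as follows. This is a Python sketch, not PPL: the data are copied from Figure 11.1, the sentence wording is taken from the report in Figure 11.3, and the function and constant names are illustrative assumptions. Justification and line width are not modelled.

```python
from itertools import groupby

LABELS = {1: "Mercy Hospital", 2: "Children's", 3: "Eye and Ear",
          4: "Cranston Memorial", 5: "Willis", 6: "St Agnes"}

# (Hospital, Date.Last.Call, Date.Last.Order, Amt.Last.Order, Sales.No, Salesman)
SALES = [
    (4, "86-03-17", "86-04-15", 318.00, 2, "Will Moore"),
    (2, "85-09-20", "85-06-25", 112.60, 4, "Ted Ryan"),
    (3, "86-01-12", None, None, 4, "Ted Ryan"),
    (6, "86-05-15", "85-06-11", 430.99, 4, "Ted Ryan"),
    (5, "86-02-07", "86-02-10", 775.25, 6, "Liz Brown"),
    (1, "86-04-12", "86-06-01", 450.67, 6, "Liz Brown"),
]

def report(rows, cutoff="86-05-00"):
    """Return one paragraph per salesperson; rows must be sorted by Sales.No."""
    paragraphs = []
    for _, group in groupby(rows, key=lambda r: r[4]):
        sentences, salesman = [], None
        for hosp, call, order, amt, _, name in group:
            salesman = name
            if call > cutoff:            # recent call: customer is left out
                continue
            text = f"{LABELS[hosp]} has not received a sales call since {call}"
            if order is not None:        # counterpart of the GOOD test
                text += (f" and has not placed an order since {order}."
                         f" That order totalled ${round(amt)}.")
            else:                        # counterpart of the MISSING test
                text += " and has not placed a subsequent order."
            sentences.append(text)
        if sentences:                    # counterpart of #Count GT 0
            sentences.append(f"{salesman} is their salesperson.")
            paragraphs.append(" ".join(sentences))
    return paragraphs

for paragraph in report(SALES):
    print(paragraph)
    print()
```

The two-digit dates are compared as character strings, as in step 3, so the same caveat about years after 1999 applies to the sketch.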

The report produced is shown in Figure 11.3. Wherever variable Hospital is referenced, the text in the labels 

file is used instead of the numeric value. A report of this type often conveys information more easily than a table


or listing of numbers. It is also obvious how to read it. On the other hand, a report summarizing many cases could 

be lengthy and repetitive. 

__________________________________________________________________________ 

Figure 11.3 Producing a Report: The Report 

Hospital Supply Sales Report: Apr 16, 1986 

Cranston Memorial has not received a sales call since 

86-03-17 and has not placed an order since 86-04-15. That 

order totalled $318. Will Moore is their salesperson. 

Children’s has not received a sales call since 85-09-20 

and has not placed an order since 85-06-25. That order 

totalled $113. Eye and Ear has not received a sales call 

since 86-01-12 and has not placed a subsequent order. Ted 

Ryan is their salesperson. 

Willis has not received a sales call since 86-02-07 and 

has not placed an order since 86-02-10. That order totalled 

$775. Mercy Hospital has not received a sales call since 

86-04-12 and has not placed an order since 86-06-01. That 

order totalled $451. Liz Brown is their salesperson. 

__________________________________________________________________________ 

Report writing shines when the output report is actually many reports, each summarizing a single case (or 

related group of cases) and possibly going to different recipients. Figures 11.4, 11.5 and 11.6 illustrate a more 

complex report. Test results for each case are summarized and a separate report is produced about each individual. 

To make the report more readable, the text is changed slightly for sentences after the first one. 

11.9 Comments in PPL Clauses 

There may be many PPL clauses in a single TEXTWRITER command. Comments interspersed among the clauses 

document what is being done. They begin with /* and end with */: 

[ /* Comment: Generate a scratch variable counter.*/; 

GEN #Counter = 0 ] 

Any text may come between the beginning and end of the comment, and the comment may extend across records 

(lines). Comments may be part of the PPL clauses in any command, as well as in TEXTWRITER. The lengthy 

TEXTWRITER command in Figure 11.10 includes numerous comments to document the PPL instructions. 

11.10 OPTIONAL IDENTIFIERS 

TEXTWRITER has optional identifiers that control its operation and some format features. The CASE and 

STREAM identifiers specify the mode of operation of the TEXTWRITER command. The JUSTIFY, BLANKS, 

PUTL.CHAR and SPREAD identifiers control the way text is output in the report. MARGIN, LEADBLANK and 

WIDTH alter the format of the report. LABELS and OUT refer to optional files. The LABELS file is an input 

file of value labels. The OUT file is a P-STAT system file. PostScript identifiers are discussed later. 

11.11 CASE and STREAM: The Modes of Operation 

The CASE mode is assumed by the TEXTWRITER command, and thus the CASE identifier does not need to be 

explicitly included in the command. In CASE mode, the text starts on a new line at the start of each case. All


accumulated text is flushed and printed, and then accumulation of text for the next case begins. STREAM mode 

is specified by using the STREAM identifier. In STREAM mode, text prints continuously. 

Often it is unnecessary to specify a mode when a control word such as @PAGE causes a page change as each 

new case is processed. This is the situation in both Figure 11.5 and Figure 11.10, when @PAGE is the initial control 

word after the first PUT instruction. @PAGE flushes and prints all accumulated text before moving to a new 

page. However, CASE mode also resets all control words to their initial default values as processing of each new 

case starts. STREAM does not reset the indent or the line width. This is discussed further in the summary portion 

of the section “CONTROL WORDS.” 
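The difference between the two modes can be pictured with a short Python model. It is a sketch only: it uses Python's standard textwrap module in place of the TEXTWRITER print buffer, the function name and sample sentences are assumptions, and the resetting of control words in CASE mode is not modelled.

```python
import textwrap

def render(case_texts, mode="CASE", width=40):
    """Return the output lines for the text produced by each case."""
    if mode == "STREAM":
        # Text from successive cases runs together in one continuous flow.
        return textwrap.wrap(" ".join(case_texts), width)
    lines = []
    for text in case_texts:
        # CASE mode: the buffer is flushed and each case starts a new line.
        lines.extend(textwrap.wrap(text, width))
    return lines

cases = ["Willis has no recent call.", "Mercy Hospital has no recent call."]
print(render(cases, "CASE"))
print(render(cases, "STREAM"))
```

In the STREAM result the second sentence begins on the same line as the first, which is what allows several customers to share one paragraph in Figure 11.3.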

11.12 JUSTIFY, BLANKS, PUTL.CHAR and SPREAD 

The JUSTIFY identifier specifies that the text in the report is to have the right as well as the left edge aligned (justified). 

Justification is achieved by the addition of extra blank spaces after certain punctuation and between words. 

Two blank spaces are inserted after periods, exclamation points and question marks, when they end sentences. 

Blanks are not inserted after ellipses (...) and other punctuation, or if the user has already included two blanks after 

periods in the text strings. If necessary, an additional blank is inserted between one or more words. After each 

space has an extra blank, additional blanks are inserted if justification has not yet been achieved. 

When JUSTIFY is not specified, only the left edge of the text is aligned. A line is filled with text until no 

room remains for the next word, and that word is placed in the next line. The right edge of the text has a slightly 

ragged appearance due to the differing amounts of blank spaces remaining at the end of each line. 

The BLANKS identifier may be used when justification is in effect to reset the maximum number of blanks 

that may be added to the space between words. The argument for BLANKS is an integer whose smallest value 

may be 1. When BLANKS is not used, a maximum of four blanks is assumed. Thus, to achieve justification, 

TEXTWRITER may add up to four additional blanks to the existing single blank space between words. Typically, 

it is necessary to add only one extra blank to the spaces between some of the words to justify a line. However, if 

a line contains many long words or if the next word is very long, up to four additional blanks may need to be inserted 

between words. 
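The padding procedure described in this section can be sketched in Python as follows. This is a simplified model and not the P-STAT algorithm: the order in which gaps receive their extra blanks (left to right here) is an assumption, and the sketch treats every word ending in a period, exclamation point or question mark as the end of a sentence.

```python
def justify(words, width, max_extra=4):
    """Pad the gaps between words so that the line is `width` columns wide.

    max_extra plays the role of the BLANKS identifier: the largest number
    of blanks that may be added to any one gap.
    """
    gaps = [1] * (len(words) - 1)
    for i, word in enumerate(words[:-1]):
        # Two blanks after sentence-ending punctuation, but not after ellipses.
        if word.endswith((".", "!", "?")) and not word.endswith("..."):
            gaps[i] = 2
    short = width - (sum(len(w) for w in words) + sum(gaps))
    extra = [0] * len(gaps)
    i = 0
    # Hand out one extra blank per gap, round after round, up to the limit.
    while short > 0 and gaps and any(e < max_extra for e in extra):
        if extra[i] < max_extra:
            extra[i] += 1
            short -= 1
        i = (i + 1) % len(gaps)
    return "".join(w + " " * (g + e)
                   for w, g, e in zip(words, gaps + [0], extra + [0]))

print(justify(["Ted", "Ryan", "is", "their", "salesperson."], 34))
```

When the limit is reached before the line is full, the sketch simply returns the line as it stands, with at most five blanks in any gap.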

When PUTL is used to format a variable name and its value, the name and value are separated by the three 

characters “ = ”. This can be changed by providing an alternate set of 1-3 characters. Given the variable name 

Age and the value 15, the control sequence: 

PUTL Age                prints     Age = 15 

The following examples of PUTL.CHARS produce: 

PUTL.CHARS ' '          Age 15 

PUTL.CHARS '---'        Age---15 

PUTL.CHARS ' / '        Age / 15 
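The separator substitution shown above amounts to the following, written here as a Python sketch rather than PPL. The function name is an illustrative assumption; the 1-3 character limit is the one stated in the text.

```python
def putl(name, value, chars=" = "):
    """Model of a PUTL item: the name, the separator characters, the value."""
    if not 1 <= len(chars) <= 3:
        raise ValueError("the separator must be 1 to 3 characters long")
    return f"{name}{chars}{value}"

print(putl("Age", 15))           # default separator
print(putl("Age", 15, "---"))
print(putl("Age", 15, " / "))
```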

SPREAD and NO SPREAD are additional controls over the insertion of blanks. SPREAD is assumed. If NO 

SPREAD is used the values of adjacent variables will be concatenated without the usual intervening blank. 

11.13 MARGIN, LEADBLANK and WIDTH 

The MARGIN identifier specifies the number of columns that text is to be indented from the left. MARGIN 0 is 

assumed when MARGIN is not used and the text is not indented. The control word “@INDENT”, discussed in 

the subsequent section, specifies an additional indent that is measured from the current margin setting. It is used 

within a PUT instruction. 

Usually TEXTWRITER output is printed with a blank at the beginning of each line. This is useful when the 

output is sent to a printer which uses the first character in the line for carriage control instructions. It is not needed 

and probably not wanted when the output is saved in a diskfile for use in another program or document. The identifier 

NO LEADBLANK can be used to remove that blank. LEADBLANK, the assumed setting, can also be used.


The WIDTH identifier specifies the number of columns to be used for a report; that is, the width of an output 

line in columns. (Note that the width is measured from the first column, not from the end of the margin or indent.) 

When WIDTH is not used, the current output width defines the width of the report up to a maximum of 400. 

WIDTH, in the TEXTWRITER command, overrides the output width setting. It can be from 2 to 400. The 

WIDTH identifier may be overridden by the control word @WIDTH used within PUT clauses. 

Regardless of how the width of a report is set, one column of that width is reserved for carriage control characters 

(necessary to tell a printer when to page or skip lines). Thus, the actual report has a width one column less 

than the specified width. This is generally not of concern, but if it is, WIDTH 71, for example, should be specified 

for a report of actual width 70. 

11.14 Optional Files: LABELS and OUT 

The LABELS identifier is used to provide the names of one or more labels files. If a labels file contains labels for the values 

of a numeric variable, that text is used in place of the number. The TEXTWRITER command does NOT 

make use of the extended labels when variable names are used in a PUTL or VARNAME reference. 

The OUT identifier is used to produce an output file that contains any modifications that are made to the input 

file. 

11.15 CONTROL WORDS 

Control words, used in PUT instructions, control the formatting and placement of text. They begin with “@” and 

may be anywhere in the PUT clause, except before the PUT. The basic control words are: 

@nn @PARA @PAGE @TRIM @COMMAS @MISS 

@PLUS @NEXT @INDENT @JUST @PLACES @M @M1 

@MINUS @SKIP @WIDTH @BEFORE @EQUAL @M2 @M3 

@LABEL @SPREAD 

(“nn” represents a positive whole number.) 

Some control words require a numeric argument, such as the number of lines to skip. This number may follow 

directly after the control word or after an equal-sign. These are equivalent instructions: 

@SKIP3 @SKIP=3 

Either one skips three lines. Although the argument directly following the control word is typically a number, it 

may be any expression that evaluates to a numeric value: 

@PLUS(#Count-1) @PLUS=(#Count-1) 

Either of these instructions moves the column pointer to the right the number of spaces specified by the value of 

#Count minus one. 

When a control word has a simple argument such as the number 3, it may be placed directly (no spaces) after 

the control word or it may be separated from the control word by an equal sign. Again there are no spaces around 

the equal sign. When the argument is an expression it must be enclosed in parentheses. The parentheses must 

immediately follow the control word or the equal sign. However, the expression within the parentheses may contain 

spaces for readability. 
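The argument rules above can be summarised as a small pattern. The following Python sketch is an illustration of those rules only, not the P-STAT parser: the function name is invented, and the list of control words is limited to those in this chapter that are shown or described with a numeric argument.

```python
import re

# Control words from this chapter that accept a numeric argument.
WITH_ARGUMENT = ("SKIP", "PLUS", "MINUS", "INDENT", "WIDTH",
                 "BEFORE", "PLACES", "EQUAL")

# @NAME, an optional equal-sign, then a number or a parenthesized expression.
# No blanks are allowed except inside the parentheses.
PATTERN = re.compile(r"@(" + "|".join(WITH_ARGUMENT) + r")=?(\d+|\(.*\))?$")

def parse_control_word(token):
    """Split a control word into its name and its argument (or None)."""
    match = PATTERN.match(token)
    if match is None:
        raise ValueError(f"not a recognised control word: {token!r}")
    return match.groups()

for token in ("@SKIP3", "@SKIP=3", "@PLUS(#Count-1)", "@PLUS=(#Count - 1)"):
    print(token, parse_control_word(token))
```

A token such as "@SKIP 3", with a blank before the argument, is rejected by the sketch, in line with the rule that the argument must directly follow the control word or the equal-sign.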

11.16 Control Words to Produce a Letter 

Figures 11.4, 11.5, and 11.6 illustrate the use of TEXTWRITER to produce a form letter. The letter is personalized 

by including information specific to each case in the input file. Control words described in the sections which 

follow position the heading, the salutation, the body and the closing portions of the letter. Each letter is on a separate 

page.


The TEXTWRITER command and PPL clauses can select cases from a file, calculate information, write appropriate 

text and control the placement of that text to produce suitably personalized letters and reports. Various 

tasks that could be done include: 

1. Billing 

Calculate amount due, date due and discount for early payment, and write bill with correct name, 

address and aligned dollar amounts. 

2. Reminding 

Select patients with upcoming appointments and write letters reminding patients of appointment 

date, time, physician and procedure. 

3. Fund Raising 

Select past donors and write solicitations for funds, including in the letters the number of years of 

support, the maximum previously given, and the number of supporters in this individual’s class or 

organization. 

4. Claims Processing 

Select pending insurance claims and write letters giving the current status of the claim, including 

deductible amount, amount covered, amount payable, and remaining coverage. 

__________________________________________________________________________ 

Figure 11.4 A Form Letter: The Input File 

File MailList 

Last First Sex Company Street 

Greene Sharon F Pierce & Co. P.O. Box 365 

Smyth William - Devon Industries 126 West 46th St. 

City State Zip Copier 

New York NY 10003 Kanon Premiere 

Brooklyn NY 11234 Shape 100 

__________________________________________________________________________ 

11.17 Positioning Columns 

The control word character “@” may be followed directly by a number to specify an exact column location. This 

positions the first letter of the variable Name in the fifth column: 

[ PUT @5 Name ] 

The column location is measured from the start of the line. The column pointer moves to the specified column 

and subsequent text begins in that column. 

@PLUS and @MINUS move the column pointer right and left, respectively, from its current position. 

@PLUS moves the pointer right the specified number of columns; @MINUS moves it left. Thus, if the current 

column is number 20, @PLUS7 moves the pointer seven columns to the right to column 27. 

Note that the pointer moves only in the current line. Thus, if text is already in the specified column, it is overwritten 

by the new text. Also, when using @PLUS and @MINUS, the pointer moves relative to its current 

location, which may be dependent upon the length of the last value it printed. This instruction, 

[ PUT @10 First Last @25 Phone ]


might produce: 

         Susan Wells    205-672-9122

         Thomas Bretchei617-926-0106

12345678901234567890123456789012345678901234567890 

The second phone number has overwritten the remainder of that name. (A scale numbering the columns is included 

just for illustration.) 
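The overwriting shown above follows directly from the idea of a column pointer moving within one line. A Python model of that idea is sketched below. It is not P-STAT code: the class and method names are invented, and the full surname "Bretcheimer" is a hypothetical value, since the example shows only the part of the name that survives.

```python
class Line:
    """One output line with a movable column pointer (columns start at 1)."""

    def __init__(self, width=50):
        self.cells = [" "] * width
        self.pointer = 1

    def at(self, column):            # model of @nn
        self.pointer = column
        return self

    def plus(self, n):               # model of @PLUS
        self.pointer += n
        return self

    def minus(self, n):              # model of @MINUS
        self.pointer -= n
        return self

    def put(self, text):
        # Write at the pointer, replacing whatever is already there.
        for ch in text:
            self.cells[self.pointer - 1] = ch
            self.pointer += 1
        return self

    def text(self):
        return "".join(self.cells).rstrip()

short = Line().at(10).put("Susan Wells").at(25).put("205-672-9122")
long_ = Line().at(10).put("Thomas Bretcheimer").at(25).put("617-926-0106")
print(short.text())
print(long_.text())
```

The second line loses the end of the surname because the name extends past column 24 and the phone number is then written starting in column 25.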

11.18 Positioning Lines 

The control words @PARA, @NEXT, @SKIP, @PAGE, @INDENT and @WIDTH specify where a subsequent 

line prints. @PARA starts a new paragraph by moving the pointer to the next line and indenting three 

columns. Text starts in the fourth column. Alternate styles of paragraphing may be obtained using @SKIP (or 

@SKIP @5) instead of @PARA. 

@NEXT positions subsequent text in the next line. @SKIP skips the specified number of lines or, if no number 

directly follows, skips one line. @PAGE positions text at the top of the next page. 

The control word @INDENT specifies an additional number of columns to indent text from the current margin 

setting. (The value after @INDENT is added to the current margin setting and text is indented that many 

columns from the left.) The current margin is that specified by the MARGIN identifier in the TEXTWRITER 

command or the default margin of zero indent. @IN is an abbreviation for @INDENT. @NOINDENT resets the 

indentation to that specified by the MARGIN identifier or to 0 if MARGIN was not used. 

The @WIDTH control word sets the width of the report. The width is measured from column one, not from 

the current margin or indent setting. Thus, when the indent is increased, the line length is shortened. @WIDTH 

overrides any previous output width settings defined by the OUTPUT.WIDTH command or the identifier WIDTH 

in TEXTWRITER. The integer following @WIDTH may range from 2 to 400. @NOWIDTH turns off the current 

setting, and the width of the report reverts to that set by the WIDTH identifier or the OUTPUT.WIDTH 

command. 

Each of these control words flushes the text buffer and prints accumulated text before moving to the specified 

line. When these control words are not used, text prints continuously until the current line is full and then text 

continues on the next line. 

11.19 Positioning Words 

@TRIM is assumed by TEXTWRITER. Trailing blanks are automatically trimmed from character strings 

before they are positioned in the text. This avoids having many blanks following a short name. @NOTRIM may 

be specified to turn trimming off. Then, a character value occupies as many columns as its defined length, even 

though a particular value may be blank or only a few characters long. (Numbers do not have trailing blanks.) 

@JUST specifies that text be right as well as left justified. The lines of the report are aligned at both the left 

and right margins. When @JUST is not used and the JUSTIFY identifier is not included in the TEXTWRITER 

command, as many words as fit in a line are printed and then a new line is started. Thus, the right margin is jagged 

or unaligned. @NOJUST turns justification off, overriding the JUSTIFY identifier if it has been used. 

@BEFORE places the next value or string to be written immediately before the specified column. It affects 

only the text or variable following directly after it, and it does not reset the right edge of the report. This 

instruction, 

[ PUT @BEFORE20 City @24 Area.Code ] 

given this data, produces: 

          Princeton    609

         Somerville    201

            Trenton    609

123456789012345678901234567890 

Notice that City is right aligned before column 20; column 20 itself is blank. Area code starts in column 24. (The 

scale is not part of the output.) 
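The placement performed by @BEFORE is a right alignment against a column. A minimal Python sketch of the arithmetic follows; the helper names are assumptions, and the sketch only handles text placed to the right of what is already in the line.

```python
def put_at(line, column, text):
    """Place text so that its first character is in `column` (1-based)."""
    return line[:column - 1].ljust(column - 1) + text

def put_before(line, column, text):
    """Place text so that its last character is in the column before `column`."""
    return put_at(line, column - len(text), text)

def city_line(city, area_code):
    # Model of: PUT @BEFORE20 City @24 Area.Code
    return put_at(put_before("", 20, city), 24, area_code)

for city, code in (("Princeton", "609"), ("Somerville", "201"), ("Trenton", "609")):
    print(city_line(city, code))
```

Each city ends in column 19, so column 20 stays blank and the area code always starts in column 24.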

@COMMAS requests that subsequent numeric values print with commas every three digits (counting from 

the decimal point to the left). This makes reading large numbers, such as population figures or dollar amounts, 

easier. @NOCOMMAS turns this off. 
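As a point of comparison, the grouping that @COMMAS requests corresponds to the "," option of Python's format specification. The sketch below only shows the grouping rule; the function name and the sample numbers are illustrative assumptions.

```python
def with_commas(value):
    """Insert a comma every three digits to the left of the decimal point."""
    return f"{value:,}"

print(with_commas(1234567))      # a population-style whole number
print(with_commas(1234567.5))    # digits after the decimal point are untouched
```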

__________________________________________________________________________ 

Figure 11.5 A Form Letter: The TEXTWRITER Command 

TEXTWRITER MailList 

[/* RETURN ADDRESS */; 

PUT @PAGE @SKIP4 @40 

@NEXT @40 

@NEXT @40 

@SKIP @40 (LTRIM (.DATE.) ) @SKIP2 ; 

/* CUSTOMER ADDRESS */; 

IF Sex EQ 'M', T.PUT , F.PUT ; 

PUT First Last @NEXT Company @NEXT Street 

@NEXT City State > Zip ; 

/* SALUTATION */; 

PUT @SKIP ; 

IF Sex EQ 'M', T.PUT , F.PUT , M.GOTO Sir; 

PUT Last ; GOTO Continue ; 

Sir: PUT ; 

/* BODY OF LETTER */; 

Continue: ; 

PUT @PARA 

> Copier > 

> City ; 

PUT @PARA Copier 

; 

/* CLOSING */; 

PUT @SKIP2 @40 @SKIP4 @40 

@NEXT @40 @SKIP2 @NEXT ], 

MARGIN 5, JUSTIFY, WIDTH 71 $ 

__________________________________________________________________________


__________________________________________________________________________ 

Figure 11.6 A Form Letter: One Letter 

                                        GREAT Copier Supplies
                                        123 First Street
                                        New York, NY 11001

                                        May 7, 1986


     Ms. Sharon Greene
     Pierce & Co.
     P.O. Box 365
     New York, NY 10003

     Dear Ms. Greene:

        Thank you for calling GREAT Copier. We stock all supplies
     for the Kanon Premiere copier at discount prices, and we deliver
     them right to your business in New York.

        Enclosed is a price list for the Kanon Premiere. Should you
     have any questions, please give us a call.


                                        Sincerely yours,



                                        Sam Right
                                        Sales Manager

     SR:ms
     Enc: pl

__________________________________________________________________________ 

@PLACES specifies the number of decimal places (counting from the decimal point to the right) with which 

to print subsequent numeric values. The numbers are rounded if they have more than the specified number of 

places or zeros are added if they have less than the specified number of places. This instruction: 

[ PUT @PLACES2 Income ] 

prints the variable Income with two decimal places. @PL is an abbreviation. @NOPLACES turns off the prior 

places specification. Numbers then print with their actual number of decimal places. (@NOPLACES is not the 

same as @PLACES0. Zero decimal places print when @PLACES0 is specified. The places actually present 

in a numeric value print when @NOPLACES, the initial default setting, is in effect.) 

The PLACES function is distinct from the @PLACES control word. Both may be used in PUT clauses, if 

desired: 

[ PUT 'Total income, to the nearest dollar, is: ' 

@PLACES2 (PLACES (Income, 0)) ]


In this example, the PLACES function rounds the number of decimal places in Income to zero and the @PLACES 

control word sets the number of places to two. Thus, Income is shown to the nearest dollar, but with two zeros 

included after the decimal point in the common dollar and cents pattern. 
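The distinction between rounding a value and fixing the number of printed places can be seen in a short Python sketch. It is a model only: the function names and the income figure are hypothetical, and Python's own rounding rule for exact ties may differ from the one P-STAT applies.

```python
def places_function(value, places):
    """Model of the PLACES function: round the value itself."""
    return round(value, places)

def at_places(value, places):
    """Model of the @PLACES control word: fix the number of printed places."""
    return f"{value:.{places}f}"

income = 41872.63   # hypothetical value of Income
print(at_places(places_function(income, 0), 2))   # nearest dollar, two places
print(at_places(income, 0))                       # the effect of @PLACES0 alone
```

The first line ends in ".00" because the rounding has already removed the cents before the two places are printed.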

@SPREAD is the assumed setting. @SPREAD causes a single blank to be placed between variables. @NOSPREAD 

can be used to change this so that the blank is omitted. For example: 

TEXTWRITER Tests [ PUT @NOSPREAD First Last ] $ 

produces the following output: 

JamesWilmot 

SheilaHiggin 

Variables are usually trimmed of outside blanks before printing. The combination of @NOTRIM and @NOSPREAD 

causes TEXTWRITER to leave things alone and print the values exactly as they are stored in the file 

with no intervening blanks. 

11.20 Labeling Values 

The PUTL instruction and the control word @EQUAL align variable names as well as their values about their 

equal-signs. This is useful for embedding lists within reports or dumping values in cases with inconsistent data. 

PUTL requests that the variable name as well as its value be placed in the output line: 


[ PUT 'The following accounts are overdue:' @SKIP1; 

IF DAYS (.NDATE., 'YYYYMMDD') - DAYS (Due, 'YYMMDD') GT 30, 

PUT Acct.Number Co.Name, PUTL @26 Billed Due ] 

Here is a possible report using these PUTL statements: 

The following accounts are overdue: 

1205 Jones & Sons, Inc. Billed = 860112 Due = 860210 

1231 Birchwood Lumber Billed = 860210 Due = 860305 

The @EQUAL control word aligns labeled variables by specifying the column location of the equal-sign. 

Each line of text has that variable label and value with the equal-sign in the same column. @EQUAL is used after 

PUTL: 

[ PUTL @EQUAL15 Systolic ] 

For two cases this produces: 

     Systolic = 96

     Systolic = 82

123456789012345678901234567890

Multiple locations may be specified. This instruction:

[ PUTL @EQUAL15:35 Systolic Diastolic ]

or the equivalent:

[ PUTL @EQUAL=15:35 Systolic Diastolic ]

aligns the variable names and values about the equal-signs, which are positioned in columns 15 and 35. Either 

instruction produces:


     Systolic = 96      Diastolic = 123

     Systolic = 82      Diastolic = 114

1234567890123456789012345678901234567890

If a variable name and its value do not fit in the line when the equal-sign is positioned in the specified column,

they are placed in the next line of text. @NOEQUAL turns off alignment about the equal-sign.
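
The column arithmetic behind @EQUAL can be sketched in Python. This is an illustrative model rather than P-STAT code: putl_equal is a hypothetical helper, and its wrap rule (a field that would overlap earlier text or run past the width starts a new line) is an assumption based on the sentence above.

```python
def putl_equal(pairs, columns, width=80):
    """Lay out 'name = value' fields so each '=' falls in a 1-based column."""
    lines, line = [], ""
    for (name, value), col in zip(pairs, columns):
        field = f"{name} = {value}"
        # 0-based offset of the name so that '=' lands in column `col`
        start = max(col - len(name) - 2, 0)
        # Assumed wrap rule: overlap with earlier text, or running past
        # the width, moves the field to a new line.
        if line and (start < len(line) or start + len(field) > width):
            lines.append(line)
            line = ""
        line = line.ljust(start) + field
    lines.append(line)
    return lines

for row in putl_equal([("Systolic", 96), ("Diastolic", 123)], [15, 35]):
    print(row)
```

With columns 15 and 35 both fields fit on one line. Moving the second column close to the first forces the second field onto its own line.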

A very useful companion to the PUTL instruction is the .ALL. system variable, which causes all the variables

in the file to be printed. Figure 11.7 shows the command and the printout that results.

__________________________________________________________________________ 

Figure 11.7 TEXTWRITER: Displaying all the Variables 

TEXTWRITER Tests [ PUTL @EQUAL=15:40:60 .ALL. @SKIP ] $ 

    SS.Number = 243-24-5007       Last = Wilmot      First = James

        Vocab = 97              Riding = 90            Tea = 98

       Hockey = 78                 Car = 71           Beer = 85

     Juggling = 64             Affairs = 97           Memo = 93

    SS.Number = 311-04-8831       Last = Higgin      First = Sheila

        Vocab = 96              Riding = 89            Tea = 54

       Hockey = 86                 Car = 70           Beer = 91

     Juggling = 82             Affairs = 96           Memo = 86

__________________________________________________________________________ 

You will almost always want to use @EQUAL when .ALL. is used with PUTL. If you do not specify 

where the equal signs should be placed, the variables and values are printed one after another with 4 spaces between 

the value and the next variable name. The amount that is printed on each line is determined by the current 

output width setting up to a maximum of 400 characters. 

PUTL prints a value with its variable name. PUT prints a value. If you are providing a series of variable 

names some of which are to have labels and some of which should be printed without labels, you may use the 

@LABEL and @NOLABEL control words. @LABEL in a PUT is the equivalent of a PUTL. @NOLABEL in 

a PUTL is the equivalent of a PUT. The command: 

TEXTWRITER Tests [ PUTL Last First @NOLABEL SS.Number ] $ 

produces the following output. 

Last = Wilmot First = James 243-24-5007 

Last = Higgin First = Sheila 311-04-8831 

11.21 Specifying Missing Characters 

The @MISS control word specifies a character or a character string to print in place of the dash or dashes that 

usually print for missing values. @MISS, followed by a character or string in quotes, requests that character or 

string print for any of the three types of missing values: 

[ PUT @MISS


[ PUT 'The customer response was ' 

@MISS1"don’t know" @MISS2'no answer' @MISS3'refuse to answer' 

Response4 '.' ] 

@M1, @M2 and @M3 are abbreviations for the three types of missing control words. 

The missing specifications are in effect until they are reset or turned off with @NOMISS. The @NOMISS 

control word returns the missing character to the dash, the initial default character. 

@NOMISS1, @NOMISS2 and @NOMISS3 may be used to selectively reset specific missing control words. 

__________________________________________________________________________ 

Figure 11.8 A Complex Report: The Input and Labels Files 

File Tests 

SS.Number    Last    First   Vocab  Riding  Tea  Hockey  Car

243-24-5007  Wilmot  James      97      90   98      78   71

311-04-8831  Higgin  Sheila     96      89   54      86   70

Beer  Juggling  Affairs  Memo

  85        64       97    93

  91        82       96    86

File Tests.lab 

Score (1)superior (2)excellent 

(3)above average (4)average 

(5)marginal (6)poor / 

__________________________________________________________________________ 

11.22 A Complex Report 

This section discusses the more complex report shown in Figures 11.8, 11.9 and 11.10. More of the potential of 

PPL is illustrated. Figure 11.8 shows a portion of an input file containing scores for various aptitude tests (some 

of which are perhaps a little silly). Figure 11.9 is the desired output and Figure 11.10 contains the command. 

In the first case, scores on Tea and Vocab and Affairs, which are all above 95, are recoded as a 1. Therefore, 

they will be represented in the first sentence that is constructed. Riding and Memo, which are recoded as a 2, are 

represented in the second sentence. Beer is recoded as a 3 and is represented in the third sentence. For each case, 

the contents of each sentence and even the number of sentences is different. 

The general procedure of the instructions is: 

1. Generate all of the variables and scratch variables that will be used in the command. Since none of 

them are given initial values they are set to missing. 

2. Select the desired variables and do any necessary recoding. Specify a heading for each report (each 

case is a separate report on a new page).


__________________________________________________________________________ 

Figure 11.9 A Complex Report: The Report (Two Pages) 

James Wilmot (243-24-5007) 

Comments on INTELLECTUAL MEASURES 

Vocabulary skills, tea pouring and knowledge of current affairs 

are in the superior range by Company standards. Horseback riding and 

memo writing are excellent. Beer drinking is above average. Field 

hockey play is average. Car parking is marginal. Juggling is poor. 

.............. 

Sheila Higgin (311-04-8831) 

Comments on INTELLECTUAL MEASURES 

Vocabulary skills and knowledge of current affairs are in the 

superior range by Company standards. Horseback riding and beer 

drinking are excellent. Field hockey play, juggling and memo writing 

are above average. Car parking is marginal. Tea pouring is poor. 

__________________________________________________________________________ 

3. Initialize #Sentence to 0. Start the All.Scores DO loop through the 6 possible scores with the 

scratch variable #S. 

4. Count the number of tests that match the current value of #S. For example, for the first case when 

#S = 1, #NV.of.Score = 3; James Wilmot has three tests with a score of 1 (the raw values 97, 98 and 97 are each recoded to 1). 

5. Set #Used to zero. Start the All.Tests DO loop through the 9 test scores. When a value of a test 

equals #S, increase #Used and start writing. 

When processing the first case, 97 (the value of variable Vocab) is recoded as a 1. The KEEP reorders 

the file so that Vocab is the first variable in the file. The first time through the All.Scores loop 

#S=1. Within that loop, the first time through the All.Tests loop #J=1, the position of Vocab. Since 

the value of Vocab is a 1 which equals the 1 of #S, we have a match for our first sentence. 

6. Generate #Name to hold the desired name for each test. The first variable Vocab is called “vocabulary 

skills” in the report. 

7. If this is the first item in a sentence (#Used = 1), capitalize the first letter of the name. 

8. Depending on the number of items that are to go in the sentence (#NV.of.Score) and the number already 

in (#Used), put the name of the aptitude test and possibly a comma or the word “and”. 

9. Put “is” or “are” in the sentence, depending on how many items (#Used) have been used in the sentence 

already. 

10. Change the text slightly for sentences after the first one to make the report less repetitious. When 

the sentence for a given score has been written, the All.Tests loop is complete. When all the 6 scores 

have been evaluated, the All.Scores loop is complete.
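
The procedure in steps 1 through 10 can be sketched in Python to make the control flow easier to follow. This is an illustrative model, not the TEXTWRITER command itself. The test names and score labels are taken from Figures 11.8 and 11.9. The helper names recode and report are hypothetical, and the treatment of boundary values (a raw score equal to a range limit falls in the lower range) is an assumption, since the RECODE ranges in Figure 11.10 share their end points.

```python
NAMES = ["vocabulary skills", "horseback riding", "tea pouring",
         "field hockey play", "car parking", "beer drinking",
         "juggling", "knowledge of current affairs", "memo writing"]
LABELS = {1: "superior", 2: "excellent", 3: "above average",
          4: "average", 5: "marginal", 6: "poor"}

def recode(raw):
    """Map a raw 0-100 score to a category from 1 (best) to 6."""
    # Assumption: a value on a shared boundary takes the earlier range.
    for upper, category in [(65, 6), (73, 5), (80, 4), (88, 3), (95, 2), (100, 1)]:
        if raw <= upper:
            return category
    raise ValueError(f"score out of range: {raw}")

def report(raw_scores):
    """Build one sentence per score category that has at least one test."""
    cats = [recode(r) for r in raw_scores]
    sentences = []
    for s in range(1, 7):                       # the All.Scores loop
        items = [NAMES[j] for j, c in enumerate(cats) if c == s]
        if not items:                           # no tests had this score
            continue
        items[0] = items[0][0].upper() + items[0][1:]   # capitalize first item
        subject = items[0] if len(items) == 1 else ", ".join(items[:-1]) + " and " + items[-1]
        verb = "is" if len(items) == 1 else "are"
        if not sentences:                       # different text for the first sentence
            tail = f"in the {LABELS[s]} range by Company standards."
        else:
            tail = f"{LABELS[s]}."
        sentences.append(f"{subject} {verb} {tail}")
    return " ".join(sentences)

# Scores in file order: Vocab Riding Tea Hockey Car Beer Juggling Affairs Memo
print(report([97, 90, 98, 78, 71, 85, 64, 97, 93]))
```

The sketch collects the matching tests into a list before writing, while the PPL command counts them first (#NV.of.Score) and then writes as it loops, because PUT emits text immediately.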


__________________________________________________________________________ 

Figure 11.10 A Complex Report: The TEXTWRITER Command 

TEXTWRITER Tests 

[ 

[ 

[ 

/* 1. Recoding, Beginning Each Case */; 

GEN Score; 

GEN #Sentence, GEN #NV.of.score, GEN #USED, GEN #Name:C32 ; 

KEEP Vocab TO Memo .OTHERS.; 

DO #J = 1, 9; 

SET V(#J) = RECODE ( V(#J), 

0 TO 65 = 6, 65 TO 73 = 5, 73 TO 80 = 4, 

80 TO 88 = 3, 88 TO 95 = 2, 95 TO 100 = 1 ) ; 

ENDDO; 

PUT @PAGE First Last > SS.Number @NEXT ; 

PUT @15 @PARA ] 

SET #Sentence = 0 ; 

/* #Sentence is the count of the number of sentences 

#S controls the loop through the 6 possible scores 

#NV.of.Score is number of variables with a given score */; 

DO All.Scores #S = 1, 6; 

SET #NV.of.Score = 0 ; 

DO #J = 1, 9; 

IF V(#J) EQ #S, INC #NV.of.Score ; 

ENDDO; 

/* No tests had this score, get the next one */; 

IF #NV.of.Score EQ 0 NEXTDO; ] 

/* #Used is number of items used in sentence 

#J is the position (1-9) of a test variable */; 

SET #Used = 0; 

DO All.Tests #J = 1, 9; 

IF V(#J) NE #S, NEXTDO; 

INC #Used; 

/* #Name is the name of each test item */; 

SET #Name = RECODE ( #J, 

1 = , 2 = ,


[ 

3 = , 4 = , 

5 = , 6 = , 

7 = , 

8 = , 

9 = ) ; 

/* Capitalize the name of the first test item */; 

IF #Used EQ 1, SET #Name = 

CHANGE ( #Name, 1, 1, UPPER (SUBSTRING (#Name, 1, 1) ) ); 

/* Use commas and “and” appropriately 

then get the next test */; 

IF #NV.of.Score EQ 1, PUT #Name, NEXTDO; 

IF #Used LE #NV.of.Score - 2, PUT #Name ; 

IF #Used EQ #NV.of.Score - 1, PUT #Name ; 

IF #Used EQ #NV.of.Score, PUT > #Name ; 

All.Tests: ENDDO ] 

/* Write sentence, set score to #S for labelling */; 

INC #Sentence ; 

/* Use “is” and “are” appropriately */; 

IF #Used EQ 1, PUT > ; 

IF #Used GT 1, PUT > ; 

/* Different text for the first sentence only */; 

SET Score = #S; 

IF #Sentence EQ 1, PUT Score 

> ; 

IF #Sentence NE 1, PUT Score ; 

All.Scores: ENDDO ], 

JUSTIFY, WIDTH 60, LABELS 'Tests.lab' $ 

__________________________________________________________________________ 

11.23 Control Word Summary 

Control words all begin with “@” and many of them are followed by a number giving a column location or other 

value. The number may follow directly after the control word (@SKIP2) or after an equal-sign (@SKIP=2). Although 

the argument directly following the control word is typically a number, it may be any expression that 

evaluates to a numeric value. 

The following control words remain in effect throughout the processing of a case by the TEXTWRITER 

command, unless they are specifically changed or turned off: 

@INDENT @EQUAL 

@WIDTH @MISS 

@JUST @COMMAS


@TRIM @PLACES 

@LABEL @FONT1 - @FONT9 

Thus, specifying @JUST means that all the text specified in this and subsequent PUT clauses will be justified; 

specifying @COMMAS means that all numeric values will have commas in them. Prefacing the control word with 

“NO” (@NOJUST or @NOCOMMAS) turns off a prior setting. It resets the setting to that initially assumed by 

the TEXTWRITER command: 

[ PUT 'Invoice number ' Inv.No ', for $' 

@COMMAS Inv.Amt ', dated ' @NOCOMMAS Inv.Date 

' is past due.' ] 

Commas are inserted only in values of the numeric variable Inv.Amt and not in Inv.No or Inv.Date, which are 

character variables. 

The following control words do not remain in effect throughout the processing of a case. They apply only 

to the variable expression or character string that directly follows: 

@nn @PAGE 

@PLUS @NEXT 

@MINUS @PARA 

@BEFORE @SKIP 

These control words must be reissued to produce the desired results again. (The “nn” above represents a positive 

whole number.) 

The STREAM and CASE identifiers also affect the action of control words. Remember that in CASE mode, 

text is flushed and a new line is started when processing of a new case is begun. In STREAM mode, text is flushed 

only when processing of all cases is complete. CASE is assumed when neither identifier is used in the TEXTWRITER

command; STREAM must be specified if it is desired. 

When STREAM mode is specified, all control words that typically remain in effect, except @INDENT and 

@WIDTH (which flush and print text), are reset when processing of a new case begins. In CASE mode, all control 

words, including @INDENT and @WIDTH, are reset to their initial default values. 

Thus, a PUT instruction such as this, 

[ PUT 'Student ' ID.No ' has an outstanding bill of $' 

@COMMAS @PLACES2 Balance '.' ] 

which specifies that the variable Balance print with commas and two decimal places, does not need to be initialized 

for each case. After the first student is processed, the variable ID.No for the second student will not print 

with commas and two places. The @COMMAS and @PLACES2 control words are reset to their default values 

at the start of each case. However, they do remain in effect throughout a case. Thus, if the prior PUT instruction 

was followed by another PUT instruction, any numeric values specified in that instruction would print with commas 

and two decimal places. @NOCOMMAS and @NOPLACES would have to precede those numeric variables 

to reset these control words. 
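
The scope of these settings can be modeled with a short Python sketch. It is an illustration of the rule described above, not P-STAT code: render_case is a hypothetical function that treats a case as a list of tokens, holds @COMMAS and @PLACES as state for the rest of the case, and starts every case from the default settings.

```python
def render_case(tokens):
    """Render one case; settings persist within the case only."""
    commas, places = False, None      # defaults, restored for every new case
    parts = []
    for tok in tokens:
        if tok == "@COMMAS":
            commas = True
        elif tok == "@NOCOMMAS":
            commas = False
        elif tok == "@NOPLACES":
            places = None
        elif isinstance(tok, str) and tok.startswith("@PLACES"):
            places = int(tok[len("@PLACES"):])
        elif isinstance(tok, (int, float)):
            if places is None:
                parts.append(f"{tok:,}" if commas else str(tok))
            else:
                parts.append(f"{tok:,.{places}f}" if commas else f"{tok:.{places}f}")
        else:
            parts.append(tok)          # a literal character string
    return "".join(parts)

print(render_case(["Student ", 1042, " has an outstanding bill of $",
                   "@COMMAS", "@PLACES2", 1234.5, "."]))
```

Because each call begins with the defaults, the identification number of the next student prints plainly, while any number that follows @COMMAS and @PLACES2 inside the same case keeps that formatting until @NOCOMMAS or @NOPLACES appears.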

11.24 COMPARING TEXTWRITER AND OTHER COMMANDS 

Any of the control words, except @INDENT, @JUST, @SPREAD, @WIDTH, and the PostScript controls such 

as @FONT1, may be used in PUT clauses following any command. They are not exclusive to the TEXTWRITER 

command. Thus, brief reports or “dump messages” may be produced as a system file is processed by any command. 

However, as with all P-STAT commands, the TEXTWRITER identifiers can be used only in the 

TEXTWRITER command. 

There is a basic difference between TEXTWRITER and other commands in the way text is output. When 

TEXTWRITER is used, text prints continuously — a new line is not started unless the prior line is full or control


words specify a new line. Text is flushed and a new line is started only when processing of a new case is begun, 

unless STREAM mode is specified. Then text is flushed only when processing of all cases is complete. 

When another command (not TEXTWRITER) is used, a new line is started and text is written whenever a 

PUT clause ends unless the final character in the clause is an “@” by itself. This causes the line to be held for 

the possible addition of more text. When processing of a case is finished, all text is written, regardless of whether 

the line was held or not. 

The control character “@” is used at the end of a PUT instruction to hold the column pointer in the same line: 

PROCESS Class102 

[ PUT ID ': Average is' 

(ROUND (MEAN.GOOD (Test?) ) ) '.' @ ; 

IF (COUNT.GOOD (Test?) ) NE 8, PUT 'Missing some tests.' ] $ 

The “@” after the period keeps the column pointer in the same line. If there are not eight good test scores, an 

additional text string is put in that line. This report is produced: 

1022: Average is 85. 

1248: Average is 88. Missing some tests. 

This command, without the “@”, 

PROCESS Class102 

[ PUT ID ': Average is' 

(ROUND (MEAN.GOOD (Test?) ) ) '.' ; 

IF (COUNT.GOOD (Test?) ) NE 8, PUT 'Missing some tests.' ] $ 

produces: 

1022: Average is 85. 

1248: Average is 88. 

Missing some tests. 

The text about the missing tests appears on a new line. (It would not be necessary to use the control character “@” 

in the TEXTWRITER command, which assumes continuous printing of text.) 
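
The line-holding rule for commands other than TEXTWRITER can be modeled in a few lines of Python. This is a sketch under stated assumptions, not P-STAT code: process is a hypothetical function, each PUT clause is represented as a (text, hold) pair where hold stands for a trailing “@”, and held text is joined to the next clause with a single blank.

```python
def process(cases):
    """Model PUT output outside TEXTWRITER.

    cases: list of cases; each case is a list of (text, hold) PUT clauses.
    A clause without hold writes the line; end of case writes any held text.
    """
    out = []
    for clauses in cases:
        pending = ""
        for text, hold in clauses:
            pending = f"{pending} {text}" if pending else text
            if not hold:
                out.append(pending)
                pending = ""
        if pending:                    # end of case: held text is written
            out.append(pending)
    return out

held = process([
    [("1022: Average is 85.", True)],
    [("1248: Average is 88.", True), ("Missing some tests.", False)],
])
print("\n".join(held))
```

Passing False for every hold flag reproduces the second report, with the message about missing tests on a line of its own.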

11.25 OPTIONAL IDENTIFIERS: PostScript 

The TEXTWRITER command can use any font that is available on your PostScript printer to produce camera-ready 

printout. Figure 11.11 shows the output that results when PostScript controls are added to the complex report 

described in Figures 11.8, 11.9, and 11.10. The controls for the page and paragraph are changed to request 

fonts: 

PUT @PAGE @FONT2 

First Last > SS.Number @NEXT @FONT1 ; 

PUT @15 @PARA @FONT3; ] 

The following PostScript identifiers are added before the final “$”. LEFT.EDGE, which uses inches, replaces 

WIDTH 70, which uses number of characters, to determine the width of the printout. 

POSTSCRIPT, PORTRAIT, FONT1 TIMES ROMAN BOLD 14, 

FONT2 TIMES ROMAN BOLD 12, FONT3 ARIAL 12, 

FONT4 TIMES ROMAN BOLDITALIC 12, LEFT.EDGE 2., PR 'Test.ps' $ 

The identifier POSTSCRIPT causes the initial control codes that PostScript requires to be written to the output 

file. You should always use a PR identifier, because PostScript output written to a non-PostScript device such as the 

terminal prints the control words rather than implementing them. Usually one of the two identifiers PORTRAIT


or LANDSCAPE is used. PORTRAIT is used when the printout is going to paper that is 8.5 inches wide and 11 

inches high. LANDSCAPE, the assumed orientation, is used for output that is 11 inches wide by 8.5 inches high. 

If you have paper that is a different size, the P-STAT POSTSCRIPT.SETUP command can be used to set the paper 

size. POSTSCRIPT.SETUP can also be used to set fonts and margins. 

The area that is to be used on the paper can be controlled by using the TOP.EDGE, BOTTOM.EDGE, 

LEFT.EDGE and RIGHT.EDGE identifiers. The arguments are given in inches. One-inch margins are assumed for 

any edge that is not supplied. In Figures 11.11 through 11.14, the identifiers used to make the printed output fit nicely 

are: 

TOP.EDGE .5, BOTTOM.EDGE 5.5, LEFT.EDGE 1.5, RIGHT.EDGE 1.5 

11.26 PostScript Page Changes 

The assumption is that each time you use the @PAGE control word, the PostScript page will be sent to the printer. 

This setting is controlled by the SHOWPAGE identifier. However, using PostScript is more like drawing on a 

slate than writing lines on a page. It is possible to move around the page, overwrite, draw lines or print text. In 

the P-STAT implementation any command that has PostScript support can be combined on a single page with any 

other such command. 

When NO SHOWPAGE is used, a page is not sent to the printer until a subsequent command uses the SHOWPAGE 

identifier. SHOWPAGE is assumed unless NO SHOWPAGE is used. An automatic page change occurs 

when a block of text extends beyond the defined bottom of the page unless NO SHOWPAGE is in effect. 

__________________________________________________________________________ 

Figure 11.11 PostScript Output 



Vocabulary skills, tea pouring and knowledge of current affairs are in 

the superior range by Company standards. Horseback riding and memo 

writing are excellent. Beer drinking is above average. Field hockey play is 

average. Car parking is marginal. Juggling is poor. 



Vocabulary skills and knowledge of current affairs are in the superior 

range by Company standards. Horseback riding and beer drinking are 

excellent. Field hockey play, juggling and memo writing are above 

average. Car parking is marginal. Tea pouring is poor. 

__________________________________________________________________________ 

11.27 Setting the Fonts 

The identifiers that are used to set the fonts are: FONT and FONT1 through FONT9. When FONT is used by 

itself it sets all 9 of the available fonts to the supplied setting. If FONT is not supplied the assumed font is Times-Roman. 

The assumed pointsize depends on the combination of orientation (LANDSCAPE or PORTRAIT) and


the output width. In PORTRAIT orientation with an output width less than 80, the default pointsize is 10. In 

LANDSCAPE orientation with a width that is greater than 80, the pointsize is set to 8. 

Font names must be correctly spelled. Several of the more common font names are available as keywords so 

that you need not remember the exact form (upper and lower case and hyphenation). The available combinations 

are: 

TIMES              HELVETICA               COURIER

TIMES BOLD         HELVETICA BOLD          COURIER BOLD

TIMES ITALIC       HELVETICA OBLIQUE       COURIER OBLIQUE

TIMES BOLDITALIC   HELVETICA BOLDOBLIQUE   COURIER BOLDOBLIQUE

These must be preceded by one of the FONT identifiers and optionally followed by the desired pointsize. 

FONT1 HELVETICA 10, FONT3 TIMES ITALIC, FONT4 COURIER, 

Helvetica and Times are proportional fonts. In a proportional font each letter takes up an appropriate amount of 

space so that an i is not as wide as a W. Courier is a monospace font and each letter takes up the same amount of 

room. 

You may use any font that is available on your laser printer. However, if it is not in the list of keywords, it 

must be enclosed in quotes. For example: 

FONT9 'ZapfChancery-MediumItalic' 10, 

Fonts can only be defined with the TEXTWRITER identifiers or in a previous POSTSCRIPT or POSTSCRIPT.SETUP 

command. They are selected within the TEXTWRITER output by using TEXTWRITER control words. 

11.28 TEXTWRITER Control Words: The Fonts 

The font control words are any of @FONT1 through @FONT9. The font change takes effect immediately and 

remains in effect until the next font control word is specified. If a number of fonts have been specified but no font 

control word is used, FONT1 is assumed. 

The output in Figures 11.11 to 11.14 uses four fonts defined by identifiers in the command: 

FONT1 TIMES BOLD 14, FONT2 TIMES BOLD 12, 

FONT3 ARIAL 12, FONT4 TIMES BOLDITALIC 12 

Within the TEXTWRITER PPL, the control words @FONT1, @FONT2, @FONT3 and @FONT4 are used to 

specify which of the defined fonts to use for a particular piece of text. 

In Figure 11.11 the two paragraphs are in a regular Arial font while the two heading lines are in Times Bold 12 

and Times Bold 14. Justification is not requested for Figure 11.11. In Figure 11.12 the text in the paragraphs is 

right/left justified. This is done by adding “, JUSTIFY” to the end of the TEXTWRITER command. 

In Figure 11.13 the paragraphs are not justified but have font changes in the middle of the paragraph. A 

change was made in the TEXTWRITER command to isolate the variable “Score” so that it could be printed in a 

bold italic font: 

SET Score = #S; 

IF #Sentence EQ 1, PUT ; 

PUT @FONT4; PUT Score; 

IF #Sentence NE 1, PUT ; 

PUT @FONT3; 

IF #Sentence EQ 1, PUT >; 

__________________________________________________________________________


Figure 11.12 Justification in PostScript Text 





writing are excellent. Beer drinking is above average. Field hockey play is 

average. Car parking is marginal. Juggling is poor. 




range by Company standards. Horseback riding and beer drinking are 

excellent. Field hockey play, juggling and memo writing are above 


__________________________________________________________________________ 

__________________________________________________________________________ 

Figure 11.13 Changing Fonts Text in a PostScript Paragraph 





writing are excellent. Beer drinking is above average. Field hockey play 

is average. Car parking is marginal. Juggling is poor. 




range by Company standards. Horseback riding and beer drinking 

are excellent. Field hockey play, juggling and memo writing are above 


__________________________________________________________________________


In Figure 11.14, the text of the paragraphs is both justified and has font changes in the middle of the text. As 

you can see, the spacing is not as good as it is in Figure 11.12. This is because any font change, color change or 

underline causes a flush. The program does the justification by estimating what might come next. This usually 

results in somewhat more space between the words. Text with many font changes may result in somewhat less 

attractive results than text without intermediate font changes. Text with many long words will also tend to have 

less attractive results when justified. 

__________________________________________________________________________ 

Figure 11.14 Font Changes in a Justified PostScript Paragraph 

James Wilmot (243-24-5007) 

Comments on INTELLECTUAL MEASURES 





Sheila Higgin (311-04-8831) 

Comments on INTELLECTUAL MEASURES 





11.29 Control Words: Positioning the Text 

There are two types of text to consider: 

1. text that spans multiple lines like the paragraphs in the previous figures and 

2. tables or short pieces of text which need to be positioned at particular places. 

When PostScript is in effect, the control words such as @ and @BEFORE do not work well with proportional 

fonts. To account for the effects of proportional fonts and the fact that a PostScript page is not written from top 

to bottom but “drawn” on the page, there are many TEXTWRITER control words specifically for use with 

PostScript. 

The following control words use inch measurements to specify where the string or number that follows is to 

be placed and how it is to be placed in relation to that location. 

3. @CINCH=nn centers the text that follows at inch location nn. The text ends with the next control 

word of a type that causes a buffer flush such as @NEXT, @FLUSH, or another 

@CINCH type. Suppose var1 equals 111.222. 

@CINCH=3.1 'variable one =' @PLACES2 var1 @SKIP2; 

produces “variable one = 111.22” and places it so that its midpoint is 3.1 

inches into the current line.


Note: @PLACES is a textwriter control word that does not cause a flush of the 

current text buffer. Since @SKIP flushes the text buffer, it ends the CINCH. 

4. @CINCH.U=nn centers and also underlines the string at inch nn. 

5. @RINCH=nn puts the textwriter text right justified at the specified location. This works well for 

numbers if the number of decimal places is controlled. 

6. @RINCH.U=nn right justifies and underlines the text. 

7. @LINCH=nn left justifies text at the specified location. 

8. @LINCH.U=nn left justifies and underlines the text. 

__________________________________________________________________________ 

Figure 11.15 TEXTWRITER: Tabular Output with PostScript 

LIST numbers $ 

Var1 Var2 Var3 

123.11 168.50 568.12 

12.239 45.67 33.20 

123.45 211.99 444.44 

TEXTWRITER numbers 

[ IF FIRST ( .FILE. ) THEN; 

PUT @LEADING=2 @Y1 @NEXT 

@CINCH.U=2.5 

@RINCH.U=4.7 

@LINCH.U=5.7 @SKIP=2; 

ENDIF; 

PUT @NOPLACES; 

IF var1 GE 35 THEN; 

PUT @PINCH=2.5 var1; ELSE; PUT @PINCH.U=2.5 var1; ENDIF; 

PUT @PLACES2; 


PUT @RINCH=4.5 var2; ELSE; PUT @RINCH.U=4.5 var2; ENDIF; 


PUT @LINCH=5.7 var3; ELSE; PUT @LINCH.U=5.7 var3; ENDIF; 

PUT @NEXT; 

IF LAST ( .FILE. ) THEN; 

PUT @NEXT @Y2 @LINEWIDTH=1.5 @DRAW.BOX @LINEWIDTH; 

PUT @X1=3.4 @DRAW.V @X1=5.2 @DRAW.V; 

ENDIF; ], 

POSTSCRIPT, PORTRAIT, 

LEFT.EDGE .5, RIGHT.EDGE .5, PR number.ps $ 

__________________________________________________________________________ 

9. @PINCH=nn centers the text around a specified lineup character, which is assumed to be a decimal 

point. This is good for writing a column of fractional numbers when the number 

of decimal places differs.


10. @PINCH.U=nn is like @PINCH but it also underlines. 

11. @PINCH.CHAR='c' provides an alternate character such as '=' to be used in the pinch lineups. If no 

argument is given, it reverts to the default '.'. 

12. @FLUSH flushes the current textwriter buffer without also moving to the next line. Flushing 

turns off temporary options like @CINCH or @UNDERLINE. It 

does not affect color settings. 
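
The lineup idea behind @PINCH can be illustrated with a character-column model in Python. This is a simplification, not P-STAT code: real PostScript placement is in inches with proportional fonts, while the hypothetical pinch function below works in monospace columns. The handling of text that lacks the lineup character is an assumption.

```python
def pinch(text, column, char="."):
    """Pad text so the first `char` falls in the 1-based `column`."""
    idx = text.find(char)
    if idx < 0:
        idx = len(text)   # assumption: align as if the character followed the text
    return " " * max(column - 1 - idx, 0) + text

for value in ["123.11", "12.239", "123.45"]:
    print(pinch(value, 10))

print(pinch("Rate = 5", 10, char="="))   # models an alternate lineup character
```

Each value is shifted so that its decimal point shares a column, which is why values with different numbers of decimal places still form a tidy column.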

__________________________________________________________________________ 

Figure 11.16 PostScript: Tables with Proportional Fonts 

column one column two column three 

123.1100 168.50 568.12 

12.23900 45.67 33.20 

123.4500 211.99 444.44 

__________________________________________________________________________ 

If you wish to draw lines or boxes it is useful to be able to specify the coordinates which define the drawing 

area. The following are control words that make it easy to move the current location on the page to resume printing 

at that location or for line and box drawing: 

1. @X1 stores a value that is the current left margin. If there is an argument as @X1=3.4, that 

inch location is stored in the X1 variable. 

2. @X2 stores a value that is the current right margin. An argument such as @X2=8.3 stores 

that inch value in the X2 variable. 

3. @Y1 stores a value that is the current top margin. If an argument is supplied, that value is 

stored in the Y1 variable. 

4. @Y2 stores a value that is the current bottom margin. If an argument is supplied, that value 

is stored in the Y2 variable. 

5. @MOVETO sets the current location to the X1/Y1 position. The next text string will begin at that 

location. 

6. @DRAW.H draws a horizontal line from the X1/Y1 position to the X2/Y1 position. 

7. @DRAW.V draws a vertical line from the X1/Y1 position to the X1/Y2 position. 

8. @DRAW.U=nn underlines the current line from X1 to X2. nn is the amount below the current position 

where the underline should be drawn. The argument, nn, is in units of 1/72 

of an inch. If no argument is given, the assumed value is 3. 

9. @DRAW.BOX draws a box using X1/Y1 as the upper left coordinate and X2/Y2 as the lower right 

coordinate.


Yet another group of control words which determine location include: 

1. @DOWN=nn moves down that many lines. The actual distance depends on the point size of the font 

and the leading (the space between the lines). If nn is not specified, 1 is 

assumed. 

2. @UP=nn moves up that many lines. The actual distance depends on the point size and the leading. 

If nn is not specified, 1 is assumed. 

3. @TOP moves to the first line, just below the top margin. 

4. @BOTTOM moves to the last line, just above the bottom margin. 

5. @LEADING=nn specifies the space between lines. LEADING is usually set to 1/72 of an inch. 

LEADING=3 increases the space to 3/72 of an inch. A larger LEADING improves 

the readability of text when a large point size is used. 

6. @LINEWIDTH=nn specifies the width of the lines and boxes that are drawn. LINEWIDTH is usually 

set at .5. This measurement is in units of 1/72 of an inch. Increasing the 

LINEWIDTH produces bolder-looking lines. @LINEWIDTH with no argument resets 

it to the original value of .5. LINEWIDTH=36 would provide a border 1/2 

inch wide. 
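
The unit arithmetic for these control words is easy to check by hand or with a short Python sketch. The conversion of 1/72 inch per unit comes from the descriptions above. The line_advance_inches formula (each line advances by the point size plus the leading) is an assumption of this sketch, since the text says only that the distance depends on both.

```python
POINTS_PER_INCH = 72

def points_to_inches(points):
    """Convert a LEADING or LINEWIDTH argument (units of 1/72 inch) to inches."""
    return points / POINTS_PER_INCH

def line_advance_inches(lines, point_size, leading=1):
    """Assumed model of @DOWN=lines: each line is point_size + leading high."""
    return lines * (point_size + leading) / POINTS_PER_INCH

print(points_to_inches(36))             # LINEWIDTH=36, a half-inch border
print(points_to_inches(0.5))            # the usual LINEWIDTH of .5
print(line_advance_inches(2, 12, 2))    # two lines of 12-point text with LEADING=2
```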

Figure 11.15 contains the TEXTWRITER command which creates the table in Figure 11.16. Each of the 

three columns is formatted differently to show the effect of left, right and center justification. The headings are 

both properly justified and underlined. The LEADING is increased so that there is more room between lines for 

the underlining. 

PUT @LEADING=2 @Y1 @NEXT 

@CINCH.U=2.5 

@RINCH.U=4.7 

@LINCH.U=5.7 

The use of @Y1 stores the location of the line before the headings. This value is needed later when we draw 

the box around the table. The first column is of particular interest because the numbers are placed so that the decimal 

point is located 2.5 inches from the left edge of the paper. This location may or may not be the actual center 

of the number. 


PUT @PINCH=2.5 var1; ELSE; PUT @PINCH.U=2.5 var1; ENDIF; 

This IF statement tests the value of Var1. If the value is relatively large, the @PINCH control word is used 

to locate the value so that the decimal point falls 2.5 inches from the left edge of the paper. If the value is small, 

PINCH.U places the value and underlines it. PINCH and PINCH.U are very useful when a column of numbers 

contains values which have different numbers of decimal places. PINCH and PINCH.U can be used to line strings 

up around a character other than the decimal point. For example: 

PUT @PINCH.CHAR='=' 

causes the equal sign to be used in determining the location of the text that follows. 

The second column in Figure 11.16 is right justified 4.5 inches from the left edge of the page. This works 

well when all the numbers have the same number of decimal places. The third column illustrates that left justification 

of a column of numbers is seldom satisfactory. 

11.30 Indenting Text 

The edge identifiers (LEFT.EDGE, RIGHT.EDGE, TOP.EDGE and BOTTOM.EDGE) set the initial margins for the page. The right and left margins can be adjusted by using 

@L.MARGIN and @R.MARGIN to supply an indent value. This value is an offset to the existing margins. 

@L.MARGIN=.5 @R.MARGIN=.5


This provides a half inch indent on each side of the page. Figure 11.17 illustrates the command and the resulting 

output. @L.MARGIN is used to create a hanging indent with text that explains both @L.MARGIN and 

@R.MARGIN. Note: a positive number as the argument moves the margin towards the center of the page. A

negative number moves the margin towards the edge of the page. 
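The margin arithmetic can be checked with a small Python sketch. It assumes a page 8.5 inches wide and treats the margin values as offsets added to the edge settings, as the text describes; the function name text_extent is hypothetical and nothing here is P-STAT code.

```python
def text_extent(page_width, left_edge, right_edge, l_margin=0.0, r_margin=0.0):
    """Return the (left, right) text boundaries in inches from the left paper edge."""
    # A positive margin offset moves the boundary towards the center of the
    # page; a negative offset moves it towards the edge of the paper.
    left = left_edge + l_margin
    right = page_width - right_edge - r_margin
    return left, right


# LEFT.EDGE 1.25, RIGHT.EDGE 1.25 with a half inch indent on each side.
left, right = text_extent(8.5, 1.25, 1.25, l_margin=0.5, r_margin=0.5)
print(left, right, right - left)
```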

__________________________________________________________________________ 

Figure 11.17 Indenting the Text 

TEXTWRITER Work [ Case 1; 

PUT @L.MARGIN=1.5; 

PUT ; 

PUT @SKIP @L.MARGIN @L.MARGIN=1.5; 

PUT ; ], 

POSTSCRIPT, PORTRAIT, PR marg.ps, 

FONT1 TIMES 12, LEFT.EDGE 1.25, RIGHT.EDGE 1.25$ 

__________________________________________________________________________ 

11.31 Colors in PostScript Output 

PostScript output is assumed to be black on white. The black can be changed to any of red, orange,

yellow, green, blue or violet. The control words are @RED, @ORANGE, @YELLOW, @GREEN, @BLUE, 

@VIOLET, @BLACK and @NOCOLOR. @NOCOLOR reverts to black. The change in color flushes any text 

that has preceded it but not yet been placed on the page. 

If you wish more control over the colors, you can use the POSTSCRIPT.SETUP command to assign colors 

to specific fonts using 3 numeric values for the amount of red, green and blue to be used. For example: 

POSTSCRIPT.SETUP, FONT4 HELVETICA, COLOR FONT4 .2 .6 .1 

Flushing makes a difference when justification is being done and the section of text is not yet complete. Justification 

is done by adding a tiny bit to the spaces between each word. To figure out how much to add, it is 

necessary to know the length of the text in the current font. The difference between the length of the text and the 

available line width is divided by the number of words in the line. This is the amount used to pad the spaces. 

When a flush occurs in the middle, the total length of the text is not known so the amount must be estimated. 

It is this estimation which causes some of the justified lines to have more space between words than you would expect

from looking at the text. When the lines are not justified, the flushing does not affect the spacing within the paragraphs

even when there are font and color changes. 
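The padding rule described above is simple arithmetic, sketched here in Python. The sketch follows the rule as the text states it (slack divided by the number of words); the function name justify_padding and the use of points as the unit are assumptions for illustration, not part of P-STAT.

```python
def justify_padding(text_width, line_width, n_words):
    """Extra space added per word when a line is justified.

    Widths may be in any one unit (points here). The rule follows the text:
    the difference between the line width and the text width is divided by
    the number of words in the line.
    """
    if n_words <= 0:
        raise ValueError("a line needs at least one word")
    slack = line_width - text_width
    # Text that already fills the line (or overflows it) gets no padding.
    return max(slack, 0) / n_words


# A 432-point line holding 400 points of text in 8 words.
print(justify_padding(400, 432, 8))
```

When a flush interrupts the line, text_width is only an estimate, so the padding that results can be larger than the finished line needs.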

NOTE: Because the PostScript commands do not have the actual font tables available, the spacing is based on

estimates. The use of capitalized words in a justified line may cause overprinting. The flushing of the line that

occurs when font changes are made can also contribute to imperfections in the spacing. The leading between lines

is also based on the point size of the fonts. Changing font sizes in the middle of paragraphs will also cause the

leading to change.

11.32 Underlining Text 

It is easy to underline items in tables when using the @CINCH, @LINCH, @RINCH and @PINCH control 

words. Each of these has an underline format which is the control word followed by “.U” to indicate underlining. 

To underline text in the middle of a sentence it is necessary to indicate where the underlining starts and where it 

ends. This is done with @UNDERLINE and @NOUNDERLINE. 

Figure 11.18 illustrates the output if @UNDERLINE and @NOUNDERLINE are added to the font changes 

for variable Score. 

PUT @FONT4 @UNDERLINE; 

PUT Score; IF Sentence NE 1, PUT ; 

PUT @FONT2 @NOUNDERLINE ; 

The code which figures out where to break up a line looks for blanks or the end of a chunk of text. In the example

above, if the period that ends the sentence is separated from the word that precedes it by a font, color or

underline control word, it may well end up by itself on the next line.

There are three ways to emphasize important text. 

1. Change the font to an italic or bold typeface.

2. Change the color of the text. 

3. Underline the text. 

These methods are not exclusive. You may use all three together to produce, for example, text that is red,

italic and underlined.

PUT @FONT6 @RED @UNDERLINE 

@FONT1 @BLACK @NOUNDERLINE ; 

@UNDERLINE and the “.U” control words all underline from the start of the designated string to the end of 

that string. With @UNDERLINE that can be many lines down a page. Underline ends when @NOUNDERLINE 

or an @NEXT, @SKIP or @PARA control word is encountered. It is not affected by a color or font change. 

The @DRAW.U control word underlines from one specific location on a line to another specific location on 

that line. The underline is normally drawn 3/72 of an inch below the current line unless an argument

supplies a different distance. The start and end of the underline are determined by the values of @X1

and @X2 which are initially set to the left and right edge values. 

@X1=3.5 @X2=6 @DRAW.U=5 

draws a line 2.5 inches long beginning 3.5 inches from the left edge of the paper and 5/72 of an inch below the 

current line.
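The numbers in this example can be verified with a short Python sketch that converts the inch positions to points (72 points to the inch). The function name underline_segment is hypothetical; the sketch only reproduces the arithmetic and does not draw anything.

```python
POINTS_PER_INCH = 72


def underline_segment(x1_in, x2_in, drop=3):
    """Return (start_pt, end_pt, length_in, drop_in) for a rule from X1 to X2.

    x1_in and x2_in are inches from the left edge of the paper; drop is the
    distance below the current line in 72nds of an inch (3 when omitted).
    """
    start = x1_in * POINTS_PER_INCH
    end = x2_in * POINTS_PER_INCH
    return start, end, x2_in - x1_in, drop / POINTS_PER_INCH


# @X1=3.5 @X2=6 @DRAW.U=5
print(underline_segment(3.5, 6, drop=5))
```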


__________________________________________________________________________ 

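Figure 11.18 Underlining the Text

(Figure output not reproduced; only its final line survives.)

average. Car parking is marginal. Tea pouring is poor.

__________________________________________________________________________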


SUMMARY

TEXTWRITER

TEXTWRITER Invoices

PUT @PAGE .DATE. @SKIP1 ;

IF Date.Paid MISSING AND

DAYS (.NDATE., 'YYYYMMDD') -

DAYS (Date.Invoice, 'YYYYMMDD') GT 30,

PUT @NEXT Inv.Number >

Amount.Due Date.Invoice

Company

> Phone ],

JUSTIFY, WIDTH 60 $

The TEXTWRITER command produces textual reports about the data in a P-STAT system file. TEXTWRITER

uses the PUT instruction to place text strings, the values of variables, scratch variables and

system variables, and the evaluations of expressions in the report. Character strings are enclosed in

quotes or between the directional signs “<<” and “>>”:

PUT @PAGE .DATE. @SKIP1 ;

The PUTL instruction may also be used to put variable labels in the text. Other PPL instructions, functions

and operators may be used for logical testing, recoding, calculations and other tasks. The placement

of text is controlled using the "@" symbol and control words.

Text is not right justified unless the identifier JUSTIFY or the control word @JUST specifies right justification.

The WIDTH identifier or the control word @WIDTH specifies an output width other than the

current one.

The previous TEXTWRITER command produces output in the following form:

Outstanding Invoices as of Apr 16, 1995

Invoice Number 1260 for $212.55, dated 950211, is past due.

Please call Smith, Jakes & Row at 215-356-7000.

Required:

TEXTWRITER fn

specifies the name of the required input file. The filename is followed directly by P-STAT Programming 

Language (PPL) clauses. (No comma follows the filename.) 

fn=file name nn=number cs=character string arg=keyword argument



Optional Identifiers:

BLANKS nn

gives the maximum number of blanks that may come between any two words after line justification.

TEXTWRITER inserts additional blanks after certain punctuation characters and between words, if necessary,

to justify the line of text. The default setting of BLANKS is four. A smaller number may be

specified, but justification may be affected.

CASE

specifies that text be flushed and printed at the conclusion of processing of each case, and that a new line

be started at the start of processing of the next case. All control words are reset at the start of each case.

This is the assumed mode.

JUSTIFY

requests that the text be right-justified as well as left-justified; that is, the lines of text align on the right

edge as well as the left. When JUSTIFY is not specified, only the left edge of the report is aligned. The

control words @JUST and @NOJUST may be used in the PPL clauses to override the current justification

setting.

LABELS fn 

provides the name of a labels file. If a value in the printout belongs to a numeric variable which is represented 

in the labels file, the text for that value is used instead of the number. Extended variable labels 

in the labels file are ignored. 

LEADBLANK and NO LEADBLANK 

each line of text is usually started with an initial blank. This is used as a carriage control character. If 

the output is not going to a printer you may use the NO LEADBLANK identifier to remove this extra 

blank. LEADBLANK is the assumed setting. 

MARGIN nn 

specifies the number of columns to indent text from the left margin. MARGIN 0 is assumed when the 

MARGIN identifier is not used. Within PPL clauses, the control word @INDENT can be used for additional 

indentation beyond the MARGIN setting. 

OUT fn 

provides the name for a new P-STAT system file which contains the contents in the input file as they are 

modified by the TEXTWRITER PPL. 

PUTL.CHARS 'cs'

provides 1 to 3 characters which will replace the “ = ” (blank, equal-sign, blank) which usually separates

the variable name from the value in a PUTL situation.

STREAM

specifies the continuous output of text; text is not flushed and printed upon the completion of each

case. All control words except @INDENT and @WIDTH, which cause the flushing of text, are reset at

the start of processing of each case. CASE is assumed when STREAM is not specified.

SPREAD and NO SPREAD 

SPREAD is assumed. It causes a single blank to be placed between adjacent variables. NO SPREAD 

causes the variables to be written out directly one after the other with no intervening space. 



WIDTH nn 

gives the number of columns to be used for the report. When WIDTH is not used, the current output 

width defines the number of columns up to a maximum of 400. A specified WIDTH can be from 2 to 

400. The control word @WIDTH does the same thing within PUT instructions. 

WIDTH is measured from the first column. Thus, if MARGIN 20 and WIDTH 80 are specified, there 

are 60 columns available for text. These columns can be referred to using @1 through @60. 
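The relation between MARGIN and WIDTH is a single subtraction, sketched below in Python. The function name text_columns is hypothetical, and the check that the margin must be smaller than the width is an assumption added for the sketch; the 2 to 400 range comes from the description above.

```python
def text_columns(width, margin=0):
    """Number of columns left for text when WIDTH is measured from column 1."""
    if not 2 <= width <= 400:
        raise ValueError("WIDTH must be from 2 to 400")
    if not 0 <= margin < width:
        # Assumption for this sketch: a margin must leave room for text.
        raise ValueError("MARGIN must be smaller than WIDTH")
    return width - margin


# MARGIN 20 and WIDTH 80 leave columns @1 through @60 for text.
print(text_columns(80, margin=20))  # 60
```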

Optional Control Words: 

Control words are used in TEXTWRITER to control positioning of text. Any of them, except @INDENT,

@JUST and @WIDTH, may also be used in the PPL that may follow any command for data

cleaning or producing brief reports. The control words all begin with “@” and many of them are followed

by a number giving a column location or other value. The number may follow directly after the

control word (@SKIP2) or may follow after an equal-sign (@SKIP=2). Although the argument directly

following the control word is typically a number, it may be any expression (within parentheses) that evaluates

to a numeric value.

The following control words remain in effect throughout the processing of a case unless they are specifically

changed or turned off:

@INDENT @EQUAL

@WIDTH @MISS

@JUST @COMMAS

@TRIM @PLACES

Prefacing a control word with “NO” (@NOCOMMAS) turns off a prior setting.

The following control words apply only to the variable expression or character string that directly

follows:

@nn @PAGE

@PLUS @NEXT

@MINUS @PARA

@BEFORE @SKIP

(“nn” represents a positive whole number.) These control words must be reissued to produce the desired

results again.

@nn

specifies a column location, measured from the start of the line. The column pointer moves to this location

and the next character is written in this column.

[PUT @10 'The initial T in this line is in column 10.'] produces: 

         The initial T in this line is in column 10.

123456789012345678901234567890

(The additional line is a scale and not part of the output.) 
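The effect of a column location can be imitated in Python to see where the text lands. This is a sketch of the positioning idea only; the function name put_at is hypothetical and the line is modelled as an ordinary string.

```python
def put_at(line, column, text):
    """Write text so that its first character lands in the 1-based column."""
    # Pad the line with blanks up to the column, or cut it back if the
    # column pointer moves left of text already written.
    start = column - 1
    return line.ljust(start)[:start] + text


result = put_at("", 10, "The initial T in this line is in column 10.")
print(result)
print("123456789012345678901234567890")
```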

@BEFORE nn 

specifies a column location against which the next output element is right aligned. The text is written 

before the specified column. When no location is given, text is written before the current location of the 

column pointer. 

(PUT @BEFORE30 'The period is in column 29.') produces: 



  The period is in column 29.

123456789012345678901234567890 
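The right alignment in this example can be reproduced with a short Python sketch. It is an illustration only: the function name put_before is hypothetical, and raising an error when the text does not fit is an assumption, since the manual does not say what TEXTWRITER does in that case.

```python
def put_before(line, column, text):
    """Right-align text so that its last character lands in column - 1."""
    start = column - 1 - len(text)  # 0-based index of the first character
    if start < len(line):
        # Assumption: this sketch refuses to overwrite text already placed.
        raise ValueError("text does not fit before the column")
    return line.ljust(start) + text


result = put_before("", 30, "The period is in column 29.")
print(result)
print("123456789012345678901234567890")
```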


[ PUT 'The amount due is: '

@BEFORE (' $' // CHARACTER (Amount.Due) ) ;

produces:

The amount due is: $12.56

@COMMAS

requests that all numeric values be printed with commas inserted every three digits (counting from the 

decimal point to the left). This makes large numbers easier to read. @NOCOMMAS turns off 

@COMMAS. 
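For comparison, the same grouping of digits can be produced in Python with the "," format option. The sketch shows what the output looks like; the function name with_commas is hypothetical and this is not how P-STAT formats numbers internally.

```python
def with_commas(value, places=None):
    """Format a number with a comma every three digits left of the decimal point."""
    if places is None:
        return f"{value:,}"
    # A fixed number of decimal places can be combined with the grouping.
    return f"{value:,.{places}f}"


print(with_commas(1234567))        # 1,234,567
print(with_commas(98765.4321, 2))  # 98,765.43
print(with_commas(999))            # 999
```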

@EQUAL nn 

gives the column location of the equal-sign separating a variable name from its value. @EQUAL is used 

with the PPL instruction PUTL which puts a variable name (label) as well as its value in the line of text. 

Multiple locations may be specified: 

(PUTL @EQUAL10:30:50 Last First Age) 

The output line contains the three variables and their values, with the equal-signs in the specified 

columns: 

Last = Wilson First = Margaret Age = 23 

If the spacing is not adequate for the actual length of some of the character variables or the actual width 

of numeric variables, those long values print on the next line. @NOEQUAL turns off the @EQUAL 

specifications. An easy way to print all the variables in your file is to use the system variable .ALL. instead 

of the list of variable names. 

@INDENT nn

specifies an additional number of columns to indent text from the current margin. The value after @INDENT

is added to the current margin setting and text is indented that many columns from the left. This

defines the new left margin of the report. (The current margin is that set by the identifier MARGIN in

the TEXTWRITER command or, if MARGIN is not used, the default value 0.) @IN is an abbreviation

for @INDENT. @NOINDENT resets the indentation to that specified by the MARGIN identifier or to

0 if MARGIN was not used.

@JUST

requests that the text be right justified as well as left justified; that is, the lines of text are aligned on

the right edge as well as the left one. @NOJUST turns off right justification, overriding the JUSTIFY

identifier in the TEXTWRITER command.

@MISS 'cs' 

defines a character to print to indicate any of the three types of missing values. It is used when characters 

other than dashes are desired to indicate missing values. @MISS1, @MISS2 and @MISS3 specify different 

characters for the three individual types of missing values: 

[ PUT Student.ID Last.Name Course.No 

@MISS1 @MISS2 



@NEXT First.Sec 

@NEXT Second.Sec ] 

@M, @M1, @M2 and @M3 are abbreviations. @NOMISS (or @NOMISS2, etc.) resets the specified 

missing character back to dashes. 

@MINUS nn

requests that the column pointer move left the specified number of columns. The current column location

minus the numeric argument yields the column in which text will print. @PLUS moves the pointer to

the right.

@NEXT

moves the column pointer to the beginning of the next line. Subsequent text is written on this new line.

When @NEXT is not used, text is written on the current line until it is full and then text continues on the

next line.

(Note that this is opposite to what occurs when PUT is used in PPL following commands other than TEXTWRITER.

Then, a new line is started for each PUT clause unless an “@” by itself is used to hold the

column pointer in the current line.)

@PARA

requests that a new paragraph start. Subsequent text prints on the next line, beginning in the fourth

column.

@PAGE

requests that subsequent text print on a new page.

@PLACES nn 

gives the number of decimal places to use in printing numeric values. Numbers are rounded if they have 

more than the specified number of decimal places, or zeros are added to pad the numbers if they have 

fewer than the specified number of places. @PL is an abbreviation. @NOPLACES turns off the prior 

places specification. Numbers then print with their actual number of decimal places. (@PLACES0 or 

@PLACES=0 should be used to specify no decimal places or decimal point in the output.) 
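The rounding and zero padding described for @PLACES behave like fixed-point formatting in most languages. The Python sketch below illustrates the three cases; the function name fixed_places is hypothetical, and Python's treatment of exact halves may differ from P-STAT's, so the examples avoid them.

```python
def fixed_places(value, places):
    """Format a number with exactly `places` decimal places."""
    # With places=0 neither a decimal point nor any decimals are printed.
    return f"{value:.{places}f}"


print(fixed_places(3.14159, 2))  # 3.14   (rounded)
print(fixed_places(2.5, 3))      # 2.500  (padded with zeros)
print(fixed_places(7.6, 0))      # 8      (no decimal point)
```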

@PLUS nn 

requests that the column pointer move right the specified number of columns. @MINUS moves the 

pointer to the left. 

@SKIP nn

specifies the number of lines to skip before printing text. When no number follows @SKIP, one line is

skipped. @SK is an abbreviation.

@SPREAD

is assumed and causes a single blank to be inserted between variables. @NOSPREAD causes that blank

to be omitted.

@TRIM

requests that leading and trailing blanks be trimmed from character values before they are positioned in the

text. This is assumed by TEXTWRITER and need not be specified explicitly. @NOTRIM turns trimming

off. Untrimmed, a character string will occupy as many columns as its defined length, even though

it may be only one character long or entirely blank.



@WIDTH nn 

defines the output line width of the report. It overrides any previous output width settings defined by the 

command OUTPUT.WIDTH or the identifier WIDTH in TEXTWRITER. The argument for @WIDTH 

may range from 2 to 400. 

@NOWIDTH turns off the line width setting, which then reverts to the original output width defined by the WIDTH identifier

or the OUTPUT.WIDTH command.

TEXTWRITER and POSTSCRIPT 

Requires additional identifiers following the TEXTWRITER text.

JUSTIFY, POSTSCRIPT, PORTRAIT,

LEFT.EDGE 2., RIGHT.EDGE 2.,

FONT1 TIMES 12, FONT2 TIMES BOLD 12 $

Also Required:

POSTSCRIPT

is required unless the command is included within a PostScript block. A PostScript block begins with a

POSTSCRIPT command and ends with a POSTSCRIPT.CLOSE command.

Optional Identifiers for PostScript Output: 

BOTTOM.EDGE nn 

sets the bottom edge the specified number of inches from the bottom of the page. If there is no argument, 

the bottom edge is reset to its beginning value. This is usually 1 inch for all edges unless changed in a 

POSTSCRIPT.SETUP command. The measurements can be fractional. 

FONT arg arg nn 

provides the name, type, and point size for the fonts to be used. A character string in quotes can replace 

the first two arguments to specify an alternate font not supported in the keywords. Available keyword 

combinations are: 

TIMES              HELVETICA              COURIER

TIMES BOLD         HELVETICA BOLD         COURIER BOLD

TIMES ITALIC       HELVETICA OBLIQUE      COURIER OBLIQUE

TIMES BOLDITALIC   HELVETICA BOLDOBLIQUE  COURIER BOLDOBLIQUE

FONT1-FONT9 arg arg nn 

provides up to 9 different font/type/size combinations for use in the command. 

LANDSCAPE

specifies that the orientation of the page is 11 inches wide by 8 1/2 inches high.

LEFT.EDGE nn

specifies the starting location from the left edge in inches. The number may be fractional. If no number

is supplied the left edge is reset to the beginning value.

PORTRAIT

specifies that the orientation of the pages is 8 1/2 inches wide by 11 inches high.



RIGHT.EDGE nn 

specifies the number of inches that the output should be from the right hand edge of the paper. The number 

may be fractional. If it is used without an argument it is reset to the beginning value. 

SHOWPAGE and NO SHOWPAGE

SHOWPAGE is assumed. NO SHOWPAGE is used when you wish to put more than one command on 

a single sheet of paper. 

TOP.EDGE nn 

specifies the starting location of the printout from the top of the paper in inches which may be fractional. 

If no number is supplied, it is reset to the beginning value. 

Optional Control Words: 

@FONT1-@FONT9 

causes an immediate change in the font that is used. That font remains in effect until the next @FONTn

control word is processed. 

@CINCH.U=nn

centers and also underlines the string at inch nn.

@RINCH=nn

puts the TEXTWRITER text right justified at the specified location. This works well for numbers if the

number of decimal places is controlled.

@RINCH.U=nn

right justifies and underlines the text.

@LINCH=nn

left justifies text at the specified location.

@LINCH.U=nn

left justifies and underlines the text.

@PINCH=nn

centers the text around a specified lineup character, which is assumed to be a decimal point. This is good

for writing a column of fractional numbers when the number of decimal places differs.

@PINCH.U=nn

like @PINCH but it also underlines.

@PINCH.CHAR='c'

provides an alternate character such as '=' to be used in the pinch lineups. If no argument is given, it reverts to

the default '.'.

@FLUSH

flushes the current TEXTWRITER buffer without also moving to the next line. Flushing

turns off temporary options like @CINCH or @UNDERLINE. It does not affect color settings.



@X1

stores a value that is the current left margin. If there is an argument, as in @X1=3.4, that inch location is

stored in the X1 variable.

@X2

stores a value that is the current right margin. An argument such as @X2=8.3 stores that inch value in

the X2 variable.

@Y1

stores a value that is the current top margin. If an argument is supplied, that value is stored in the Y1

variable.

@Y2

stores a value that is the current bottom margin. If an argument is supplied, that value is stored in

the Y2 variable.

@MOVETO

sets the current location to the X1/Y1 position. The next text string will begin at that location.

@DRAW.H

draws a horizontal line from the X1/Y1 position to the X2/Y1 position.

@DRAW.V

draws a vertical line from the X1/Y1 position to the X1/Y2 position.

@DRAW.U=nn 

underlines the current line from X1 to X2. nn is the amount below the current line in units of 72nds of 

an inch where the line should be drawn. If no argument is given, the assumed value is 3. 

@DOWN=nn

moves down that many lines. The actual distance depends on the point size of the font and the leading

(the space between the lines). If nn is not specified, 1 is assumed.

@UP=nn

moves up that many lines. The actual distance depends on the point size and the leading. If nn is not

specified, 1 is assumed.

@TOP

moves to the first line, just below the top margin.

@BOTTOM

moves to the last line, just above the bottom margin.

@LEADING=nn 

specifies the space between lines. LEADING is usually set to 1/72 of an inch. @LEADING=3 increases

the space to 3/72 of an inch. A larger LEADING improves the readability of text when a large point size 

is used. 
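The distance moved by @DOWN and @UP can be estimated with a small Python sketch. It rests on one assumption that the manual does not state explicitly: that the distance from one line to the next is the point size of the font plus the leading, both in 72nds of an inch. The function names are hypothetical.

```python
POINTS_PER_INCH = 72


def line_advance_inches(point_size, leading=1):
    """Distance from one line to the next, assuming point size plus leading."""
    return (point_size + leading) / POINTS_PER_INCH


def move_inches(n_lines, point_size, leading=1):
    """Distance covered by moving n_lines up or down."""
    return n_lines * line_advance_inches(point_size, leading)


# A 12 point font with the usual leading of 1, then with @LEADING=3.
print(line_advance_inches(12))
print(line_advance_inches(12, leading=3))
print(move_inches(2, 11))
```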



@LINEWIDTH=nn 

specifies the width of the lines and boxes that are drawn. LINEWIDTH is usually set at .5. Increasing 

the LINEWIDTH causes bolder looking lines. @LINEWIDTH with no argument resets it to the original

value of .5. 

@UNDERLINE 

begins underlining and continues until a subsequent @SKIP, @NEXT or @PAGE control word ends the

current chunk of output. @NOUNDERLINE can also be used to end underlining.

@NOUNDERLINE 

end of underlined text. 

The following control words can be used to control the color of subsequent printing. Color stays in effect until it 

is changed. @NOCOLOR is equivalent to @BLACK.

@RED 

@ORANGE 

@YELLOW 

@GREEN 

@BLUE 

@VIOLET 

@BLACK 

@NOCOLOR 

Color can also be changed by using the POSTSCRIPT.SETUP command to define fonts with specific 

colors. When the font is changed, the specified color will be used. 


12 

P-STAT MACROS 

A macro is a named collection of text that can be inserted at any point in a P-STAT run. It may contain an entire 

command or a series of many commands. It may contain a fragment of a command, subcommand or data record. 

A macro can be changed dynamically at execution by passing keyword or positional arguments. This chapter 

covers: 

1. Macro format 

2. Activating a macro 

3. Types of macros 

4. Keyword arguments 

5. Positional arguments 

6. Using arguments 

7. Default values for arguments 

8. Instream macros 

9. Multi-command macros

10. SUBFILES command 

11. DIALOG command 

12.1 MACRO FORMAT 

MACRO ABC $ 

contents of the macro 

ENDMACRO $ 

A macro has three elements. The MACRO command supplies the name of the macro and may also have argument 

information. It is, in effect, the macro header. The contents or body of the macro may have an indefinite number 

of records (including none). The ENDMACRO command completes the macro and must be the only thing on its 

record. The ENDMACRO command can have a statement label such as: 

EXIT: ENDMACRO $ 

12.2 Types of Macros 

There are two types of macros, BLOCK macros and INSTREAM macros. A BLOCK macro contains one or more 

full-fledged commands which may have subcommands and data records. It is invoked by using the RUN 

command. 

An INSTREAM macro can contain whatever one wishes. Its contents are inserted into a command or subcommand

wherever !!macname or !!(macname) is found (where macname is the name of the macro). 

Both types of macro can have positional or keyword arguments, and can be defined with default values for 

those arguments. Alternatively, a macro can be defined without any arguments. The commands within a BLOCK 

macro can contain INSTREAM macro calls. INSTREAM macros can contain other instream macro calls.


The following macro contains only one line. Since it does not contain a set of complete commands, it could 

not be used as a BLOCK macro but could, for example, be inserted into a SURVEY subcommand. It would be 

called by using !!vvv in the subcommand. The characters 'age by income ' would replace !!vvv as the subcommand 

is read. 

MACRO vvv $ 

age by income 

ENDMACRO $ 

This next macro contains a block of commands. It could not be used by an instream reference, since that 

would insert several commands within the command or subcommand that contained the !!rrr, which would cause 

syntax errors galore. The correct way to use it is by saying RUN rrr $. Its first three lines are comments. 

MACRO rrr $ 

/* example 1 of macro rrr. */ 

/* no use of arguments. */ 

/* no use of instream macro calls. */ 

CORRELATE data1 [ KEEP age income education ], OUT work1$ 

LIST work1 $ 

ENDMACRO $ 

12.3 Storing and Activating Macros 

Macros can be stored as ordinary ASCII (or EBCDIC) files which can be edited by an external editor. Within 

P-STAT’s editor each macro appears as a single command even when it is a block macro containing many commands. 

The body of the macro is stored as data records for the macro command. A macro with no body will 

appear in the editor with a single data record, the ENDMACRO command. 

A macro must first be activated before it can be used. Activating is done by processing the definition in the 

normal course of processing P-STAT commands. When a macro is activated, information about its arguments is

acquired and the macro is placed, ready to be used, on a temporary file. The currently active macros can be seen 

by using the SHOWMACROS$ command. 

If the macro is entered from the terminal it is active as soon as the ENDMACRO $ command is processed. 

Macros stored in an external ASCII file are activated by a TRANSFER command. Macros that are stored in 

P-STAT’s edit file format are activated automatically when the OLD.EDIT.FILE command is executed. In the 

editor macros can be changed by editing the data records. The changed macro is activated by using the X (eXecute) 

edit instruction. 

After a block macro is active (i.e., its definition has been read by P-STAT), it can be executed by using the 

RUN command. The RUN command executes the entire series of commands defined in the macro. For example: 

RUN Sales $ 

An instream macro is executed when the command that references it is used. For example, macro VVV defined 

above could be executed by: 

SURVEY Psfile; 

!!vvv ; 

$ 

Figure 12.1 illustrates a command stream that activates three macros. Two are instream macros and one is a 

block macro, which uses the other two. The block macro executes a CORRELATE command and a LIST command. 

The CORRELATE command uses the first instream macro to provide the input file name and the second 

instream macro to select variables. 

The final step in Figure 12.1 is the RUN command which calls the first macro. The block macro then references 

the two instream macros. It does not matter in what order the macros are activated.


__________________________________________________________________________ 

Figure 12.1 Activating Three Macros 

MACRO rrr $ 

/* example 2 of macro rrr, using instream macros. */ 

/* this macro correlates some variables */ 

/* and then lists the result. */ 

CORRELATE !!aaa [ KEEP !!bbb ], OUT work1 $ 

LIST work1 $ 

ENDMACRO $ 

MACRO aaa$ 

data1 

ENDMACRO $ 

MACRO bbb$ 

age income education 

ENDMACRO $ 

RUN rrr $ 

__________________________________________________________________________ 
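The text replacement performed for instream macro calls can be imitated in Python. This is a sketch of the substitution idea only, not P-STAT's macro processor: the function name expand_instream, the dictionary of macro bodies, the case-insensitive lookup and the depth limit are all assumptions made for illustration.

```python
import re

# Matches !!(name) or !!name; the name is captured in group 1 or group 2.
CALL = re.compile(r"!!(?:\(([A-Za-z][A-Za-z0-9]*)\)|([A-Za-z][A-Za-z0-9]*))")


def expand_instream(text, macros, max_depth=20):
    """Replace each !!name or !!(name) with the body of the named macro."""

    def substitute(match):
        name = (match.group(1) or match.group(2)).lower()
        return macros[name].strip()

    # Repeat so that instream macros may themselves contain instream calls.
    for _ in range(max_depth):
        expanded = CALL.sub(substitute, text)
        if expanded == text:
            return expanded
        text = expanded
    raise RecursionError("macro calls nested too deeply")


macros = {"aaa": "data1", "bbb": "age income education"}
print(expand_instream("CORRELATE !!aaa [ KEEP !!bbb ], OUT work1 $", macros))
```

Because the lookup happens only when the command is read, the order in which the macros are stored in the dictionary does not matter, which mirrors the remark above about activation order.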

12.4 Comments Within a Macro 

Comments can be used freely in macros. They are particularly useful at the beginning of a macro to document 

what the macro does, when it was last changed, who maintains it, and so forth. 

Comments start with a /* and end with a */. For example: 

/* this macro correlates some variables 

and then lists the result */ 

is a valid comment, as is 

/*---------*/ 

/* comment */ 

/*---------*/ 

12.5 Macros With Arguments 

The macros shown so far have not had any arguments. The only way to generalize such a macro was by calling 

other macros. That is perfectly legal but often, especially with block macros, generalizing is better done by using 

arguments. There are two types of notation for defining macro arguments: keyword and positional. 

The parentheses after a macro name define its arguments, if any. These arguments are known as DUMMY 

ARGUMENTS. (In some languages they are known as formal parameters.) When a macro is CALLED, each 

occurrence of a dummy argument in the body of the macro is replaced by the associated ARGUMENT VALUE 

in the call. For example: 

MACRO rrr ( file, vars) $ 

defines a macro named rrr with two keyword arguments: file and vars. Figure 12.2 illustrates a version of macro

rrr that has the same effect as the macro in Figure 12.1 except that the names of the P-STAT system file and the

variables to be used are passed to the macro in the RUN command rather than from the instream macros.

Using arguments is a simpler way to allow the macro to be used with differing filenames and sets of variables.


__________________________________________________________________________ 

Figure 12.2 Block Macro With Keyword Arguments 

MACRO rrr ( file, vars) $


/* this macro correlates some variables */ 

/* and then lists the result. */ 

/* it uses KEYWORD arguments instead */ 

/* of calls to other macros. */ 

CORRELATE &file [ KEEP &vars ], OUT work1 $ 

LIST work1 $ 

ENDMACRO $ 

RUN rrr ( data1, age income education ) $ 

__________________________________________________________________________ 

The ‘&’ is used to identify keywords to be replaced within the macro. Since ‘file’ was defined as the first 

keyword dummy argument, every use of &file within the macro is replaced by the first argument value. Similarly, 

&vars is replaced by the second argument value. &(file) and &(vars) can also be used; these specify the keyword 

more precisely. Argument values are separated by commas. 

( data1, age income education ) 

Thus data1 is a single argument and since it is the first argument it is associated with the keyword “file”. The 

second argument “vars” receives the entire string “age income education”. The following is what is actually 

executed: 

CORRELATE data1 [ KEEP age income education ], OUT work1 $ 

LIST work1 $ 
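The keyword replacement just described can be imitated with a few lines of Python. This is a sketch under stated assumptions rather than P-STAT's own logic: the function name substitute_keywords is hypothetical, keywords are matched without regard to case, and a mismatch in the number of values is reported as an error.

```python
import re


def substitute_keywords(body, keywords, values):
    """Replace &keyword and &(keyword) in a macro body with argument values."""
    if len(keywords) != len(values):
        raise ValueError("the call must supply one value per dummy argument")
    table = {key.lower(): value for key, value in zip(keywords, values)}
    # Longer keywords are tried first so that &files is not read as &file.
    names = "|".join(re.escape(k) for k in sorted(keywords, key=len, reverse=True))
    pattern = re.compile(rf"&(?:\(({names})\)|({names}))", re.IGNORECASE)
    return pattern.sub(
        lambda m: table[(m.group(1) or m.group(2)).lower()], body
    )


body = "CORRELATE &file [ KEEP &vars ], OUT work1 $"
print(substitute_keywords(body, ["file", "vars"], ["data1", "age income education"]))
```

The parenthesized form is useful when the keyword is followed directly by other text, as in &(file)x, where the end of the keyword would otherwise be unclear.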

Figure 12.3 illustrates the same macro using positional arguments. This is done by providing the number of 

arguments in the parentheses after the macro name.

MACRO rrr ( 2 ) $ 

When positional arguments are used, the body of the macro contains the position preceded by the “&”. Thus wherever 

&1 or &(1) is found within the macro the first argument value found in the call will be used. 

__________________________________________________________________________ 

Figure 12.3 Block Macro With Positional Arguments 

MACRO rrr ( 2 ) $ 


/* the same thing using POSITIONAL arguments */ 

CORRELATE &1 [ KEEP &2 ], OUT work1 $ 

LIST work1 $ 

ENDMACRO $ 

RUN rrr ( data1, age income education ) $ 

__________________________________________________________________________


12.6 Using Arguments 

There are a few simple rules to follow when using arguments in a macro. 

1. A keyword for a dummy argument should start with a letter, contain letters, digits and decimal 

points, and have no more than 16 characters. For example, FILE and VARS in Figure 12.2. 

2. Each such keyword should be found at least once in the macro, preceded by an ampersand (&). 

3. Similarly, if macro ppp(4)$ were used, the macro should contain at least one usage each of &1, &2, 

&3 and &4. 

4. The keyword or integer can be within parentheses, like &(file) or &(2). 

5. There can be as many as 150 keywords or positional arguments. I.e., macro xxx(150)$ is possible. 

The order in which they are found in the body of the macro does not matter. 

6. Usages of the first &keyword are replaced by the first argument value, usages of the second &keyword 

by the second argument value, etc. 

7. Positional macros behave similarly. Usages of &1 are replaced by the first argument value, usages 

of &2 by the second argument value, etc. 

8. The number of dummy arguments given in the definition must be the same as the number of argument 

values supplied when the macro is called. 
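
The substitution rules above can be modelled outside P-STAT. The Python sketch below is only an illustration and is not P-STAT's own implementation. The function name expand is hypothetical, and the sketch assumes that keywords are matched without regard to case and that a longer keyword or position is tried before a shorter one (so that &10 is never read as &1 followed by 0).

```python
import re

def expand(body, names, values):
    """Replace &name and &(name) in body with the matching argument value.

    names holds the dummy arguments in definition order; for a positional
    macro pass "1", "2", ... as the names.
    """
    if len(names) != len(values):
        raise ValueError("the call must supply one value per dummy argument")
    if not names:
        return body
    table = {name.lower(): value for name, value in zip(names, values)}
    # Assumption: longest names first, so a short name never shadows a longer one.
    alts = "|".join(re.escape(n) for n in sorted(table, key=len, reverse=True))
    pattern = re.compile(rf"&\(({alts})\)|&({alts})", re.IGNORECASE)

    def substitute(match):
        name = match.group(1) or match.group(2)
        return table[name.lower()]

    return pattern.sub(substitute, body)

# The keyword macro of Figure 12.2 and the positional macro of Figure 12.3.
print(expand("CORRELATE &file [ KEEP &vars ], OUT work1 $",
             ["file", "vars"], ["data1", "age income education"]))
print(expand("CORRELATE &1 [ KEEP &(2) ], OUT work1 $",
             ["1", "2"], ["data1", "age income education"]))
```

Both calls print the same expanded command, which matches the expansion shown for Figure 12.2.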

There are similar rules for the actual arguments used when the macro is invoked: 

1. Argument values are separated by commas. Argument values of 11 and 22 for macro zzz would be 

conveyed by saying 

RUN ZZZ (11,22)$ or !!ZZZ(11,22) or !!(ZZZ)(11,22) 

2. Argument values can be quoted. This is necessary when the value contains a comma or right parenthesis 

or a form of quote. Thus, "john's house" is valid, as is 'xx"xx'. Either quote(") or apostrophe 

(') can be used unless that character is part of the value, in which case the other should be used to 

bound the value. Suppose you want to pass, literally, 'title text' to a macro. The argument should be 

“'title text'” 

3. A quoted value can be empty, as in !!abc( “”). This is called a NULL value but it is nonetheless a 

value. The associated &keyword in the macro would simply vanish. If !!abc(" ") were used, the 

one blank would replace the &keyword. 

4. If quoted, the value that is used is the contents of the quotes. If not quoted, it is the first nonblank 

through the last nonblank before the comma or right parenthesis. Consider these macro calls both 

of which do exactly the same thing: 

!!ppl ( age income education ) 

!!ppl ( 'age income education' ) 

Both are evaluated as having one argument, since the defining quotes are stripped as the argument 

is moved into place. However, consider these macro calls: 

!!ppl ( IF age missing, DELETE ) 

!!ppl ( 'IF age missing, DELETE' ) 

The first will be evaluated as having two arguments, because of the comma. The second has but one 

argument. Put a value within quotes if it contains commas, etc. 

5. An argument value can, in one situation, have several actual values, as in 

!!abc ( 'age' 'income' 'education' )


Each of those values must be quoted. These are used when a subcommand record contains the associated 

&keyword and nothing else. That record is discarded and, in its place, a subcommand record 

is written for each of the values. The above would generate three records: one containing age, one 

containing income, and one containing education. 

6. An argument can be omitted. !!zzz( , ) has two omitted arguments, which is allowed only when the 

macro was defined with default values to be used when a call omits a value. Defaults are described 

below. 

7. P(3) or such can be used as an argument value. If P(3) is set to 123.456, those seven characters constitute 

the resulting argument value. In other words, the internal double precision binary number 

currently in P(3) is formatted into ascii characters, and those ascii characters serve as the actual argument. 

The value should not be missing. 

8. #N or ##TOTAL or such can be used as an argument value. The scratch variable can be numeric or 

character. The actual argument is the formatted ascii representation of a numeric scratch variable, 

or the current character value of a character scratch variable. The value should not be missing. 
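
Rules 1 through 4 and rule 6 for argument values can be illustrated with a short Python model. This is a sketch under stated assumptions, not P-STAT code: the function name split_args is hypothetical, an omitted argument is represented by None, and the sketch does not model the multiple quoted values of rule 5 or the P( ) and scratch variable values of rules 7 and 8.

```python
def split_args(text):
    """Split the text between the parentheses of a macro call into values.

    Returns one entry per argument: a string for a supplied value
    ("" for a quoted null value) or None for an omitted argument.
    """
    args = []
    buf = []        # characters collected for the current value
    quote = None    # the quote character that is currently open, if any
    quoted = False  # True once the current value has been given in quotes

    def finish():
        if quoted:
            args.append("".join(buf))          # contents of the quotes, as is
        else:
            value = "".join(buf).strip()       # first through last nonblank
            args.append(value if value else None)

    for ch in text:
        if quote:
            if ch == quote:
                quote = None                   # closing quote
            else:
                buf.append(ch)                 # commas inside quotes are data
        elif ch in "'\"" and not "".join(buf).strip():
            quote, quoted = ch, True           # opening quote starts the value
            buf.clear()
        elif ch == ",":
            finish()                           # a comma ends the value
            buf, quoted = [], False
        elif not quoted:
            buf.append(ch)
    finish()
    return args

print(split_args(" IF age missing, DELETE "))    # two arguments
print(split_args(" 'IF age missing, DELETE' "))  # one argument
```

The comma inside the quotes is treated as part of the value, which is why the second call yields a single argument.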

__________________________________________________________________________ 

Figure 12.4 Macro With Positional Arguments and Default Values 

MACRO sss ( 2 ) ( age, income )$ 

BANNER &1, STUB &2; 

ENDMACRO $ 

SURVEY data2; 

!!sss (,) which becomes: BANNER age, STUB income; 

$ 

__________________________________________________________________________ 

12.7 Default Values for Arguments 

When a macro is defined with arguments, it can contain default values to be used when a call omits one or more 

values. Defaults are placed in parentheses after the keyword or positional parentheses. Since they are to be used 

as argument values when necessary, their syntax is the same as that of the argument values in a macro call. 

MACRO abc ( fff, vvv ) ( work1, )$ 

LIST &fff [ KEEP &vvv ]$ 

ENDMACRO$ 

Macro abc has two arguments. A default is supplied for the first argument. There is no supplied default for 

the second argument. A default value will be used when the call does not supply a value for the argument. For 

example: 

RUN abc( , age income )$ 

has no initial argument because there are only blanks before the initial comma. Since there is no first value to 

replace &fff, a default for that argument must have been included in the definition and will be used now. The 

expansion is: 

LIST work1 [ KEEP age income ]$ 

The defaults are totally ignored if the call has actual values for each argument. The existence of defaults does 

not change the need for a call to indicate the presence or absence of its argument or arguments. Given the macro 

above, these calls are errors: 

RUN abc ( age income )$ 

That is just one value, and the macro has two arguments. Values are separated by commas. 

RUN abc ( age income, )$ 

Now there are two values, the second being explicitly omitted, but the definition has no default for the 

second value. 

Figure 12.4 illustrates a macro with 2 positional arguments. Both have default values provided in the definition. 

Because both values are available the macro can be used with no values, one value or both values. A 

definition can also have omitted default values. The following macro has 5 positional arguments. Defaults are 

provided for &1 and &3 but not for &2, &4 and &5. 

MACRO xxx (5) ( aaa,, ccc,, )$ 

Consider the following instream macro call. 

!!mmm ( abc, "", ). 

It has three values. Value 2 is null, but it is still regarded as a value. Only argument 3 would invoke a default 

value. 
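
The way defaults fill in for omitted values can be sketched in Python. This is an illustration only, not P-STAT's implementation: the function name apply_defaults is hypothetical, and both omitted values and omitted defaults are represented by None.

```python
def apply_defaults(values, defaults):
    """Combine the values of a call with the defaults of the definition.

    values and defaults hold one entry per argument; None marks an
    omitted value or an omitted default. A null value "" is still a value.
    """
    if len(values) != len(defaults):
        raise ValueError("the call must account for every argument")
    merged = []
    for position, (value, default) in enumerate(zip(values, defaults), start=1):
        if value is None:
            if default is None:
                raise ValueError(f"argument {position} is omitted and has no default")
            value = default
        merged.append(value)
    return merged

# MACRO abc ( fff, vvv ) ( work1, )$ called as RUN abc( , age income )$
print(apply_defaults([None, "age income"], ["work1", None]))
```

The two erroneous calls described above correspond to a list of the wrong length and to a None value that has no default.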

___________________________________________________________________________ 

Figure 12.5 Macros Can Call Macros 


MACRO aaa $ 

1 2 3 

!!bbb 

7 8 9 

MACEND $ 

MACRO bbb $ 

11 12 13 

!!ccc 

MACEND $ 

MACRO ccc $ 

101 102 103 

MACEND$ 

MAKE work1, NV 3; 

!!aaa 

$ 

LIST work1$ 

1 2 3 from aaa 

11 12 13 from bbb 

101 102 103 from ccc 

7 8 9 from aaa 

___________________________________________________________________________ 

12.8 Nested Instream Macros 

Macros can call macros. Figure 12.5 illustrates the use of instream macros in which macro aaa is called within 

a MAKE command.


MAKE work1, NV 3; 

!!aaa 

$ 

Macro aaa contains 3 data records. The second record activates another instream macro, bbb. 

MACRO aaa$ 

1 2 3 

!!bbb 

7 8 9 

ENDMACRO $ 

Macro bbb contains 2 data records, one of which is a call to macro ccc. 

MACRO ccc$ 

101 102 103 

ENDMACRO $ 

Macro ccc contains a single data record. Since it does not have a call to another instream macro, the command 

is completed with records taken from the 3 instream macros in the order in which the records are processed. 

There is no rule that prohibits macros from recursion. For example macro ccc could call macro aaa. This will 

cause the MAKE command to continue until it runs out of disk space. 
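
The order in which nested instream macros contribute records can be traced with a small Python model of Figure 12.5. It is a sketch, not P-STAT code: the function name expand_records is hypothetical, and the depth limit is an addition made for the sketch, since P-STAT itself does not prohibit recursion.

```python
def expand_records(name, macros, depth=0, max_depth=50):
    """Return the data records produced by macro `name`, expanding !! calls."""
    if depth > max_depth:
        # Guard added for this sketch only; it stops runaway recursion.
        raise RecursionError(f"macro {name!r} nests deeper than {max_depth} levels")
    records = []
    for record in macros[name]:
        if record.startswith("!!"):
            called = record[2:].strip()
            records.extend(expand_records(called, macros, depth + 1, max_depth))
        else:
            records.append(record)
    return records

macros = {
    "aaa": ["1 2 3", "!!bbb", "7 8 9"],
    "bbb": ["11 12 13", "!!ccc"],
    "ccc": ["101 102 103"],
}
for record in expand_records("aaa", macros):
    print(record)
```

The records print in the same order as the LIST output in Figure 12.5. If ccc were changed to call aaa, the sketch would raise an error instead of running until disk space is exhausted.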

12.9 Instream Macros in a Command 

A command can have many instream macro calls. They can occur anywhere after the command name and 

before the ending dollar or semicolon. The command text is scanned from its beginning for macro calls after each 

macro insertion. Therefore, its macros can call other macros indefinitely. 

The characters of an instream macro record are inserted through the right-most non blank, possibly with an 

additional padding blank when the record has fewer than 80 characters. These insertions may extend a command 

by hundreds or even thousands of characters. That causes no problems as long as the command does not exceed 

its maximum size, which in the Whopper/2 version of P-STAT is 50,000 characters. 

12.10 Instream Macros in Subcommands 

A single subcommand can also have many instream macro calls. Subcommand processing, however, is done differently 

due to the limit of 80 characters in a single subcommand record. There are two forms of subcommand 

macro expansion. The first occurs when the macro call is NOT the only thing in the record. For example, 

BANNER !!aaa, STUB !!bbb; 

The expansions are done in an array that can hold 800 bytes, which is ten times the size of a subcommand 

record. As with commands, the array is re-scanned after each insertion so that nested macros are honored. When 

no more macros are found, the array is written in up-to-80 character chunks to the subcommand buffer for use by 

the command that is currently active. Up-to-80 means that each chunk ends at a reasonable point at or before 

80. Reasonable means breaking at a blank, comma, right parenthesis, etc. 

Different rules prevail when the macro call is the only thing on the subcommand record. First, that record 

vanishes. Instead, a subcommand record is generated for each line of the macro. However, what about arguments 

on one of these lines? 

If the positional argument in the macro (&3 or such) is not the only thing on the line, the record is expanded 

by replacing the argument with its value. This continues until the line has no more arguments. Then it is broken 

into 80's as described above. If the positional argument in the macro (&3 or such) IS the only thing on the line, a 

subcommand record is generated for EACH of the argument's non-null values. 

In an instream macro, the default is to insert the characters through the right most non-blank AND THEN 

ADD ONE BLANK. Consider:


MACRO vars $ 

age 

income 

ENDMACRO $ 

An instream usage might be: 

LIST x[ KEEP !!(vars) ] $ 

Is the !!(vars) replaced by 9 characters (ageincome), or by 11 characters (age income ), or by 160 characters 

(age + 77 blanks and income + 74 blanks)? In other words, do we PAD the records as they are inserted? If so, by 

how much? 

A run begins with the padding default set to one. However, there are several ways to change the default. 

MACRO.PAD n$ is a command that specifies the padding default for macros activated subsequently. N can be 

zero (which sets a no-pad status) or some larger integer, like 1 or 80. 

MACRO XXX (file), PAD n $ causes the pad default for that specific macro to be n (also 0 to 80) rather than 

the current MACRO.PAD setting. 

___________________________________________________________________________ 

Figure 12.6 Instream Macros in Subcommand Records. 

MACRO sss $ 

STUB Q1 TO Q43 

ENDMACRO $ 

MACRO bbb $ 

BANNER Age Income Education 

ENDMACRO $ 

SURVEY work1, ECHO; 

!!sss, !!bbb; 

$ 

produces: 

STUB Q1 TO Q43, BANNER Age Income Education; 

SURVEY work1, ECHO; 

!!sss, 

!!bbb; 

$ 

produces: 

STUB Q1 TO Q43, 

BANNER Age Income Education; 

___________________________________________________________________________ 

A specific record in an instream macro can contain a padding specification which takes precedence over any 

pad default. This is done using back-slashes. If a record in a macro ends with two back-slashes, like 

age \\ 

the characters up to (but not including) the back-slashes will be inserted. The above record will cause the 4 characters 

‘age ’ to be inserted. 

The same 4-character insertion would happen with 

age \\ /*a comment*/ 

Either of these would insert just 3 characters: 

age\\ /*a comment*/ 

age\\ 
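
The padding rules can be summarized in a short Python model. This is a sketch under stated assumptions rather than P-STAT's implementation: the function name insertion_text is hypothetical, and the sketch assumes that PAD n adds at most n blanks without extending a record beyond 80 characters, which is one reading of the 160-character case discussed earlier.

```python
RECORD_WIDTH = 80  # maximum length of a macro record

def insertion_text(record, pad=1):
    """Return the characters that one instream macro record contributes."""
    marker = record.find("\\\\")       # two back-slashes end the inserted text
    if marker != -1:
        return record[:marker]         # explicit marker overrides the pad default
    text = record.rstrip()             # through the right-most nonblank
    blanks = min(pad, RECORD_WIDTH - len(text))   # assumption: cap at 80 characters
    return text + " " * max(blanks, 0)

records = ["age", "income"]
for pad in (0, 1, 80):
    inserted = "".join(insertion_text(r, pad) for r in records)
    print(pad, len(inserted))
```

With the two records of macro vars, the lengths printed are 9, 11 and 160, the three possibilities raised in the question above.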

__________________________________________________________________________


Figure 12.7 Lots of Instream Macros 


MACRO input $ 

ibm.data 

ENDMACRO $ 

MACRO ppl $ 

if age gt 20, retain; 

set region to recode ( region, 99=m ) 

ENDMACRO $ 

MACRO labfile $ 

"ibm.labels" 

ENDMACRO $ 

MACRO date $ 

August 13, 2006 

ENDMACRO $ 

MACRO layout $ 

layout question totals labels body summary, 

places means 3, 

row column percents, 

ENDMACRO $ 

MACRO define $ 

define 'under $20,000' income 1 to 3, 

define 'over $20,000' income 4 to 6, 

ENDMACRO $ 

MACRO stub.banner $ 

stub age income, banner region; 

ENDMACRO $ 

__________________________________________________________________________ 

12.11 Using Lots of Instream Macros 

TRANSFER 'macro.file' $ 

SURVEY !!input [ !!ppl ], LABELS !!labfile ; 

TITLE "this was run on !!date", 

!!layout 

!!define 

!!stub.banner 

$ 

In the above example, a transfer is done first to a file containing the macro definitions that may be needed. 

The SURVEY command uses seven macros; the macro.file should contain those seven and may well contain 

more. Activating a macro which goes unused does not cause any problems; it simply uses a bit more space on a 

temporary scratch-file on disk. Figure 12.7 illustrates what the transfer file might contain.


12.12 MACRO COMMANDS 

Thus far we have seen three commands that are associated with macros. 

1. MACRO provides the macro name 

2. ENDMACRO defines the end of the macro 

3. RUN executes a block macro 

There are four more useful macro commands: 

4. MACRO.PAD 

5. SHOW.MACROS 

6. COUNT.MACROS 

7. FULL.MACRO.ARGS 

MACRO.PAD 0 $ changes the default padding for records of instream macros activated subsequently. The 

run begins with a default of 1. Values of 0 to 80 can be used. 

SHOW.MACROS $ can be used to display the currently activated macros. This prints the entire contents of 

the activated macros. SHOW.MACROS, NAMES $ can be used to list just the names of the currently activated 

macros. Adding FILE 'filename' causes the output to be written to that file. 

COUNT.MACROS $ simply reports how many macros have been read, how many had errors, and how many 

are usable. Since TRANSFER only reports each macro activation when verbosity is 4, using COUNT.MACROS 

after a transfer to a macro library gives a sense of what went on. 

When a macro has many arguments, some of which may not be present, constructing a call with the proper 

number of null arguments can be tricky. FULL.MACRO.ARGS is a command that can be used to specify whether 

trailing null arguments are required or optional. 

FULL.MACRO.ARGS OFF $ 

turns off the requirement that all macro arguments must be fully represented. Thus in a macro that references 1 

to 12 months and has 12 arguments in its definition 

!!zzz ( Jan, Feb ) 

can be used instead of: 

!!zzz ( Jan, Feb ,,,,,,,,, ) 

This setting works only for the trailing (rightmost) arguments. The command to require fully supplied arguments 

is: 

FULL.MACRO.ARGS $ 

The macro call can still have null arguments or a comma for defaults, but there must be something represented for 

every argument. 
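
The effect of FULL.MACRO.ARGS on a call can be modelled in a few lines of Python. This is an illustration only: the function name complete_args and its parameters are hypothetical, and None stands for an omitted argument that is left to its default.

```python
def complete_args(values, expected, full_args_required=True):
    """Return `values` extended to `expected` entries, or raise an error.

    full_args_required=True models FULL.MACRO.ARGS $ and
    full_args_required=False models FULL.MACRO.ARGS OFF $.
    """
    if len(values) > expected:
        raise ValueError("the call has more values than the definition")
    if len(values) < expected:
        if full_args_required:
            raise ValueError("every argument must be represented in the call")
        # Only trailing (rightmost) arguments may be left out.
        values = values + [None] * (expected - len(values))
    return values

# !!zzz ( Jan, Feb ) for a macro defined with 12 arguments
print(complete_args(["Jan", "Feb"], 12, full_args_required=False))
```

The missing trailing entries are treated as omitted arguments, so each of them still needs a default in the macro definition.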

The records in a macro definition should not exceed 80 characters. A macro must be activated before it can 

be used. This is largely automatic. If you TRANSFER to a file which holds all of your macro definitions, the 

result of the transfer is to activate all of the macros it found there. 

12.13 CORRECTING MACROS IN THE EDITOR 

A macro appears in the editor as a single MACRO command which has some number of 'data' records. Its data 

records are, in fact, the rest of the macro. To change it you should modify the text as needed and then EXECUTE 

the macro command. That de-activates the old version and activates the new version.


If a macro appears in the editor as a series of commands, ending with an ENDMACRO$ command, it can be 

changed in the usual way. Then, to activate the changed version, simply EXECUTE the macro command; the rest 

of the macro will automatically be included in the activation. 

12.14 BLOCK MACROS 

A block macro is a named collection of P-STAT commands and subcommands or data records. It is only necessary 

to use the RUN command with the name of the macro in order to execute the entire series of commands. This 

section covers: 

1. The special features of block macros. 

2. SUBFILES controls a loop through a series of commands. The loop is executed once for each subgroup 

found in the SUBFILES input file. SUBFILES can only be used within a block macro. 

3. DIALOG permits a conversation with the user. DIALOG is usually, but not necessarily, used within 

a macro. 

___________________________________________________________________________ 

Figure 12.8 Defining a Block Macro 

MACRO Sales ( Month ) $ 

/* 

TO USE: RUN Sales ( Month )$ 

For Month substitute the 3 letter abbreviation for the current month. 

The Sales macro is to be run on the 5th of each month. 

Copies of the report should be sent immediately to all department 

heads and to Sam Knightbridge, Vice President of Sales. 

*/ 

TITLE 'Sales by Region and Department for the Month of &Month, 2010' $ 

SORT Sales&Month, 

BY Region Department, 

OUT &MonthSales $ 

LIST &MonthSales, 

TOTALS Dollar.Amounts Sales, 

MEANS Dollar.Amounts, 


TITLES $ 

ENDMACRO $ 

___________________________________________________________________________ 

12.15 Executing a Block Macro 

After a block macro is active (i.e., its definition has been read by P-STAT), it can be executed by using the RUN 

command. The RUN command executes the entire series of commands defined in the macro: 

RUN ABC $


RUN also passes character string arguments to the macro. The number of arguments depends on the macro 

definition. The arguments are enclosed in parentheses and can be either keyword or positional. The Sales macro 

below has a single keyword dummy argument. When it is executed, the RUN command must provide the actual value to 

be used for that argument. Given: 

MACRO Sales ( State ) $ 

LIST &State $ 

ENDMACRO $ 

The macro is executed by a RUN command such as: 

RUN Sales ( NJ ) $ 

___________________________________________________________________________ 

Figure 12.9 The RUN Command and Partial Output 

ECHO $ 

RUN Sales ( Jan ) $ 

TITLE 'Sales by Region and Department for the Month of Jan, 2010' $ 

SORT SalesJan, 


OUT JanSales $ 

Sort on 4 cases completed. 

The largest change in position for any case was 2 positions. 

LIST JanSales, 




COMMAS$ 

Sales by Region and Department for the Month of Jan, 2010 

-- Region : East -- 

-- Department: Clothing -- 

Dollar 

Sales Amounts 

45,265 534,500 

25,430 435,005 

Department ------ --------- 

Total 70,695 969,505 

Department ------------ 

Mean 484,752.50 

___________________________________________________________________________

12.16 Macro Substitution Using Strings 

Figure 12.8 contains a macro to do a monthly report. The macro “Sales” contains a series of commands to process 

sales records on a monthly basis. It contains a comment section which begins with /* and ends with */ . 

The input to the Sales macro is a P-STAT system file with a name such as SalesJan or SalesFeb. In the macro 

the name of the input P-STAT system file is Sales&Month. “&Month” is a string that will change depending on 

the report that is needed. When the macro is executed, the string “&Month” is replaced by the argument value 

provided in the RUN command for the dummy argument Month. &(Month) can also be used. 

RUN Sales ( Jan ) $ 

The substitution is done wherever an ampersand (&) is immediately followed by “Month”, the dummy argument 

in the macro definition. Substitution occurs in commands, subcommands and even in data records. The use 

of the & before each use of the dummy argument ensures that the substitution is only done where it is intended. 

The form of &(Month) can also be used. 

Figure 12.9 contains the RUN command for the Sales macro as well as partial output. Before the RUN command, 

there is an ECHO command. The reason for using ECHO is to see the commands after text substitution has 

occurred. Note: The comment text is not echoed because it is discarded as it is read. 

12.17 Scope of Temporary Scratch Variables 

Temporary scratch variables, such as #N, usually are erased when the command in which they are created ends; 

however, a temporary scratch variable generated in a macro exists for the life of that macro. It is, therefore, available 

for use by all commands in the macro. It is erased only when the macro exits. 

MACRO ListH ( Hnum ) $ 

GEN #Hname:C $ 

PROCESS All [ 

IF Hid EQ &Hnum, SET #Hname = Hospital, QUITCOMMAND ] $ 

LIST #Hname [ KEEP Name Age Diagnosis ] $ 

ENDMACRO $ 

RUN ListH ( 1 ) $ 

The PROCESS command is used to search for the first case which has a value of 1 for variable Hid. The value 

of variable Hospital is stored in a character temporary scratch variable and the command terminates. When the 

LIST command is scanned by the P-STAT executive routines the value stored in #Hname is substituted for the 

filename. If it is not a legal name for a P-STAT file an error occurs. 

If a permanent scratch variable is defined for local (within-macro) use, it runs the risk of stepping on a permanent 

scratch variable of the same name used elsewhere in some other way. Having a temporary scratch variable 

be usable across commands within a macro avoids this risk. 

12.18 Scratch Variables and Nested Macros 

Suppose macro AAA begins with: 

SET P(1) = 22 $ 

GEN ##A = 23 $ 

GEN #B = 24 $ 

RUN XXX $ 

P(1) is now 22 and ##A is 23. Since their scope is global, macro XXX can use them and get or change the values 

that macro AAA just set. Also the values do not vanish when macro AAA exits. 

What about temporary scratch variable #B? 

1. As mentioned before, it 'belongs' to macro AAA. It exists for the commands within macro AAA. It 

vanishes when macro AAA exits. In other words, its scope is local to macro AAA.


2. It also exists for commands in macros called by macro AAA if the called macro does not generate 

its own version of #B. 

If AAA calls XXX, and XXX uses #B without a GENERATE, it gets and can change the #B that belongs to 

macro AAA. If macro XXX does a GENERATE of #B, it now has its own #B, unrelated to the #B in macro AAA. 

This feature can be useful when a macro does some things and calls another macro to finish the task. An example 

of this is a DIALOG macro calling a do-the-work macro, described later. 
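
The lookup rule for temporary scratch variables in nested macros behaves like a chain of scopes. The Python sketch below models it under stated assumptions and is not P-STAT code: the class name MacroScope and its methods are hypothetical, and each macro is given a link to the macro that called it.

```python
class MacroScope:
    """Temporary scratch variables of one running macro."""

    def __init__(self, caller=None):
        self.caller = caller   # scope of the calling macro, if any
        self.local = {}        # variables generated by this macro

    def generate(self, name, value=None):
        # GENERATE always creates a variable that belongs to this macro.
        self.local[name] = value

    def _owner(self, name):
        # Search this macro first, then the chain of calling macros.
        scope = self
        while scope is not None:
            if name in scope.local:
                return scope
            scope = scope.caller
        raise KeyError(name)

    def get(self, name):
        return self._owner(name).local[name]

    def set(self, name, value):
        self._owner(name).local[name] = value

aaa = MacroScope()
aaa.generate("#B", 24)

xxx = MacroScope(caller=aaa)   # XXX uses #B without a GENERATE
xxx.set("#B", 30)
print(aaa.get("#B"))           # the #B that belongs to AAA was changed

yyy = MacroScope(caller=aaa)   # YYY generates its own #B
yyy.generate("#B", 1)
yyy.set("#B", 2)
print(aaa.get("#B"))           # AAA's #B is unaffected
```

The macro named yyy is introduced only for the sketch. It shows the second case, in which a called macro generates its own #B.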

12.19 Temporary Files in Macros 

Intermediate files produced in macros are often given names that begin with “MACFILE.” to indicate that they 

are temporary files that are not needed after the macro completes. There can be one to eight characters after the 

“MACFILE”. These files are referenced by their names in the macro, but they are written on disk with names 

composed of the P-STAT prefix for temporary files, “W_”, and some random characters that are generated to produce 

a unique name. If you include a FILES $ command within a macro you will see a display like: 

---------------autosave files: D:\PSFILES-------------------------------- 

| name current previous | 

| | 

|#macfil1.sor W_10eZX3.PS1 | 

| (# indicates a temporary WORK file) | 

------------------------------------------------------------------------- 

Temporary macro files are deleted when the macro finishes. (Other temporary files are deleted when the 

P-STAT session ends.) Figure 12.10 shows the Sales macro with the output from the SORT command as a temporary 

file. This is then input to the LIST command. When the ENDMACRO statement is processed, the temporary file 

is erased. 

___________________________________________________________________________ 

Figure 12.10 Macros: Temporary File Names 

MACRO Sales ( Month ) $ 

TITLE 'Sales by Region and Department for the Month of &Month, 2010' $ 



OUT MACFILE.sor $ 

LIST MACFILE.sor, 




TITLES $ 

ENDMACRO $ 

___________________________________________________________________________ 

12.20 Subcommands in Macros 

Figure 12.11 is a variation on the Sales macro with a SURVEY command instead of the LIST command. SURVEY 

requires subcommand information. If the table is always the same and only the file varies, then the 

subcommand records can be included in the macro in their final form. If, however, the tables may change, then 

there must be provision for substitution of the subcommands.


In this example, there is provision for 2 subcommand records; one to provide the BANNER (column) information 

and one to provide the STUB (row) information. Each such record is limited to 80 characters. The RUN 

command for this variation would look like: 

RUN Sales( Jan, 

BANNER Region, 

STUB Department ) $ 

Note that “BANNER Region” is a single argument replacing “bvar” in the macro definition. 

__________________________________________________________________________ 

Figure 12.11 Macros: Supplying Subcommands 

MACRO Sales ( Month, bvar, svar )$ 

TITLE 'Sales by Region and Department for the Month &Month, 2010' $ 



OUT MACFILE.sor $ 

SURVEY MACFILE.sor, TITLES; 

PLACES PERCENTS 0, 

&bvar, 

&svar; 

$ 

ENDMACRO $ 

___________________________________________________________________________ 

If the macro does not supply the subcommand punctuation as: 

SURVEY MACFILE.sor, TITLES; 


&bvar &svar 

then that punctuation must be in the arguments provided in the RUN command. Since the punctuation is meaningful 

to the RUN command, these arguments must be enclosed in quotes. 

RUN Sales( Jan, 

'BANNER Region,', 

'STUB Department;' ) $ 

In this type of situation, the block macro might well be designed to use instream macros for the subcommand 

definitions. Instream macros are covered in the previous chapter. Quotes around the arguments are stripped off 

and the contents of the quotes are substituted for the arguments in the macro. This means that you must use two 

sets of quotes, one kind nested inside the other, if you wish to pass a quoted string. For example: 

MACRO ttt ( t ) $ 

TITLE &t $ 

LIST sales.jan, TITLES $ 

ENDMACRO $ 

RUN ttt ( '".DATE."' ) $ 

If the TITLE command itself contains the quotes: 

TITLE '&t' $ 

The RUN command can be entered without quotes: 

RUN ttt ( .DATE. ) $


___________________________________________________________________________ 

Figure 12.12 Macro with Conditional Execution 

MACRO bvar $ 

/* BANNER aa bb cc, */ 

ENDMACRO $ 

MACRO svar $ 

/* STUB v1 v2 v3 */ 

ENDMACRO $ 

MACRO Sales.Report ( Month ) $ 

/* 

TO USE: 

1. GENERATE ##CONTROL:C = 'LIST', 'SURVEY', or 'BOTH' 

2. RUN Sales.Report ( abc ) $ 

For abc substitute the 3 letter abbreviation for the current month. 

If you are requesting a SURVEY you must supply stub variables in MACRO 

svar and (optionally) banner variables in MACRO bvar. 

*/ 

TITLE 'Sales by Region and Department for the Month &Month, 2010 ' $ 

SORT Sales&Month, BY Region Department, OUT MACFILE.sor $ 

IF ##CONTROL EQ 'LIST' BRANCH Step1 $ 

IF ##CONTROL EQ 'SURVEY' BRANCH Step2 $ 

IF ##CONTROL NE 'BOTH' THEN; 

PUT 'Macro Sales.Report: ##CONTROL must be set to LIST, SURVEY or BOTH' ; 

BRANCH Finish ; 

ENDIF $ 

Step1: LIST MACFILE.sor, TOTALS Dollar.Amounts Sales, 



TITLES $ 

IF ##CONTROL NE 'BOTH' BRANCH Finish $ 

Step2: SURVEY MACFILE.sor, TITLES; 


!!bvar 

!!svar ; 

$ 

Finish: ENDMACRO $ 

__________________________________________________________________________


12.21 Conditional Execution of Commands 

Macro Sales.Report in Figure 12.12 is an enhanced version of macro Sales. When you execute this macro, 

you choose not only your file but also which commands you wish to execute. The choice in this example is either a 

LIST command, a SURVEY command or both the LIST and the SURVEY. The choice is made by setting a permanent 

scratch variable, ##CONTROL, before running the macro. The macro tests the scratch variable and 

branches to the desired command. 

GENERATE ##CONTROL:C = 'LIST' $ 

RUN Sales.Report ( Jan ) $ 

produces a report with just the LIST command. The following commands: 

GENERATE ##CONTROL = 'BOTH' $ 

MACRO svar $ 

STUB Department Region; 

ENDMACRO $ 

RUN Sales.Report ( Jan ) $ 

produce a report with a LIST and then a SURVEY containing two 1-way tables. 

It is the BRANCH PPL instruction which transfers control to the appropriate set of commands. BRANCH is 

followed by the label of the next command to be executed. That label must be at the beginning of a command line 

followed by a colon (:). BRANCH can be used in any command stream to bypass commands. 

__________________________________________________________________________ 

Figure 12.13 Macros: Reversing the Order of Execution 

MACRO Sales.Report ( Month ) $ 


IF ##CONTROL AMONG ( 'LIST' 'BOTH' ) BRANCH Step1 $ 

IF ##CONTROL AMONG ( 'SURVEY' 'REVERSE' ) BRANCH Step2 $ 

PUT 'Macro Sales: Invalid value for ##CONTROL' $ 

BRANCH Finish $ 




TITLES $ 

IF ##CONTROL AMONG ( 'LIST' 'REVERSE' ) BRANCH Finish $ 



!!bvar 

!!svar ; 

$ 

IF ( ##CONTROL EQ 'REVERSE' ) BRANCH Step1 $ 


___________________________________________________________________________


In a macro, BRANCH can be used to either bypass commands or to branch back and execute commands that 

occur earlier in the macro. Thus it is easy to change Sales.Report so that the order of the report, LIST and then 

SURVEY or SURVEY and then LIST, is also controlled. This requires only the ability to branch around the LIST 

and then possibly to branch back. Figure 12.13 contains the changes needed to add this option. 
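
The flow of control in Figure 12.13 can be traced with a small Python model. It is a sketch only: the function name run_report is hypothetical, the labels Step1, Step2 and Finish are taken from the BRANCH instructions in the figure, and the LIST and SURVEY commands are reduced to their names.

```python
def run_report(control):
    """Return the commands executed, in order, for a value of ##CONTROL."""
    if control in ("LIST", "BOTH"):
        label = "Step1"
    elif control in ("SURVEY", "REVERSE"):
        label = "Step2"
    else:
        return ["PUT invalid ##CONTROL"]

    executed = []
    while label != "Finish":
        if label == "Step1":
            executed.append("LIST")
            # After the LIST: stop, or fall through to the SURVEY.
            label = "Finish" if control in ("LIST", "REVERSE") else "Step2"
        else:  # Step2
            executed.append("SURVEY")
            # After the SURVEY: branch back to the LIST only for REVERSE.
            label = "Step1" if control == "REVERSE" else "Finish"
    return executed

for control in ("LIST", "BOTH", "SURVEY", "REVERSE"):
    print(control, run_report(control))
```

For REVERSE the model runs the SURVEY first and then branches back to the LIST, which is the reversed order described above.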

12.22 DIALOG 

DIALOG is a PPL function which can be used anywhere but is especially useful when you wish to make a macro 

easy for someone else to use. If the macro is designed correctly with DIALOG, it can be run interactively by a 

user who knows little more than the names of the macro and the files or variables he wishes to select. With DIALOG 

in place the RUN command for this version of the Sales Report macro is simply: 

RUN sales.report $ 

Using the macro in Figure 12.14, the following messages then appear on the screen. User replies are in bold-faced 

type: 

------------------------------------------------- 

Enter the three letter abbreviation for the month 

feb 

Enter one of the numbers 1-4 for these choices 

1: LIST command only 

2: LIST and SURVEY commands 

3: SURVEY command only 

4: SURVEY and LIST commands 

4 

Enter the names of your column (banner) variables 

region department 

Enter the names of your stub (row) variables 

item1 TO item10 

___________________________________________________________________________ 

Figure 12.14 Macros: DIALOG Provides an Interactive Front End 

MACRO Sales.Report $ 

GEN ##Reply, GEN #Mon:c3 $ 

GEN #bvar:c78 =' ', GEN #svar:c78 =' ' $ 

GEN #bvar2:c80=' ', GEN #svar2:c80=' ' $ 

Prompt1: DIALOG #Mon 

'-------------------------------------------------' 

'Enter the three letter abbreviation for the month' 

HELP 'Expected abbreviations include' 

'jan feb mar apr may jun jul aug sep oct nov dec' $ 

IF .RESPONSE. EQ 0 OR .RESPONSE. EQ -9 BRANCH Finish $ 

IF .RESPONSE. NE 14 BRANCH Prompt1 $ 

IF #Mon NOTAMONG ( 'jan' 'feb' 'mar' 'apr' 'may' 'jun' 

'jul' 'aug' 'sep' 'oct' 'nov' 'dec' ) 

BRANCH Prompt1 $ 

Prompt2: DIALOG ##Reply ' ' 

'Enter one of the numbers 1-4 for these choices'


'1: LIST command only' 

'2: LIST and SURVEY commands' 

'3: SURVEY command only' 

'4: SURVEY and LIST commands' $ 

IF .RESPONSE. EQ 0 BRANCH Finish $ 


IF ##REPLY LT 1 .OR. ##REPLY GT 4 BRANCH Prompt2 $ 

IF ##REPLY EQ 1 BRANCH Do.it $ 

Prompt3: DIALOG #bvar 

'Enter the names of your column (banner) variables' $ 

IF .RESPONSE. NOTAMONG ( -2 14 16 ) BRANCH Prompt3 $ 

IF .RESPONSE. NE -2 SET #bvar2 = 'BAN' /// LRTRIM ( #bvar ) // ',' $ 

Prompt4: DIALOG #svar 

'Enter the names of your stub (row) variables' $ 

IF .RESPONSE. NOTAMONG ( -2 14 16 ) BRANCH Prompt4 $ 

IF .RESPONSE. NE -2 SET #svar2 = 'STUB' /// LRTRIM ( #svar ) // ',' $ 

Do.it: RUN Report.Step2 ( #mon, #bvar2, #svar2 ) $ 

FINISH: ENDMACRO $ 

MACRO Report.Step2 ( Month, bvar, svar ) $ 

TITLE 'Sales by Region and Department for the Month &Month, 2010 ' $ 


IF ##REPLY EQ 1 OR ##REPLY EQ 2 BRANCH Step1 $ 

IF ##REPLY EQ 3 OR ##REPLY EQ 4 BRANCH Step2 $ 




TITLES $ 

IF ##REPLY EQ 1 OR ##REPLY EQ 4 BRANCH Finish $ 



&bvar 

&svar ; 

$ 

IF ##REPLY EQ 4 BRANCH Step1 $ 


__________________________________________________________________________


In order to supply this friendly front end, the Sales.Report macro is rewritten as “Report.Step2” and a new 

Sales.Report macro is designed which prompts for the information it needs. It uses this information to build the 

RUN command for Report.Step2. Figure 12.14 lists the new Sales.Report macro and the revised Report.Step2. 

Report.Step2 is much like the previous Sales.Report, except that the character ##CONTROL variable is replaced 

by the use of the ##REPLY numeric scratch variable. The SURVEY command is also slightly changed so 

that the user need only know the names of the variables that define the rows and columns rather than rewrite the 

supporting instream bvar and svar macros. 

There is a great deal of overhead in a DIALOG macro if you wish to provide for all the possible responses 

that a user may make. There should be provisions for QUIT. Help text and tests for appropriate replies should be 

provided whenever possible. 

12.23 Format of the DIALOG command 

The DIALOG command has a scratch variable and some number of lines of text enclosed in quotes. The 

scratch variable is required only if a reply is expected. Each line of text is displayed on a separate line on the 

terminal. The lines of text can contain scratch variables. If so, their current values are displayed. 

Optional HELP text is also part of the DIALOG command. This is not displayed unless the user requests it 

by entering either “H” or “HELP” in reply to the prompt. The keyword “HELP” separates the normal DIALOG 

text from the HELP text. In Figure 12.14, the first DIALOG command: 

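Prompt1: DIALOG #Mon 

'-------------------------------------------------' 

'Enter the three letter abbreviation for the month' 

HELP 'Expected abbreviations include' 

'jan feb mar apr may jun jul aug sep oct nov dec' $ 
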
contains a scratch variable, 2 lines of text, and the HELP key word followed by 2 lines of help text. Note: the 

scratch variable must be created before the DIALOG command. 

There are two mechanisms for examining a user reply. The first is the user reply which is stored in the DIALOG 

scratch variable. The second is a numeric system variable .RESPONSE. which contains a code indicating 

the type of the reply. .RESPONSE. is set each time DIALOG is executed. .RESPONSE. values are: 

negative: no response, or an invalid response: 

-2 = entirely blank 

-4 = H or HELP, but the dialog had no help text 

-6 = 'abc' for a numeric scratch variable, or such 

-8 = a scratch variable was not supplied 

-9 = in batch mode 

zero: the response was Q or QUIT 

positive: a valid response: 

1 = integer, like 1990 

2 = non-integer, like 3.1416 

11 = Y or YES 

12 = N or NO 

14 = character response other than yes/no/quit 

that is a legal p-stat name or label 

16 = other character response 
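
The way a reply maps onto these codes can be sketched in Python. This is an illustrative model, not P-STAT code, and it rests on several assumptions that the text above does not settle: the dialog is interactive, has a scratch variable and has no HELP text (so -8 and -9 never arise and H always yields -4); the keywords are matched without regard to case; numbers are recognized before anything else; and a "legal P-STAT name or label" is simplified to a letter followed by letters, digits, periods or underscores. The function name classify_reply is hypothetical.

```python
import re

# Assumption: a simplified stand-in for "a legal P-STAT name or label".
NAME = re.compile(r"[A-Za-z][A-Za-z0-9._]*\Z")
INTEGER = re.compile(r"[+-]?\d+\Z")
NON_INTEGER = re.compile(r"[+-]?(\d+\.\d*|\.\d+)\Z")


def classify_reply(reply, numeric_target=False):
    """Return a .RESPONSE.-style code for one interactive reply.

    Models a DIALOG that has a scratch variable but no HELP text.
    """
    text = reply.strip()
    if text == "":
        return -2                      # entirely blank
    upper = text.upper()
    if upper in ("Q", "QUIT"):
        return 0                       # quit
    if upper in ("H", "HELP"):
        return -4                      # help requested, none available
    if INTEGER.match(text):
        return 1                       # integer, like 1990
    if NON_INTEGER.match(text):
        return 2                       # non-integer, like 3.1416
    if numeric_target:
        return -6                      # text offered for a numeric variable
    if upper in ("Y", "YES"):
        return 11
    if upper in ("N", "NO"):
        return 12
    if NAME.match(text):
        return 14                      # a single legal name
    return 16                          # any other character reply


for sample in ("feb", "item1 TO item10", "4", ""):
    print(repr(sample), classify_reply(sample))
```

Under these assumptions a one-word reply such as feb is classified as 14 and a list of names such as item1 TO item10 as 16, which is why the month prompt in Figure 12.14 accepts only 14 while the banner and stub prompts accept 14 or 16.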

In Figure 12.14 the code which looks at the user reply first checks to see whether QUIT was entered and to 

make sure that the macro is not being inappropriately used in a batch run. 

IF .RESPONSE. EQ 0 OR .RESPONSE. EQ -9 BRANCH Finish $


The next check is to make sure that the response is a single word: 

IF .RESPONSE. NE 14 BRANCH Prompt1 $ 

The final check tests the character scratch variable #Mon to make sure that it is one of the 12 months. 

IF #Mon NOTAMONG ( 'jan' 'feb' 'mar' 'apr' 'may' 'jun' 

'jul' 'aug' 'sep' 'oct' 'nov' 'dec' ) 

BRANCH Prompt1 $ 

These checks are not as complete and informative as they might be. In the example above the BRANCH 

might better have been preceded by: 

PUT 'Incorrect reply. Use H to get a list of the 3 character months', 

The following provides a more complete diagnostic of the problem when DIALOG is run in a batch job: 

IF .RESPONSE. EQ -9 PUT 

 

 

; 

GO TO FINISH; 

Note the use of ##REPLY, which is a permanent scratch variable, whereas the other scratch variables in the 

Sales.Report macro are generated with a single # sign as temporary scratch variables. If ##REPLY is generated 

as a temporary scratch variable, Report.Step2 cannot be run as a standalone macro without the front end dialog. 

The other information that it needs (the month and the stub and banner variables) is passed to it as arguments, and 

it does not matter whether the RUN command comes from the dialog macro or from a standalone RUN command. 

With ##REPLY as a permanent scratch variable, the macro can be run interactively with a dialog or in a batch 

command stream. 

The other three prompt sections in Figure 12.14 are all similar to the first prompt section. In each case the 

essentials are in place, but improvements could be made to the error handling. 

There is one tricky piece in preparing the character string arguments for the Report.Step2 macro. The problem 

occurs when passing a character string to a macro if that character string contains a comma. If a string is not 

enclosed in quotes when it is given to the RUN command the comma which is needed in the macro instead serves 

as a delimiter between the arguments of the RUN command. If it is enclosed in quotes, the quotes are stripped off 

by the MACRO command after the string is properly stored. 

Because quotes are stripped off as the RUN command is processed, a string that requires quotes within the 

macro must be enclosed in double quotes or angle brackets. For example 

MACRO small ( t ) $ 

TITLES &t $ 

LIST myfile, TITLES $ 

ENDMACRO $ 

RUN small ( ) $ 

12.24 Does the File Exist 

A user-friendly macro can also check that the files referenced in the macro exist and, if they do not, 

provide a reasonable error message. The P-STAT command INQUIRE.EXTERNAL is used to test the existence 

of a given external file. It sets a system variable, .XINQUIRE. to 1 if the file is there and to 0 if it is not there. If 

the Sales.Report macro used a labels file named 'report.lab' we could check its existence: 

GEN #LABNAME = "'report.lab'" $ 

INQUIRE.EXTERNAL #LABNAME $ 

IF .XINQUIRE. EQ 1, BRANCH OK $ 

DIALOG 'Labels file #LABNAME is needed.', 

OK: etc.


The existence of a P-STAT system file can also be tested. INQUIRE ABC $ sets .INQUIRE. to 1 if it exists, 

and to zero if it does not. 

12.25 SUBFILES 

The SUBFILES command is a major feature which is only available within macros. SUBFILES provides a BY 

capability for all the commands within its scope. SUBFILES is similar to MACROS in that its domain begins 

and ends with a P-STAT command. For SUBFILES, the ending command is ENDSUBFILES $. 

Figure 12.15 Macro With SUBFILES 

___________________________________________________________________________ 

MACRO Sales ( Month )$ 

SUBFILES Sales&Month, BY Region $ 

SORT SUBFILE, BY Department, OUT Work $ 

TITLES 'Report by #Region for &Month 2010' $ 

LIST Work, BY Department, 

NO.CASES, TOTALS, 

TITLES $ 

ENDSUBFILES $ 

ENDMACRO $ 

___________________________________________________________________________ 

Figure 12.15 contains the commands for a simple macro with a SUBFILES command. The macro prints a 

separate report for each value of the BY variable Region. The file does NOT have to be sorted on the BY variable. 

The TITLES command refers to #REGION, a scratch variable that appears to be undefined. That is because it is 

defined behind the scenes by the SUBFILES command. For every BY variable a scratch variable is created which 

has the same name as the BY variable with the single # prefix. These scratch variables contain the current value 

of each BY variable as the SUBFILE iterations are done. 

The input to the SUBFILES command is a P-STAT system file. This will usually be the only time that file 

is referenced in the SUBFILES loop. The file name "SUBFILE" is used to refer to the current subgroup that is 

being processed regardless of the original input file name. 

It is the values of these scratch variables that allow you to identify easily which of the many possible subgroups 

is currently being processed. These are temporary scratch variables but since they are defined within a MACRO 

command, they exist as long as the macro is being processed. This means that the values are available throughout 

the subfile process. These scratch variables can be used in TITLES and in PUT statements. 

The SUBFILES command needs to know the name of the input file and the names of the BY variables. There 

can be up to 15 different BY variables and there can be a mixture of numeric and character variables. Thus it is 

possible to have hundreds of different groups defined by all possible combinations of the by group variables. For 

each such group a pass is made through all the commands within the current SUBFILE. 
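
The looping behavior described above can be modeled in a few lines of Python. This is a sketch of the idea only, not how P-STAT implements SUBFILES: the function name subfile_passes and the sample data are hypothetical, and values are compared exactly here for brevity. Each pass receives the current BY values (the counterpart of the # scratch variables) and the cases of the current subgroup (the counterpart of the file SUBFILE).

```python
def subfile_passes(cases, by):
    """Yield (by_values, subgroup) once per group, in order of first appearance."""
    groups = {}  # a dict keeps insertion order, so groups stay in file order
    for case in cases:
        key = tuple(case[name] for name in by)
        groups.setdefault(key, []).append(case)
    for key, subgroup in groups.items():
        yield dict(zip(by, key)), subgroup


# Hypothetical sales data; the input is not sorted on Region.
sales = [
    {"Region": "West", "Department": "Clothing", "Amount": 120},
    {"Region": "East", "Department": "Toys", "Amount": 80},
    {"Region": "West", "Department": "Toys", "Amount": 45},
]

for by_values, subgroup in subfile_passes(sales, ["Region"]):
    # by_values["Region"] plays the role of #Region in the TITLES command.
    print("Report by", by_values["Region"], "for feb 2010:", len(subgroup), "cases")
```

With more than one name in the by list, a group is formed for each combination of values that actually occurs in the data.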

12.26 SUBFILES Optional Identifiers 

The SUBFILES command has several optional identifiers. Usually the groups are presented in the order in 

which they are encountered in the data file. This can be controlled by using the identifiers UP or DOWN. The 

following illustrates how these identifiers work with two BY variables, one numeric and one character. The first 

pair of columns is the order in which the initial case of each subgroup is found in the input file. The second pair 

of columns is the way the groups are organized if UP is used. The third pair of columns illustrates the DOWN 

order.


Natural Order      UP Order       DOWN Order 

2  West            1  North       3  South 
1  North           1  West        3  East 
3  South           2  East        2  West 
2  East            2  North       2  South 
1  West            2  South       2  North 
2  North           2  West        2  East 
3  East            3  East        1  West 
2  South           3  South       1  North 

FREQUENCIES is a SUBFILES identifier that causes the groups to be displayed according to the number of 

cases in the group. When FREQUENCIES is used, DOWN is assumed unless UP is specified. When DOWN is 

used the group with the largest number of cases is first. When UP is used that group is last. 

SUBFILES Myfile, BY Age Region, FREQUENCIES, UP $ 

Character variables are considered a match if the characters are identical even if the case of the characters is 

different. The identifier EXACT can be used. When EXACT is used, a value will be considered part of a new 

group unless all the characters are exactly the same in every way. South is different from SOUTH which is different 

from south, and so on. 
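
The effect of these identifiers can be sketched in Python. This is a model of the ordering rules as described above, not P-STAT code; the names order_groups and group_key are hypothetical. Two points are assumptions: groups with equal frequencies are left in their natural order, and inexact matching is modeled by folding case.

```python
def group_key(value, exact=False):
    """Value used to decide group membership for one BY variable."""
    if isinstance(value, str) and not exact:
        return value.casefold()   # South, SOUTH and south fall in one group
    return value


def order_groups(groups, direction=None, frequencies=False):
    """Order a list of (by_values, number_of_cases) pairs.

    direction is None (natural order), "UP" or "DOWN".
    """
    if frequencies:
        descending = direction != "UP"        # DOWN is assumed unless UP is given
        return sorted(groups, key=lambda g: g[1], reverse=descending)
    if direction is None:
        return list(groups)                   # order of first appearance
    return sorted(groups, key=lambda g: g[0], reverse=(direction == "DOWN"))


# The eight groups of the table above, in natural order, each with one case.
natural = [((2, "West"), 1), ((1, "North"), 1), ((3, "South"), 1), ((2, "East"), 1),
           ((1, "West"), 1), ((2, "North"), 1), ((3, "East"), 1), ((2, "South"), 1)]

for key, _ in order_groups(natural, "UP"):
    print(*key)
```

Sorting on the pair of BY values reproduces the UP column of the table, and reversing that order reproduces the DOWN column.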

The final identifier is the GROUPS identifier which is followed by the name of a file of group definitions. 

Usually this file is generated for you and stays behind the scenes. Figure 12.16 shows the commands that are actually 

executed when the Sales macro with SUBFILES is run. 

There are two commands that do the work in a SUBFILES loop. LOCATE.GROUPS reads through the input 

file and determines how many groups there are and how many cases are in each group. It also notes where the first 

and last cases in each group are located in the file. A GROUPS file from the LOCATE.GROUPS command with 

a single BY variable with just 2 values might look like: 

                 Number 
First    Last    of                   Compare 
case     case    cases     Region     mode 

   1       22      15      West       not exact 
   6       24       9      East       not exact 

Figure 12.16 The SUBFILE Commands 

___________________________________________________________________________ 

SUBFILES Salesfeb, BY Region $ 

LOCATE.GROUPS Salesfeb, by Region, 

verbosity 1, groups WORK0032 $ 

SUBNEXT Salesfeb [ cases 1111 to 9999], 

groups WORK0032, out subfile $ 

TITLES 'Report by #Region for feb 2010' $ 

SORT SUBFILE, BY Department, OUT WORK $ 

LIST WORK, BY Department, TITLES, TOTALS$ 

Report by West for feb 2010 

-- Department: Clothing --


( Rest of report for West follows ) 

ENDSUBFILES $ 

SUBNEXT Salesfeb [ cases 1111 to 9999], 

groups WORK0032, out subfile $ 

TITLES 'Report by #Region for feb 2010' $ 

SORT SUBFILE, BY Department, OUT WORK $ 

LIST WORK, BY Department, TITLES, TOTALS$ 

Report by East for feb 2010 

-- Department: Clothing -- 

( Rest of report for East follows ) 

ENDSUBFILES $ 

MACDONE$ 

___________________________________________________________________________ 

12.27 SUBFILES Looping 

The second command, SUBNEXT, controls the looping. It keeps track of the current group and, using the 

GROUPS file from the LOCATE.GROUPS command, creates a subset of the original data file which contains 

just the members of the current group. The SUBNEXT command which appears to be: 

SUBNEXT SalesFeb [ CASES 1111 TO 9999 ] is executed as if it were 

SUBNEXT SalesFeb [ CASES 1 TO 22 ] for the first group and 

SUBNEXT SalesFeb [ CASES 6 TO 24 ] for the second group. 

This enables the SUBNEXT command to work very efficiently, especially if the file is already partially or fully 

sorted. 
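
The division of labor between the two commands can be modeled in Python. This is a sketch under stated assumptions, not the actual implementation: the function names locate_groups and subnext are hypothetical, and the idea that SUBNEXT reads only the cases between the first and last case of a group and keeps those that match is inferred from the description above. The sample file is constructed so that it reproduces the GROUPS file shown earlier.

```python
def locate_groups(values):
    """Return {group: [first_case, last_case, number_of_cases]}, 1-based."""
    info = {}
    for case_number, value in enumerate(values, start=1):
        if value not in info:
            info[value] = [case_number, case_number, 0]
        info[value][1] = case_number   # the latest sighting is the last case so far
        info[value][2] += 1
    return info


def subnext(values, group, info):
    """Return the case numbers of one group, reading only cases first..last."""
    first, last, _ = info[group]
    return [n for n in range(first, last + 1) if values[n - 1] == group]


# A hypothetical 24-case file: East occupies these positions, West the rest.
east_positions = {6, 8, 10, 12, 14, 16, 18, 23, 24}
regions = ["East" if n in east_positions else "West" for n in range(1, 25)]

groups = locate_groups(regions)
print(groups)
print(len(subnext(regions, "West", groups)), len(subnext(regions, "East", groups)))
```

The narrower the range between the first and last case of a group, the fewer cases each pass has to read, which is why a partially or fully sorted file is processed more efficiently.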

The output file from SUBNEXT is always written to a file with the name "SUBFILE". This explains why 

the input to the SORT in Figure 12.16 is file "subfile". It is not a magic name out of nowhere; it is an actual temporary 

file that is created to contain the current subgroup. 

The SUBNEXT command is internal to SUBFILES and cannot be executed by a user. The LOCATE.GROUPS 

command, on the other hand, can be executed at any time during a run and provides an easy way 

to determine the number of cases in the subgroups of a file. You can run the LOCATE.GROUPS command before 

a SUBFILE loop. The final identifier to the SUBFILES command is the GROUPS identifier which, if used, requires 

the name of a GROUPS output file from a previous LOCATE.GROUPS command. 

Because the GROUPS file is a P-STAT system file which can itself be modified with PPL there is yet further 

control over the groups that are processed. For example: 

MACRO Grouper $ 

LOCATE.GROUPS Myfile, BY County State, GROUPS MyGroups $ 

SUBFILES Myfile, 

GROUPS MyGroups [ if Number.of.cases LT 20, EXCLUDE ] $ 

This will cause all the small groups to be omitted from the rest of the SUBFILE loop. 

The SUBFILES identifiers UP and DOWN apply to all the BY variables. If the order that you want is UP on 

one variable and DOWN on another, that can be accomplished by using the LOCATE.GROUPS command followed 

by a SORT command. 

LOCATE.GROUPS Myfile, BY Sales State, GROUPS Mygroup $


SORT Mygroup, BY Sales (D) State (U), OUT Mygroup $ 

SUBFILES Myfile, GROUPS Mygroup $ 
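
The mixed ordering produced by the SORT step can be sketched in Python. This is only a model of the result, not P-STAT code; it assumes that Sales is numeric, and the function name sort_groups and the sample rows are hypothetical.

```python
def sort_groups(groups):
    """Order GROUPS rows by Sales descending, then State ascending."""
    # Negating the numeric key gives descending order on Sales, while
    # State still sorts ascending within equal Sales values.
    return sorted(groups, key=lambda g: (-g["Sales"], g["State"]))


rows = [
    {"Sales": 1, "State": "NJ"},
    {"Sales": 2, "State": "NY"},
    {"Sales": 2, "State": "CT"},
]

for row in sort_groups(rows):
    print(row["Sales"], row["State"])
```

The SUBFILES loop then processes the groups in the order in which their rows appear in the sorted GROUPS file.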

If you use LOCATE.GROUPS to create your own GROUPS file, you may use any of the SUBFILES identifiers 

UP, DOWN, FREQUENCIES, and EXACT in the LOCATE.GROUPS command. However, if you provide 

your own GROUPS file to the SUBFILES command, you cannot use BY, UP, DOWN, FREQUENCIES or EXACT 

in that SUBFILES command. 

If the LOCATE.GROUPS command has PPL which deletes cases, the GROUPS file no longer describes the 

original input file. If you then use SUBFILES with the new GROUPS file and the original input file, the cases 

selected will not be the correct cases. The solution is easy. Add the OUT identifier to the LOCATE.GROUPS 

command to produce a file that corresponds to the GROUPS file. 

LOCATE.GROUPS Myfile [IF Department LT 10, EXCLUDE], 

BY Sales State, GROUPS Mygroup, OUT Temp $ 

SORT Mygroup, BY Sales (D) State (U), OUT Mygroup $ 

SUBFILES Temp, GROUPS Mygroup $ 

12.28 SUBFILES System Variables 

There are three system variables that are set by the SUBFILES command. 

1. .SUBFILEPASS. counts the number of times through the subfile loop. 

2. .SUBFILEMAX. is the total number of iterations to be done. This is the same as the total number of 

groups. 

3. .SUBFILECASES. is the number of cases in the current group. 

These variables can be used to provide different paths depending on their values. The following is a simplistic 

macro which uses all three variables: 

MACRO Counter $ 

GEN #Big $ 

SUBFILES Myfile, BY Region $ 

IF .SUBFILEPASS. EQ 1, SET #BIG = 0, PUT $ 

IF .SUBFILECASES. GT 260 INCREASE #BIG $ 

IF .SUBFILEPASS. EQ .SUBFILEMAX. 

PUT #BIG >$ 

ENDSUBFILES $ 

ENDMACRO $
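
The logic of the Counter macro can be restated in Python to show how the three system variables cooperate. This is a model only, not P-STAT code; the function name count_big_groups and the sample group sizes are hypothetical, and the local variables stand in for .SUBFILEPASS., .SUBFILEMAX. and .SUBFILECASES.

```python
def count_big_groups(group_sizes, threshold=260):
    """Count the groups that have more than `threshold` cases."""
    big = None
    subfile_max = len(group_sizes)                 # total number of groups
    for subfile_pass, subfile_cases in enumerate(group_sizes, start=1):
        if subfile_pass == 1:
            big = 0                                # initialize on the first pass
        if subfile_cases > threshold:
            big += 1                               # counterpart of INCREASE #BIG
        if subfile_pass == subfile_max:
            return big                             # report on the last pass
    return big                                     # no groups: nothing was counted


print(count_big_groups([300, 100, 261, 260]))
```

Testing the pass number against 1 and against the maximum is what lets a single loop body handle initialization, accumulation and final reporting.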


MACRO 

SUMMARY 

The MACRO command provides a name for a collection of P-STAT text. There are two types of macros. 

Block macros contain one or more P-STAT commands. They are executed with the RUN command. 

Instream macros contain pieces of commands, programming language (PPL), subcommands or data. 

A block macro is a named collection of P-STAT commands and data records. The MACRO command 

supplies a name for the macro and defines any macro arguments. The arguments can be either keyword 

or positional. It is followed by one or more P-STAT commands, subcommands and data records. A macro 

is neither checked for syntax nor executed when it is defined. 

The RUN command executes the entire series of commands that comprise the macro. The RUN command 

passes the true values for each of the macro arguments. If the arguments are keywords, substitution 

is done whenever the keyword preceded by an ampersand (&month) is found in the macro text. When 

the arguments are positional, substitution is done for &1, &2, etc. 

File names in macros, prefaced with “MACFILE.”, are temporary files that disappear after the macro 

finishes. 

A set of macro definitions can be created and modified in a simple ASCII file using a text editor. The 

macros are then made available to a P-STAT run by doing a TRANSFER to that file. 

A macro appears in the P-STAT editor as a single command. Its commands are stored as data records to 

the macro command. A macro can be edited just as any other command is edited. It must then be reexecuted 

(X) from within the editor for the changes to take effect. 

Macros support both keyword and positional arguments. Default values can be provided. If defaults are 

not provided in the macro definition, values must be supplied when the macro is used. 

MACRO rrr ( file, vars) $ 

CORRELATE &file [ KEEP &vars ], OUT work1$ 

LIST work1 $ 

ENDMACRO$ 

Instream macros are executed by providing the name preceded by !! (two exclamation points). 

MACRO survey.def $ 

layout question totals labels body missing summary, 

places means 3, places percents 2, 

row.totals on right, 

ENDMACRO $ 

SURVEY PsFile; 

!!survey.def 

BANNER Age Education, STUB Q1 TO Q43; 

$ 

MACRO Sales ( Month, Region ) ( jan, east ) $ 

MACRO Sales ( 2 ) ( jan, '' ) $ 

RUN rrr ( data1, age income education ) $


Required: 

MACRO name $ 

Optional Identifier: 

PAD nn 

This specifies the default padding for instream records as they are inserted. 

ENDMACRO 

ENDMACRO $ ends the macro definition. 

RUN 

RUN SALARY $ 

RUN SALES ( Sept ) $ 

The RUN command causes a block macro to be executed. Argument substitution is supported. 

FULL.MACRO.ARGS 

All arguments must be supplied when a macro is called. An argument can be a replacement value, a null 

value, or a comma if default values are available. This is the default. 

FULL.MACRO.ARGS OFF 

The commas for trailing arguments need not be supplied if defaults are available. 

COUNT.MACROS 

COUNT.MACROS simply reports how many macros have been read, how many had errors, and how 

many are usable. Since TRANSFER only reports each macro activation when verbosity is 4, using 

COUNT.MACROS after a transfer to a macro library gives a sense of what went on. 

SHOW.MACROS 

Optional: 

NAMES 

SHOW.MACROS can be used to display the currently activated macros. This prints the entire contents 

of the activated macros. 

SHOW.MACROS, NAMES $ can be used to list the names of the currently activated macros.


FILE “fn” 

Name for an external file where the SHOW.MACROS command is to put its results. 

SHOW.MACRO, FILE “MyMacros” $ 

MACRO.PAD nn 

is a command that specifies the padding default for macros activated subsequently. nn can be zero (which 

sets a no-pad status) or some larger integer, like 1 or 80. 

SUBFILES 

Required: 

begins a SUBFILES loop. The SUBFILES command can only be used within a macro. 

SUBFILES Myfile, BY County State, FREQUENCIES $ or 

SUBFILES Myfile, GROUPS Mygroups $ 

SUBFILES fn 

provides the name of the P-STAT system file 

BY vn vn 

provides the names of the BY variables. Up to 15 BY variables may be cited. A SUBFILES loop is done 

for each subgroup that is defined by the different values of the BY variables. The groups are usually processed 

in the order in which they occur in the input file. 


DOWN 

specifies that the groups are to be organized in descending order of the BY group values or, if FREQUENCIES 

is used, by descending size. 

EXACT 

specifies that character variables must match not only in their spelling but also in the case of the characters 

to be considered as members of the same group. 

FREQUENCIES 

specifies that the groups are to be ordered by their frequencies. UP and DOWN can be used to control 

whether the largest or smallest group comes first. 

UP 

specifies that the groups are to be organized in ascending order of the BY group values or, if FREQUENCIES 

is also used, by ascending size. 

GROUPS fn 

provides the name of a file that was created by a previous LOCATE.GROUPS command. If GROUPS 

is used, none of the other identifiers can be used. The groups file contains all the relevant information.


Subfiles System Variables 

.SUBFILEPASS. 

counts the number of times through the subfile loop. 

.SUBFILEMAX. 

the total number of iterations to be done. This is the same as the total number of groups. 

.SUBFILECASES. 

the number of cases in the current group. 

ENDSUBFILES 

ends a SUBFILES loop 

LOCATE.GROUPS 

Required: 

LOCATE.GROUPS reads a P-STAT system file and counts the number of cases in each of the subgroups 

that are defined by the BY variables. If the PPL deletes any of the cases, the OUT file should also be 

created and used with the GROUPS file in any subsequent SUBFILES commands. 

LOCATE.GROUPS Myfile [ IF Age LT 20, EXCLUDE ], OUT Myfile2 

GROUPS MyGroup, FREQUENCIES $ 

LOCATE.GROUPS fn 

provides the name of the P-STAT system file 

BY vn vn 

provides the names of the BY variables. Up to 15 BY variables may be cited. A SUBFILES loop is done 

for each subgroup that is defined by the different values of the BY variables. The groups are usually processed 

in the order in which they occur in the input file. 


DOWN 

specifies that the groups are to be organized in descending order of the BY group values or, if FREQUENCIES 

is used, by descending size. 

EXACT 

specifies that character variables must match not only in their spelling but also in the case of the characters 

to be considered as members of the same group. 

FREQUENCIES 

specifies that the groups are to be ordered by their frequencies. UP and DOWN can be used to control 

whether the largest or smallest group comes first. 

GROUPS fn 

provides the name for an output file containing information about each subgroup. Included are the frequencies 

for each subgroup and the locations of the first and last cases of the subgroup. 

UP 

specifies that the groups are to be organized in ascending order of the BY group values or, if FREQUENCIES 

is used, by ascending size. 

OUT fn 

provides the name for an output file which is the same as the input file after the PPL, if any, has been 

processed. 

INQUIRE.EXTERNAL 

Required: 

INQUIRE.EXTERNAL 'cs' 

provides the name of an external file in quotes. The results are returned in the system variable .XINQUIRE. 

which is set to one if the file is found and is zero if the file is not found. 

INQUIRE 

Required: 

INQUIRE fn 

provides the name of a P-STAT system file. The results are returned in the system variable .INQUIRE. 

which is set to one if the file is found and is zero if the file is not found. 

DIALOG 

DIALOG #Mon 

'-------------------------------------------------' 




The DIALOG command has a scratch variable and some number of lines of text enclosed in quotes. The 

scratch variable is required only if a reply is expected. Each line of text is displayed on a separate line 

on the terminal. The lines of text can contain scratch variables. If so, their current value is displayed. 

Optional HELP text is also part of the DIALOG command. This is not displayed unless the user requests 

it by entering either "H" or 'HELP' in reply to the prompt. The keyword 'HELP' separates the normal 

DIALOG text from the HELP text. 

There are two mechanisms for examining a user reply. The first is the user reply which is stored in the 

DIALOG scratch variable. The second is a numeric system variable .RESPONSE. which contains a code 

indicating the type of the reply. .RESPONSE. is set each time DIALOG is executed. .RESPONSE. values 

are: 

negative: no response, or an invalid response:


-2 = entirely blank 

-4 = H or HELP, but the dialog had no help text 

-6 = 'abc' for a numeric scratch variable, or such 

-8 = a scratch variable was not supplied 

-9 = in batch mode 

zero: the response was Q or QUIT 

positive: a valid response: 

1 = integer, like 1990 

2 = non-integer, like 3.1416 

11 = Y or YES 

12 = N or NO 

14 = character response other than yes/no/quit 

that is a legal p-stat name or label 

16 = other character response

i Index 

Symbols 

^ MATCHES meta-character 9.21 

? MATCHES meta-character 9.21 

? variable name wildcard 2.7, 2.20 

_ MATCHES meta-character 9.21 

- MATCHES meta-character 9.21 

- PPL numeric operator 2.9, 2.25 

* MATCHES meta-character 9.21 

* PPL numeric operator 2.9, 2.25 

** PPL numeric operator 2.9, 2.25 

*/ comment ending 2.20, 3.20 

/ PPL numeric operator 2.9, 2.25 

/* comment beginning 2.20, 3.20 

// concatenate 9.4, 9.31 

/// squeeze concatenate 9.5, 9.31 

\\ MATCHES meta-character 9.21 

& MACRO substitution 12.4 

&& concatenation of character constants 

9.23, 9.31 

# MATCHES meta-character 9.21 

# scratch variables 8.3 

+ MATCHES meta-character 9.21 

+ PPL numeric operator 2.9, 2.25 

< > MATCHES meta-character 9.21 

| MATCHES meta-character 9.21 

$ MATCHES meta-character 9.21 

0 + MATCHES meta-character 9.21 

0 1 MATCHES meta-character 9.21 

1 + MATCHES meta-character 9.21 

1 1 MATCHES meta-character 9.21 

System Variables 

.ALL. 3.8, 3.11, 11.14, 11.34 

.CDATE. 6.26 

.CHARACTER. 2.6, 6.17, 6.24 

.COLLECTIONS. 8.27, 8.30 

.COLLECTMAX. 8.27, 8.30 

.COLLECTMIN. 8.27, 8.30 

.COLLECTSIZE. 8.23, 8.27, 8.30 

.COLLECTSUM. 8.27, 8.30 

.CTIME. 6.26 

.DATE. 6.19, 6.24 

.e. 6.14, 6.24 

.FILE. 6.18, 6.24, 8.2 

.G. 2.12, 6.14, 6.24 

.HERE. 3.6, 6.16, 6.24 

.INQUIRE. 12.23, 12.31 

.M. 2.12, 6.15, 6.24 

system variable 2.24 

.M1., .M2., .M3. 6.15, 6.25 

.N. 3.6, 6.16, 6.25 

.NDATE. 6.19, 6.26 

.NEW. 2.6, 6.5, 6.25 

.NTIME. 6.19, 6.26 

.NUMERIC. 2.6, 6.17, 6.25 

.NV. 6.15, 6.25 

.ON. 2.3, 6.25 

.OTHERS. 2.6, 6.25 

.PAGE. 6.19, 6.25 

.PI. 6.14, 6.25 

.PUT. 3.9, 6.18, 6.25 

with system variables 8.11 

.RDATE. 6.26 

.RESPONSE. 12.21, 12.31 

.RTIME. 6.26 

.SUBFILECASES. 12.26 

.SUBFILEMAX. 12.26 

.SUBFILEPASS. 12.26 

.TIME. 6.19, 6.25 

.USED. 6.16, 6.26 

.XDATE. 6.26 

.XINQUIRE. 12.22, 12.31 

.XTIME. 6.26 

( ) MATCHES meta-character 9.21 

[ ] MATCHES meta-character 9.21 

PUT and TEXTWRITER Controls 

@ 

in PUT and PUTL 3.9, 3.19 

in TEXTWRITER command 11.9, 11.33 

@ MATCHES meta-character 9.21 

@BEFORE 

in PUT and PUTL 3.19 

in TEXTWRITER command 11.10, 

11.33 

@COMMAS 



11.34 

@EQUAL 

in PUTL 3.11, 3.19 

in TEXTWRITER command 11.13,


11.34 

@INDENT 


11.34 

@JUST 


11.34 

@LABEL 


in TEXTWRITER command 11.14 

@MINUS 



@MISS 



11.34 

@NEXT 

in PUT and PUTL 3.11, 3.13, 3.19 


11.35 

@PAGE 



11.35 

@PARA 



11.35 

@PLACES 



11.35 

@PLUS 


in TEXTWRITER command 11.8, 11.9, 

11.35 

@SKIP 

in PUT and PUTL 3.13, 3.19 


@SPREAD 


11.35 

@TRIM 

in PUT and PUTL 3.19 


11.35 

@WIDTH 


11.36 

A 

ABS 

PPL function 6.2, 6.20 

Absolute value function 6.2 

ACOS 


Add dates and times 10.21 

Add variables 

see GENERATE 

Addition operator + 2.9 

MATCHES meta-character 9.21 

ALL 

PPL operator 2.16, 2.23 

AMONG 

PPL operator 2.15, 2.23, 9.31 

AND 


ANY 


Arc cosine function 6.3 

Arc sine function 6.3 

Arc tangent function 6.3 

Arguments 

in a macro 12.3 

ARRAY Commands 

DEFINE.ARRAY 8.7 

DROP.ARRAY 8.7 

SHOW.ARRAYS 8.7 

summary 8.28 

Arrays 

multi-dimensional user-defined 8.1 

user defined 8.7 

ASIN 


Asterisk 


multiplication operator 2.9 

Asterisk, double 

exponentiation operator 2.9 

ATAN 

PPL function 6.3, 6.21


B 

Backslash 


Bernoulli distribution 7.4 

Binary random number 7.1 

Binomial distribution 7.4 

inverse 7.5 

BLANK 


BLANKS 


Brackets 


BRANCH 

conditional execution 12.18 

BY 

in COLLECT function 8.20 

in LOCATE.GROUPS command 12.30 

in SUBFILES command 12.23, 12.29 

C 

C.TRANSPOSE 2.20 

CAPS 


CARRY 


in SPLIT function 8.13 

CASE 


CASES 

PPL instruction 2.1, 2.3, 2.21 

Ceiling function 6.2 

CENTER 

PPL function 9.25 

CHANGE 


CHARACTER 


Character constants 

concatenation with && 9.23, 9.31 

CHAREX 


CHECK 1.3, 1.6 

Chi-square 

distribution 7.4 

inverse 7.5 

CLAG 

character function 6.8 


COLLECT 


BY option 8.20 

CARRY option 8.21 

COLLECT counter 8.20 

complex usage 8.22 

example 9.17 

INDEX option 8.21 

SORT option 8.21 

COMBINATIONS 


Comments 3.20 

in PPL clauses 11.6 

within or between commands 3.14 

COMPARE 1.3, 1.6 

P-STAT system files 1.6 

COMPRESS 


Concatenation 

of files on-the-fly 3.4 

operator // 9.4 

operator /// 9.5 

Conditional execution 

BRANCH 12.18 

CONTAINS 


Control words 


COS 


Cosine function 6.3 

COUNT.GOOD 

PPL function 6.7, 6.23, 9.3, 9.30 

COUNT.MACROS 12.11, 12.28 

CREATE 


CURRENT 

PPL instruction 1.4, 1.6 

CURRENT.DATE function 10.4, 10.20 

CVAL 


CYCLE 

in SPLIT function 8.17


D 

Data 

cleaning 3.8, 8.11 

DATE.LANGUAGE 10.14, 10.23 

DATE.ORDER 10.14, 10.24 

Dates 10.1–10.13 

adding 10.9, 10.21 

changing 10.11, 10.23 

difference between 10.12, 10.23 

extracting 10.10, 10.22 

logical operators 10.16, 10.24 

simple functions 10.3 

DAY.MONTH.YEAR 10.3 

DAY.YEAR.MONTH 10.3 

MONTH.YEAR.DAY 10.4 

YEAR.DAY.MONTH 10.4 

YEAR.MONTH.DAY 10.4 

subtracting 10.9, 10.21 

DAY.WITHIN.WEEK function 10.7, 10.21 

DAY.WITHIN.YEAR function 10.7, 10.21 

DAYS 


Decimal places function 6.10 

DECREASE 


DEFINE.ARRAY 8.7, 8.28 

DELETE 


DES 

in MODIFY command 3.2 

DIALOG 12.19, 12.31 

DIF 


DIF function 6.9 

Difference function 6.8 

Digit extraction function 6.10 

Distribution functions 7.4 

inverse 7.5 

Division operator / 2.9 

DO loops 5.1–5.11 

PPL instruction 5.22 

Double slash 

PPL operator 9.4 

DOWN 


in SUBFILES command 12.23, 12.29 

DROP 


DROP.ARRAY 8.7, 8.28 

DROP.P.VECTOR 8.29 

Dummy variables 

creating 6.4, 6.15 

recoding into one variable 6.5 

E 

ECHO 12.14 

Econometrics 

LAG, DIF functions 6.8 

Enclosures 

in MATCHES operator 9.21 

ENDDO 


ENDIF 


ENDMACRO 12.1, 12.11, 12.28 

ENDSUBFILES 12.23, 12.23 

summary 12.30 

EQ 


Escape characters 


Escape codes, passing 9.14 

Exact string comparisons 2.12 

EXITDO 


EXP 


EXPAND 


Exponentiation function 6.3 

Exponentiation operator ** 2.9 

F 

F distribution 7.4 

inverse 7.5 

FACTORIAL 


FILE 

in SHOW.MACROS command 12.11, 

12.29 

FILE.IN 1.1 

FILES 12.15


Filtering a file using PPL 2.11 

FIRST 


FIRST.GOOD 


FISCAL.QUARTER function 10.6, 10.21 

FISCAL.YEAR function 10.6, 10.21 

Floor function 6.2 

FOLD 

in LIST command 2.20 

FONT 


11.36 

FONT1-FONT9 


Fonts 

changing 

in TEXTWRITER 11.22 

FRAC 


Fractional portion function 6.2 

FREQUENCIES 


in SUBFILES command 12.24 

FULL.MACRO.ARGS 12.11, 12.28 

FUZZ 

command 7.8, 7.12 

Fuzzy arithmetic 7.6, 7.12 

G 

GENERATE 

in DO loop 5.13, 5.23 

PPL instruction 2.1, 2.9, 2.21, 9.1 

GOOD 


GOTO 


GROUPS 



GT operator 2.12 

H 

HEX 


I 

IF 

PPL instruction 2.1, 2.11, 2.17, 2.21, 3.7, 9.2 

IF-THEN-ELSE 5.14–5.18, 5.24 

INCREASE 


INDEX 



INQUIRE 

determine existence of P-STAT system file 12.23, 12.31 

INQUIRE.EXTERNAL 12.22, 12.31 

INRANGE 


INT 


Integer function 6.2 

interaction with IF 6.9 

INVBIN 


INVCHI 


Inverse probability functions 7.5 

binomial distribution 7.5 

chi-square distribution 7.5 

F distribution 7.5 

normal distribution 7.6 

Poisson distribution 7.6 

t distribution 7.6 

INVF 


INVNORM 


INVPOIS 


INVT 


IVAL 


J 

JUSTIFY 

in TEXTWRITER command 11.7


K 

KEEP 


L 

LABELS 


Labels 

for PPL statements 3.7 

LAG 


LAG function 6.9 

Lagging function 6.8 

LANDSCAPE 


11.36 

LAST 


LAST.GOOD 


LE 


LEADBLANK 


LEFT 


LEFT.EDGE 


LENGTH 


LIST 

identifiers 

FOLD 2.20 

MAX.PLACES 6.10 

MIN.PLACES 6.10 

LOC 


LOCATE.GROUPS 12.24 

identifiers 

BY 12.30 

DOWN 12.30 

EXACT 12.30 

FREQUENCIES 12.30 

OUT 12.26, 12.31 

UP 12.31 

summary 12.30 

Location function 6.3 

LOG 


LOG10 


Logarithm functions 6.3 

Logical operators 2.23 

date/time 10.16 

LOWER 


LPAD 


LRPAD 


LRTRIM 


LT 


LTRIM 


M 

MACRO 12.1, 12.11 

summary 12.27 

MACRO.PAD 12.11, 12.29 

Macros 

activating 12.2 

arguments 12.5 

default values 12.6 

keyword 12.4 

positional 12.4 

block 12.2 

executing 12.2 

calling other macros 12.7 

comments 12.3 

correcting in editor 12.11 

format of 12.1 

in stream 12.1 

in subcommands 12.8 

in the editor 12.11 

Scratch variable usage 12.14 

storing 12.2 

temporary files 12.15 

using RUN 12.14 

MAKE 1.1 

MAKE.CHARACTER 9.26


MAKE.DATE function 10.4, 10.20 

MAKE.NUMERIC 9.27 

MARGIN 


MASK 

in case, variable selection 2.6 

in SPLIT instruction 8.15 

Masks 

Complex for GENERATE 5.13 

for RENAME and GENERATE 5.24 

MATCHES 


meta-characters 9.20 

MAX 


MAX.GOOD 


MEAN 


MEAN.GOOD 


Meta-characters 

in MATCHES operator 9.20, 9.21 

MIN 


MIN.GOOD 


MISSING 


MISSING1, MISSING2, MISSING3 

PPL operators 2.13 

MOD 

PPL function 3.6, 6.9, 6.22 

MODIFY 1.3, 1.6, 3.2 

identifiers 

DES 3.2 

OUT 3.2, 3.16 

TEMPLATE 3.3, 3.16 

summary 3.16 

Modular function 6.9 

MONTH.CASE 10.15, 10.24 

MONTH.LENGTH 10.24 

MONTH.NAMES 10.15, 10.24 

MONTH.YEAR.DAY 10.3 

Multiplication operator * 2.9 


N 

NAMES 

in SHOW.MACROS command 12.28 

NCOT 


NE 


NEAR 

PPL logical operator 7.8 

NEXTDO 


NO LEADBLANK 


NO SHOWPAGE 


NO SPREAD 


No-break character 

in TEXTWRITER command 11.2 

Normal distribution 7.4 

inverse 7.6 

Normal random number 7.1 

NOTAMONG 


NOTNEAR 

PPL logical operator 7.8 

NTOKEN 


NUMBER 


NUMBER.E 


NUMBER.W 


NUMEX 


O 

OR 


OUT 

in LOCATE.GROUPS command 12.26, 12.31 

in MODIFY command 3.16 


OUTRANGE



P 

P vector 1.2, 8.5 

PAD 


Parentheses 


Permanent vector 

see P vector 

Phrases 

PPL 2.1 

PLACES 


Poisson distribution 7.4, 7.6 

PORTRAIT 


11.36 

POSITION 


Positional notation, variables 2.4, 2.8, 6.3 

POSTSCRIPT 


11.36 

POSTSCRIPT.SETUP 11.21, 11.28 

PPL 1.1, 2.1, 3.1, 4.1, 6.1, 8.1, 9.1 

case, variable selection 2.1 

character variables 9.1 

comments 3.20 

concatenation, on-the-fly 3.4 

Date and time summary 10.20 

DO loops 5.1–5.11, 5.22 

exact comparisons of characters 2.12 

generating variables 2.7 

IF tests 3.7 

introduction 1.1 

logical selection 2.11, 2.17 

modifying variables 2.7, 3.1 

order of numeric operators 2.10 

permanent vector 8.5 

phrases with PPL clauses 2.1 

scratch variables 8.3 

size constraints 2.3 

standalone commands 1.3, 1.6, 3.12, 3.17 

summary 1.5, 2.20, 3.16, 4.14, 5.22, 6.20, 8.28, 9.24 

wildcard notation 2.6, 8.15, 8.24 

PPL command 1.6 

PPL Functions 2.10 

char and numeric 

COLLECT 8.19–8.27 

COUNT.GOOD 6.7, 9.3 

EXPAND 6.11 

FIRST 8.2, 8.10 

FIRST.GOOD 6.7, 9.3 

LAST 8.2, 8.10 

LAST.GOOD 6.7, 9.3 

RECODE 4.3 

SPLIT 8.12–8.19 

SPLIT * 8.24 

VARNAME 3.11 

XRECODE 4.5 

character 9.6 

BLANK 9.10 

CAPS 9.7 

CENTER 9.7 

CHANGE 9.10 

CHARACTER 9.14 

CHAREX 9.14 

CLAG 9.23 

COMPRESS 9.11 

CVAL 9.14 

IVAL 9.14 

LENGTH 9.8 

LOWER 9.7 

LPAD 9.12 

LRPAD 9.12 

LRTRIM 9.12 

LTOKEN 9.10 

LTRIM 9.12 

NTOKEN 9.10 

NUMBER 9.13 

NUMBER.E 9.13 

NUMBER.W 9.13 

PAD 9.12 

POSITION 9.8 

RIGHT 9.7 

RPAD 9.12 

RTOKEN 9.10 

RTRIM 9.12 

SIZE 9.8 

SUBSTRING 9.9


TOKEN 9.9 

TRIM 9.12 

UPPER 9.7 

VARNAME 9.17 

VERIFY 9.8 

XBLANK 9.10 

XCHANGE 9.10 

XPOSITION 9.8 

complex, nested 9.16 

Date/time 

ADD dates and times 10.9, 10.21 

CHANGE dates and times 10.11, 10.22 

CURRENT.DATE 10.4 

DAY.MONTH.YEAR 10.3, 10.20 

DAY.WITHIN.WEEK 10.7, 10.21 

DAY.WITHIN.YEAR 10.7, 10.21 

DAY.YEAR.MONTH 10.3, 10.20 

DAYS 10.5 

DIF - compare dates and times 10.23 

Difference between dates/times 10.12 

EXTRACT dates and times 10.10, 10.22 

FISCAL.QUARTER 10.6, 10.21 

FISCAL.YEAR 10.6, 10.21 

MAKE.DATE 10.4, 10.20 

MONTH.DAY.YEAR 10.20 

MONTH.YEAR.DAY 10.3, 10.4, 10.20 

QUARTER 10.7, 10.21 

REFORMAT.DATE 10.5, 10.20 

SECONDS 10.6, 10.21 

SECONDS.MIDNIGHT 10.6, 10.21 

STATUS.DATE 10.5, 10.20 

SUBTRACT dates and times 10.9 

UNDO.DAYS 10.6, 10.21 

UNDO.SECONDS 10.6, 10.21 

WEEK.WITHIN.YEAR 10.7 

YEAR.DAY.MONTH 10.4, 10.20 

YEAR.MONTH.DAY 10.4, 10.20 

numeric 

ABS 6.2 

ACOS 6.3 

ASIN 6.3 

ATAN 6.3 

CEIL 6.2 

COMBINATIONS 6.11 

COS 6.3 

DIF 6.8 

EXP 6.3 

FACTORIAL 6.3 

FLOOR 6.2 

FRAC 6.2 

HEX 7.7, 7.11 

INT 6.2 

INVBIN 7.5 

INVCHI 7.5 

INVF 7.5 

INVNORM 7.6 

INVPOIS 7.6 

INVT 7.6 

LAG 6.8 

LOC 6.3 

LOG 6.3 

LOG10 6.3 

MOD 3.6, 6.9 

NCOT 4.1 

NUMEX 6.10 

PLACES 6.10 

PROBCHI 7.4 

PROBF 7.4 

PROBBIN 7.4 

PROBIT 7.6 

PROBNORM 7.4 

PROBPOIS 7.4 

PROBT 7.5 

RANBIN 7.1 

RANNORM 3.7, 7.1 

RANTABLE 7.1 

RANUNI 7.1 

ROUND 6.2 

SIN 6.3 

SQRT 6.3 

STEP.DOWN 7.7, 7.12 

STEP.UP 7.7, 7.11 

STEPS 7.7, 7.12 

TAN 6.3 

PPL Instructions 
CASES 2.1, 2.3 

using ranges, TO 2.4 

with MASK 2.6 

CURRENT 1.4, 1.6


DECREASE 2.8 

DELETE 2.11 

DO loops 5.1–5.11, 5.22 

DROP 2.1, 2.3 


with MASK 2.6 

with wildcard 2.6 

ENDDO 5.23 

EXITDO 5.22 

GENERATE 2.1, 2.8, 5.23, 9.1 

GOTO 3.7 

IF 2.1, 2.11, 2.17, 3.7, 9.2 

missing data 2.18 

T F M prefixes 2.18 

IF-THEN-ELSE 5.24 

INCREASE 2.8 

KEEP 2.1, 2.3, 2.5 


with MASK 2.6 

with wildcard 2.6 

NEXTDO 5.23 

PREVIOUS 1.3, 1.7 

PUT 3.8, 3.9, 11.2 

PUTL 3.11, 11.2 

QUITCOMMAND 3.14 

QUITFILE 3.14 

QUITRUN 3.14 

RENAME 2.19, 2.23, 5.23 

REPEAT 3.6, 7.2 

RETAIN 2.11 

SET 2.1, 2.7, 9.2 

PPL Operators 

character 9.3 

// concatenate 9.4 

/// squeeze concatenate 9.5 

&& concatenation of character constants 9.23 

CONTAINS 9.4 

MATCHES 9.18 

XAMONG 2.16 

XCONTAINS 9.4 

XEQ 2.12, 9.5 

XMATCHES 9.18 

XNOTAMONG 2.16 

logical 2.12 

ALL 2.16 

AMONG 2.15 

AND 2.13 

ANY 2.16 

DATE.EQ 10.16 

DATE.GE 10.16 

DATE.GT 10.16 

DATE.LE 10.16 

DATE.LT 10.16 

DATE.NE 10.16 

EQ 2.12, 7.7 

fuzzy 7.8 

GE 7.7 

GOOD 2.12 

GT 2.12, 7.7 

INRANGE 2.16 

LE 2.12, 7.7 

LT 2.12, 7.7 

MISSING 2.12 

NE 2.12, 7.7 

NEAR 7.8, 7.12 

NOTAMONG 2.15 

NOTNEAR 7.8, 7.12 

OR 2.13 

OUTRANGE 2.16 

numeric 2.9 

- 2.9 

* 2.9 

** 2.9 

/ 2.9 

+ 2.9 

PPL System Variables 

.ALL. 3.8, 3.11, 11.14, 11.34 

.CHARACTER. 2.6, 6.17 

.DATE. 6.19 

.e. 6.14 

.FILE. 6.18, 8.2 

.G. 2.12, 6.14 

.HERE. 3.6, 6.16 

.M. 2.12, 6.15 

.M1., .M2., .M3. 6.15 

.N. 3.6, 6.16 

.NDATE. 6.19 

.NEW. 2.6, 6.5 

.NUMERIC. 2.6, 6.17 

.NV. 6.15 

.ON. 2.3


.OTHERS. 2.6, 6.15 

.PAGE. 6.19 

.PI. 6.14 

.PUT. 3.9, 6.18, 8.11 

.REPEAT. 6.19 

.RESPONSE. 12.21 

.SUBFILECASES. 12.26 

.SUBFILEMAX. 12.26 

.SUBFILEPASS. 12.26 

.TIME. 6.19 

.USED. 6.16 

.XINQUIRE. 12.22 

PREVIOUS 


Probability functions 7.4 

PROBBIN 


PROBCHI 


PROBF 


PROBIT 


Probit distribution 7.6 

PROBNORM 


PROBPOIS 


PROBT 


PROCESS 1.3, 1.6, 3.9, 3.13 

summary 3.17 

P-STAT Programming Language 

see PPL 

P-STAT system file 

previous or current version 1.3 

PUT 



PUTL 



PUTL.CHARS 


Q 

QUARTER function 10.7, 10.21 

QUITCOMMAND 


QUITFILE 


QUITRUN 


R 

RANBIN 


Random 

assignment 7.3 

data generation 7.1 

number functions 7.1 

sampling 7.2 

with replacement 7.2 

RANNORM 


RANTABLE 


RANUNI 


RECODE 

arguments 4.6 

complex 4.6 

exact matches with XRECODE 4.12 


tests 4.6 

REFORMAT.DATE function 10.5, 10.20 

RENAME 


variables 2.19, 2.23 

REPEAT 


Report writing 3.9 

RETAIN 


RIGHT 


ROUND 


Rounding function 6.2 

RTOKEN 



RTRIM 


RUN 12.2, 12.11, 12.28 

S 

Scratch variables 1.2, 3.12, 8.3 


SDEV 


SDEV.GOOD 


SECONDS function 10.6, 10.21 

SECONDS.MIDNIGHT function 10.6, 10.21 

SET 


SHOW.ARRAYS 8.7, 8.28 

SHOW.MACROS 12.11, 12.28, 12.29 

identifiers 

FILE 12.11, 12.29 

NAMES 12.28 

SHOWPAGE 


11.37 

SIN 


Sine function 6.3 

SIZE 


Size constraints 

PPL modifications 2.3 

Slash 

division operator 2.9 

Slash, back 


Slash, double 


Slash, triple 


SORT 


SPLIT 

example 9.18 


CARRY option 8.13 

CREATE option 8.14 

CYCLE option 8.17 

INDEX option 8.16 

SPLIT * 8.24 

STEP option 8.17 

USE option 8.14 

SPREAD 


SPSS.IN 1.1 

SQRT 


Square root function 6.3 

Standalone PPL commands 1.3, 1.6, 3.12, 3.17 

STATUS.DATE function 10.5, 10.20 

STEP 


STEP.DOWN 


STEP.UP 


STEPS 


STREAM 


SUBFILES 12.23 

identifiers 

BY 12.23, 12.29 

DOWN 12.23, 12.29 

EXACT 12.29 

FREQUENCIES 12.24, 12.29 

GROUPS 12.25, 12.29 

UP 12.23, 12.29 

use of scratch variables 12.23 

SUBSTRING 


SUBTRACT dates and times 10.21 

Subtraction operator - 2.9 

Subtraction sign 


SUM 


SUM.GOOD 


System files 

previous or current version 1.3 

template 3.3 

System variables 6.1


.M. 2.24 

T 

t distribution 7.5 

inverse 7.6 

Tabled random number 7.1 

TAN 


Tangent function 6.3 

TEMPLATE 

in MODIFY command 3.3, 3.16 

TEXTFILE.IN 1.1 

TEXTWRITER 1.3, 1.6, 11.1 

comments 11.6 

control words 11.8 

@ 11.9, 11.33 

@BEFORE 11.10, 11.33 

@BLACK 11.28 

@BLUE 11.28 

@BOTTOM 11.27 

@CINCH 11.24 

@CINCH.U 11.25 

@COMMAS 11.11, 11.34 

@DOWN 11.27 

@DRAW.BOX 11.26 

@DRAW.H 11.26 

@DRAW.U 11.26, 11.38 

@DRAW.V 11.26 

@EQUAL 11.13, 11.34 

@FLUSH 11.26 

@FONT1-@FONT9 11.22, 11.37 

@GREEN 11.28 

@INDENT 11.10, 11.34 

@JUST 11.10, 11.34 

@L.MARGIN 11.27 

@LABEL 11.14 

@LEADING 11.27 

@LINCH 11.25 

@LINCH.U 11.25 

@LINEWIDTH 11.27 

@MINUS 11.9, 11.35 

@MISS 11.14, 11.34 

@MOVETO 11.26 

@NEXT 11.10, 11.35 

@NOCOLOR 11.28 

@NOUNDERLINE 11.29 

@ORANGE 11.28 

@PAGE 11.10, 11.35 

@PARA 11.10, 11.35 

@PINCH 11.25 

@PINCH.CHAR 11.26 

@PINCH.U 11.26 

@PLACES 11.12, 11.35 

@PLUS 11.8, 11.9, 11.35 

@R.MARGIN 11.27 

@RED 11.28 

@RINCH 11.25 

@RINCH.U 11.25 

@SKIP 11.8, 11.35 

@SPREAD 11.13, 11.35 

@TOP 11.27 

@TRIM 11.10, 11.35 

@UNDERLINE 11.29 

@UP 11.27 

@VIOLET 11.28 

@WIDTH 11.10, 11.36 

@X1 11.26 

@X2 11.26 

@Y1 11.26 

@Y2 11.26 

@YELLOW 11.28 

identifiers 

BLANKS 11.7, 11.32 

BOTTOM.EDGE 11.21, 11.36 

CASE 11.6, 11.32 

FONT 11.21, 11.36 

FONT1-FONT9 11.21, 11.36 

JUSTIFY 11.7, 11.32 

LABELS 11.8, 11.32 

LANDSCAPE 11.21, 11.36 

LEADBLANK 11.7, 11.32 

LEFT.EDGE 11.21, 11.36 

MARGIN 11.7, 11.32 

NO LEADBLANK 11.7 

NO SHOWPAGE 11.21 

NO SPREAD 11.7 

OUT 11.8, 11.32 

PORTRAIT 11.20, 11.36 

POSTSCRIPT 11.20, 11.36 

PUTL.CHARS 11.32 

RIGHT.EDGE 11.21 

SHOWPAGE 11.21, 11.37


SPREAD 11.7, 11.32 

STREAM 11.7, 11.32 

TOP.EDGE 11.21, 11.37 

WIDTH 11.8, 11.33, 11.36 

justification 11.2 

no-break character 11.2 

PUT instructions 11.2 

summary 11.31 

Time functions 10.1–10.13 

TITLES 

system variables, use of 6.26 

TOKEN 


TOP.EDGE 


TRIM 


Triple slash 


U 

UNDO.DAYS function 10.6, 10.21 

UNDO.SECONDS function 10.6, 10.21 

Uniform random number 7.1 

UP 


UPPER 


USE 


using /* and */ 3.20 

V 

V vector 1.2 

Variables 

3 types 1.2 

accessing names 3.11 

across command (P Vector) 8.5 

across-case (scratch) 8.3 

generating names 5.8 

in P-STAT file 1.2, 1.5 

positional notation 2.4, 2.8, 6.3 

reordering 2.5 

scratch 1.2, 1.5 

system 1.2, 1.5 

VARNAME 


Vectors 

dynamic 1.3, 1.5 

P 1.2, 1.5 

V 1.2, 1.5 

VERIFY 


W 

WEEK.WITHIN.YEAR function 10.7 

WEEKDAY.CASE 10.15, 10.24 

WEEKDAY.LENGTH 10.24 

WEEKDAY.NAMES 10.15, 10.24 

Weighting 

integer 3.7 

WIDTH 

in TEXTWRITER command 11.8, 11.33, 11.36 

Wildcard 

in MATCHES meta-characters 9.19 

in PPL instructions 8.15, 8.24 

in SPLIT instruction 8.15 

in variable selection 2.6 

X 

XAMONG 

PPL operator 2.16, 2.24, 9.3, 9.32 

XBLANK 


XCHANGE 


XCONTAINS 


XEQ 

PPL operator 2.12, 2.23, 9.2, 9.5, 9.32 

XGE 


XGT 


XLE 


XLT 


XMATCHES 


XNE



XNOTAMONG 

PPL operator 2.16, 2.24, 9.3, 9.33 

XPOSITION 


XRECODE 4.12
