12.07.2015 Views

Old Friends, New Features: What's new in SUMMARY and ... - NESUG



NEW WAYS TO CONTROL CONTENT AND ORDER IN V7 SUMMARY AND TABULATE

Robert Ray / Information Products / SAS Institute Inc., Cary NC

Abstract

For Version 7 of SAS® software, the SUMMARY and TABULATE procedures have many new options, two of which enhance control over which class levels appear in the output and the order in which they appear. The preloadfmt CLASS statement option generates initial class level data from user-defined SAS formats. The classdata= proc statement option gives further control by allowing the user to supply a second data set that defines all acceptable class levels. Both of these features can be used to filter or extend the analysis data, and both redefine the meaning of order=data.

Overview

Have you ever needed to ensure that a certain set of classification levels was present in a report even if those levels were not all present in the input data? Have you ever wanted a class level ordering that wasn't available with one of the standard ordering options?
If so, then two of the new options of PROC SUMMARY and PROC TABULATE will be of interest to you.

Both SUMMARY and TABULATE now allow multiple CLASS statements, and CLASS statements now have options. One of the CLASS statement options, preloadfmt, will generate initial class level data from user-defined SAS formats that have been assigned to the class variables. Because this generated data is seen before the actual input data, the order=data option will then place the output in the order in which the format was defined, assuming the format was created with the notsorted option. An extension to preloadfmt is the exclusive option. Specifying exclusive in the CLASS statement with preloadfmt will cause the format to be used like a where clause, screening out observations with class variable values not covered by the format.

As mentioned above, preloadfmt can add class values not found in the input data to one or more class variables. To get all these values into the output, either the printmiss TABLE statement option of PROC TABULATE or the new completetypes proc option of PROC SUMMARY must be used. This will cause all possible class variable value combinations to appear in the output even if some cells have a frequency of zero.
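As a minimal sketch of the options just described, the following uses a hypothetical $regfmt format and a hypothetical sales data set (neither comes from the paper). Region 'S' is not covered by the format, and region 'W' never occurs in the data. Assuming the behavior described in the text, exclusive should screen out the 'S' observation, while completetypes (PROC SUMMARY) and printmiss (PROC TABULATE) should keep the empty West level.

```sas
/* Hypothetical format: two regions, listed in the order we want them shown */
proc format;
  value $regfmt (notsorted)
    'W' = 'West'
    'E' = 'East';
run;

/* Hypothetical input: 'S' is not in the format, 'W' is not in the data */
data sales;
  input region $ amount;
  format region $regfmt.;
  datalines;
E 100
S 250
E 300
;
run;

/* PROC SUMMARY: preloadfmt loads the format's levels, exclusive filters
   on them, completetypes keeps levels that have no observations */
proc summary data=sales print sum completetypes;
  class region / preloadfmt exclusive order=data;
  var amount;
run;

/* PROC TABULATE: printmiss plays the role of completetypes */
proc tabulate data=sales;
  class region / preloadfmt exclusive order=data;
  var amount;
  table region, amount*(n sum) / printmiss;
run;
```

Dropping exclusive from either CLASS statement should let the unformatted 'S' level back into the output alongside the levels loaded from the format.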
You may be thinking that PROC TABULATE has always had the printmiss option, but in the past every class variable value output to the table had to be present somewhere in the input data set, even if not all combinations were present. Now the output can contain class values not found anywhere in the input data set.

Now you may be thinking that printmiss is okay, but it outputs all possible combinations of class values, which can be too much! Some class value combinations could be nonsense, such as when multiple location classes like city and state are used together: not all city names make sense for all states. This is where the classdata= option comes in handy. It allows the user to specify a set of class variable value combinations, or levels, that must be present in the output even if the associated frequency is zero.

Like preloadfmt, classdata= has an exclusive option that, when used in the proc statement, causes the class data to resemble a large where clause. The difference is that a where clause can only remove data, whereas classdata= can remove and add class levels at the same time. In SQL terms, classdata= used alone acts like a full join between the class data set and the input data set. Used with exclusive, the class data set acts like the preserved side of a one-sided outer join. Also, like preloadfmt, classdata= changes the meaning of order=data.
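The SQL analogy can be made concrete with a small PROC SQL sketch. The wanted and sales data sets below are hypothetical, and the queries only mirror which class levels survive in each case; they are not a description of how the procedures work internally.

```sas
/* Hypothetical class data set: the levels we want to see */
data wanted;
  input state $ city $;
  datalines;
PA Philly
NC Durham
;
run;

/* Hypothetical input data: Cary is not wanted, Durham has no data */
data sales;
  input state $ city $ amt;
  datalines;
PA Philly 100
NC Cary 50
;
run;

proc sql;
  /* Analogy for classdata= alone: levels from either side survive */
  select coalesce(w.state, s.state) as state,
         coalesce(w.city,  s.city)  as city,
         s.amt
    from wanted as w full join sales as s
      on w.state = s.state and w.city = s.city;

  /* Analogy for classdata= with exclusive: only the class data set's
     levels survive, whether or not the input data contains them */
  select w.state, w.city, s.amt
    from wanted as w left join sales as s
      on w.state = s.state and w.city = s.city;
quit;
```

The first query lists Philly, Durham, and Cary. The second lists only Philly and Durham, with a missing amt for Durham, which corresponds to a zero-frequency class level in the procedure output.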
With classdata=, data ordering becomes the order in which each class variable value was encountered in the class data set.

Examples

The first example is a simple year-to-date report ordered by fiscal year that demonstrates the preloadfmt feature. The example shows how content for individual class variables can be controlled by user-defined SAS formats.

/*-- Define a format ordered by fiscal year --*/
proc format;
  value mthfmt ( notsorted )
    6='Jun' 7='Jul' 8='Aug' 9='Sep'
    10='Oct' 11='Nov' 12='Dec' 1='Jan'
    2='Feb' 3='Mar' 4='Apr' 5='May';
run;

/*-- Make-up some data for the IRS --*/
data records;
  retain seed1 543;
  drop ind;
  do month = 1 to 7;
    do day = 1 to 30;
      call ranuni( seed1, ind );
      dailyTotals = 1000 + (1000 * ind);
      output;
    end;
  end;
  drop seed1;
  format month mthfmt.;
run;

/*-- Send output to HTML --*/
ods html body="C:\temp\preloadFmt1.htm";

/*-- Now, do the report --*/
proc summary print sum mean completetypes;
  class month / preloadfmt order=data;
  var dailyTotals;
run;

ods html close;

And now the ODS HTML output…

[HTML output not reproduced here]

The next example shows how classdata= and exclusive can be used to zero in on a target set of class levels that may or may not be contained in the input data. Notice that the class combination NC/Durham does not occur in the input data but does appear in the output.

data revenues;
  input state $1-2 city $4-13 amt;
  datalines;
PA Pittsburgh 200000
PA Philly     500000
NC Raleigh    250000
PA Philly     100000
NC Cary       50000
NC Raleigh    360000
PA Pittsburgh 75000
PA Philly     450000
NC Raleigh    50000
PA Philly     800000
NC Cary       30000
NC Raleigh    50000
;

/*-- Query defines class levels and order --*/
data query;
  input state $1-2 city $4-13;
  datalines;
PA Philly
PA Pittsburgh
NC Raleigh
NC Durham
;
run;

ods html body="C:\temp\classdata1.htm";

proc tabulate data=revenues classdata=query exclusive;
  class state city / order=data;
  var amt;
  table state*city, amt*(n sum);
run;

ods html close;

And the resulting HTML output…

[HTML output not reproduced here]

Conclusions

The examples above illustrate just two of over thirty new features of PROC SUMMARY and PROC TABULATE for Version 7 aimed at reducing the need for pre- and post-processing of data to get what you really want.

Trademarks

SAS is a registered trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
