13.07.2015 Views

Drill Down Till You Drop: OLAP Without Windows - NESUG


and contained only three cities, Baltimore, Newark, and Annapolis, there would be two observations corresponding to _TYPE_=2, three observations corresponding to _TYPE_=4, and three observations corresponding to _TYPE_=6.

Selection of the Dimensions and Drill Hierarchy

Selecting the proper dimensions for the drill down involves careful study of the analysts' needs and an understanding of the relationships among the entities which the data represents. In many cases, the drill hierarchy may follow an organization's standard way of doing business, but in other cases the data can first be mined using techniques such as CHAID and CART in order to find significant dimensions for analysis. The importance of determining the needs of the analyst cannot be overemphasized, since the drill down model should provide a meaningful application for the analyst. If it does not, the model can end up being too simplistic, too complex, or simply wrong from a business point of view.

Our sample data will use a Time/Geography drill hierarchy. Examining the levels shows a natural relationship among them, as follows:

drill down: DATE → COUNTRY → REGION → STATE/PROVINCE → CITY
drill up: CITY → STATE/PROVINCE → REGION → COUNTRY → DATE

In other, more complex cases there may not be a natural order, or the relationship may be one-to-many. In such cases more analysis would be needed.

Specification of the PROC SUMMARY

Next, we will set up a PROC SUMMARY specifying the hierarchy we have established. However, we must take care when setting up the CLASS statement, since the order in which the variables appear is important. In order to take advantage of the numerical properties of the _TYPE_ variable, we will specify the drill down order in the CLASS statement from right to left.
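The numerical property in question is that each CLASS variable contributes one binary digit to _TYPE_, with the rightmost variable contributing 1, the next 2, then 4, and so on. When the first drill level is listed rightmost, each successive level of the hierarchy therefore has a _TYPE_ of the form 2**k - 1. The DATA step below is a minimal sketch of that arithmetic, assuming the five levels of the Time/Geography hierarchy are listed with DATE rightmost and CITY leftmost. The names level, type, name, and i are illustrative only and are not taken from the paper.

```sas
/* Hypothetical sketch: write the _TYPE_ value of each drill level to the log. */
/* Assumes the CLASS order places DATE rightmost and CITY leftmost.            */
data _null_;
  array level{5} $ 14 _temporary_
    ('DATE' 'COUNTRY' 'REGION' 'STATE/PROVINCE' 'CITY');
  type = 0;
  do i = 1 to 5;
    /* The i-th CLASS variable from the right contributes 2**(i-1). */
    type = type + 2**(i - 1);
    name = level{i};
    put name $14. +1 type 2.;
  end;
run;
```

Under these assumptions the log lists 1, 3, 7, 15, and 31 for the DATE, COUNTRY, REGION, STATE/PROVINCE, and CITY levels respectively.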
The statement for this example therefore becomes:

    PROC SUMMARY data=drill noprint;
      CLASS city state region country date;
      VAR sales;
      output out=temp sum=;
    run;

Note that we also need to select an analysis variable as well as a summary statistic. Since drill down applications need to access summary statistics at each level of the drill hierarchy, we specify SUM= on the OUTPUT statement, and we specify SALES as the variable to be summed.

Creating the Optimized Drill Down Dataset

As mentioned earlier, PROC SUMMARY will generate output observations for each combination of the variables listed on the CLASS statement. Even though PROC SUMMARY drastically reduces the number of observations in the output dataset compared to the original dataset, in a drill down hierarchy we are only interested in a subset of all possible subgroups. We will examine the output dataset "temp" to see which observations correspond to the drill hierarchy at each level. Those observations will be kept, while the others will be discarded.

• We need the summary for DATE by itself (_TYPE_=1), since this is the starting level for our drill down.
• The next level of drill down is the COUNTRY level. In this case, the required summary is the one corresponding to _TYPE_=3 (DATE by COUNTRY). We include DATE as part of this subgroup since we will already have drilled down from the starting DATE level, and we need the DATE variable to provide a link to the previous level. If we examine all of the other subgroups which contain COUNTRY, we will find a summary for COUNTRY by itself, as well as subgroups containing combinations of variables such as COUNTRY by CITY, COUNTRY by STATE by CITY, etc.
  Some of these summaries will be discarded, since we are only interested in the summary relevant to a position within the drill down hierarchy.
• If we continue to work through the drill hierarchy to the next level, we see that the REGION level requires _TYPE_=7 (DATE by COUNTRY by REGION), since by drilling down to the REGION level we will already have drilled down on DATE and then COUNTRY. By the same rationale given for the previous level, we can discard the other _TYPE_ values containing REGION that are not relevant to our drill down. (For example: REGION by CITY will be
