11.07.2015 Views

2DkcTXceO


324 The present’s future

data. In Section 29.2, we describe briefly what symbolic data are and how they might arise. Then, in Section 29.3, we illustrate some symbolic methodological analyses and compare the results with those obtained when using classical surrogates. Some concluding remarks about the future of such data are presented in Section 29.4.

29.2 Symbolic data

Symbolic data consist of lists, intervals, histograms and the like, and arise in two broadly defined ways. One avenue is when data sets of classical point observations are aggregated into smaller data sets. For example, consider a large medical data set of millions of individual observations with information such as demographic (e.g., age, gender, etc.), geographical (e.g., town of residence, country, region, ...), basic medical diagnostics (pulse rate, blood pressure, weight, height, previous maladies and when, etc.), current ailments (e.g., cancer type such as liver, bone, etc.; heart condition, etc.), and so on. It is unlikely the medical insurer (or medical researcher, or ...) is interested in the details of your specific visit to a care provider on a particular occasion; indeed, the insurer may not even be interested in your aggregated visits over a given period of time. Rather, interest may focus on all individuals (and their accumulated history) who have a particular condition (such as heart valve failure), or, maybe interest centers on the collection of individuals of a particular gender-age group with that condition. Thus, values are aggregated across all individuals in the specific categories of interest. It is extremely unlikely that all such individuals will have the same pulse rate, the same weight, and so forth. Instead, the aggregated values can take values across an interval, as a histogram, as a list of possible values, etc.
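
As a rough illustration of the aggregation just described, the Python sketch below collapses a handful of classical point observations into one symbolic description per category. Everything in it is an assumption made for illustration only: the records, the field names (`group`, `pulse`, `ailment`), the histogram bin edges, and the helper functions are hypothetical and do not come from the chapter.

```python
from collections import Counter, defaultdict

# Hypothetical classical point observations: one row per individual.
RECORDS = [
    {"group": "F 40-49", "pulse": 64, "ailment": "heart valve failure"},
    {"group": "F 40-49", "pulse": 72, "ailment": "heart valve failure"},
    {"group": "F 40-49", "pulse": 81, "ailment": "heart valve failure"},
    {"group": "F 40-49", "pulse": 68, "ailment": "arrhythmia"},
    {"group": "M 40-49", "pulse": 58, "ailment": "heart valve failure"},
    {"group": "M 40-49", "pulse": 77, "ailment": "heart valve failure"},
]


def to_interval(values):
    """Interval-valued summary: (smallest, largest) observed value."""
    return (min(values), max(values))


def to_histogram(values, edges):
    """Histogram-valued summary: relative frequency per bin.

    Bins are [edges[i], edges[i+1]); the last bin also includes its
    upper edge. Values outside the edges are not counted in any bin.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return [c / len(values) for c in counts]


def to_weighted_list(values):
    """List-valued summary: each distinct value with its relative frequency."""
    return {k: c / len(values) for k, c in Counter(values).items()}


def aggregate(records, key, pulse_edges):
    """Group the records by `key` and build one symbolic row per group."""
    groups = defaultdict(list)
    for row in records:
        groups[row[key]].append(row)

    symbolic = {}
    for name, rows in groups.items():
        pulses = [r["pulse"] for r in rows]
        symbolic[name] = {
            "pulse_interval": to_interval(pulses),
            "pulse_histogram": to_histogram(pulses, pulse_edges),
            "ailment": to_weighted_list([r["ailment"] for r in rows]),
        }
    return symbolic


# Assumed bin edges for pulse rate: [50, 70) and [70, 90].
symbolic = aggregate(RECORDS, key="group", pulse_edges=[50, 70, 90])
for name, description in symbolic.items():
    print(name, description)
```

The six individual rows reduce to two symbolic rows, one per gender-age category. Which summary to keep (interval, histogram or weighted list) is a design choice that would depend on the scientific question of interest.
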
That is, the data set now consists of so-called symbolic data.

Automobile insurers may be interested in accident rates of categories such as 26-year-old male drivers of red convertibles, and so on. Census data are frequently in the form of symbolic data; e.g., housing characteristics for regions may be described as {owner occupied, .60; renter occupied, .35; vacant, .05} where 60% of the homes are owner occupied, etc.

There are countless examples. The prevailing thread is that large data sets of single classical observations are aggregated in some way with the result that symbolic data perforce emerge. There are a myriad of ways these original data sets can be aggregated, with the actual form being driven by the scientific question/s of interest.

On the other hand, some data are naturally symbolic in nature. For example, species are typically described by symbolic values; e.g., the mushroom species bernardi has a pileus cap width of [6, 7] cm. However, the particular mushroom in your hand may have a cap width of 6.2 cm, say. Pulse rates
