
PNNL-13501 - Pacific Northwest National Laboratory



We demonstrate our design using a climate modeling data set provided by the Laboratory’s Global Change Program. This basic multivariate, time-dependent data set has the characteristics and features our experiments require. It contains five data variables (pressure, temperature, water vapor mixing ratio, and two wind-velocity components) of different types (scalars and vectors) and dimensions (two- and three-dimensional), recorded daily.

Figure 1 shows a cluster analysis of 92 scalar fields (water vapor flux) encoded as grayscale images, with the highest moisture value mapped to white. Each individual three-dimensional scalar field, shown in Figure 1 as a two-dimensional image for simplicity, contains over 120,000 real numbers, whereas each data signature that represents an individual scalar field in the cluster analysis contains only 200. The analysis clearly separates the individual images into clusters according to their patterns and characteristics.
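The report does not say how a 200-number signature is derived from a field of more than 120,000 values (a reduction of at least 600 to 1). As a minimal sketch only, the Python below assumes a hypothetical `data_signature` that averages contiguous blocks of the flattened field, then clusters the signatures with a plain k-means loop. Both the block-mean reduction and the choice of k-means are illustrative substitutions, not the Laboratory's method, and the fields are synthetic random data of the quoted size.

```python
import numpy as np

def data_signature(field: np.ndarray, length: int = 200) -> np.ndarray:
    """Reduce a field of any shape to a fixed-length vector of block means.

    Hypothetical stand-in: the report does not describe how its
    200-number signatures are computed.
    """
    flat = np.asarray(field, dtype=float).ravel()
    chunks = np.array_split(flat, length)  # nearly equal contiguous blocks
    return np.array([c.mean() for c in chunks])

def kmeans(points: np.ndarray, k: int, iters: int = 100, seed: int = 0):
    """Plain Lloyd's k-means on the rows of `points`; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid.
        d = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if empty).
        new = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Synthetic placeholder data: 92 fields of 40 x 50 x 60 = 120,000 values,
# drawn from two "regimes" with different mean moisture.
rng = np.random.default_rng(1)
shape = (40, 50, 60)
sigs = np.stack([
    data_signature(rng.normal(loc=0.0 if t < 46 else 5.0, scale=1.0, size=shape))
    for t in range(92)
])
labels, _ = kmeans(sigs, k=2)
```

Because each signature is computed once per field, clustering operates on a 92 x 200 matrix rather than on 92 x 120,000 raw values, which is the practical point of the signature approach.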

Figure 1. A cluster analysis of 92 scalar fields of a time-varying climate simulation

Our data signature approach is flexible enough to merge multiple data dimensions of different data types into one signature vector for computation. In this example, we merge a scalar field (water vapor flux) and a vector field (wind velocity) for each of the 92 simulation steps into a single signature for cluster analysis. Figure 2 shows 92 icons (one per signature), color-coded according to the sequential order of the simulation. The simulation starts at the left (red and orange), crosses the middle (yellow, green, and cyan), and ends at the right (blue and purple). The three-dimensional scatterplot based on the combined signatures correctly separates the seven periods of the simulation, which are characterized by three heavy rainfall episodes (orange, green, and blue).
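One way to picture merging a scalar field and a vector field into a single signature is sketched below, assuming each field has already been reduced to per-time-step signature vectors. The hypothetical helpers `combine_signatures` and `project_3d` standardize each block column-wise so that neither field dominates Euclidean distances, concatenate the blocks, and project the result to three dimensions with principal component analysis for a scatterplot. The standardization and the use of PCA are assumptions (the report does not name its projection method), and the input arrays are random placeholders.

```python
import numpy as np

def combine_signatures(scalar_sigs, vector_sigs):
    """Concatenate per-time-step signatures of two fields into one vector.

    Each block is standardized column-wise first (an assumption) so that
    a field with larger raw values does not dominate the distances.
    """
    def standardize(a):
        a = np.asarray(a, dtype=float)
        std = a.std(axis=0)
        std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
        return (a - a.mean(axis=0)) / std
    return np.hstack([standardize(scalar_sigs), standardize(vector_sigs)])

def project_3d(x):
    """Project rows of x onto their first three principal components via SVD."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:3].T

# Placeholder inputs: 92 time steps, a 200-value scalar signature and a
# 400-value vector signature (e.g. 200 per wind component) on a larger scale.
rng = np.random.default_rng(2)
scalar_sigs = rng.normal(size=(92, 200))
vector_sigs = 10.0 * rng.normal(size=(92, 400))

combined = combine_signatures(scalar_sigs, vector_sigs)
coords = project_3d(combined)  # one (x, y, z) point per time step
```

Coloring each row of `coords` by its time-step index would give a plot in the spirit of Figure 2; how well periods separate depends entirely on the real signatures, not on this sketch.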

170 FY 2000 Laboratory Directed Research and Development Annual Report

Figure 2. A cluster analysis of 92 scalar and 92 vector fields of a time-varying climate simulation

We also explored using data signatures in scientific data mining and developed a feature decimation technique to simplify a vector flow field for visualization. Figure 3 shows a three-dimensional atmospheric space crowded with flow field information such as critical points, flow field lines, and boundary regions. Figure 4 shows a filtered version of Figure 3 using a relative vorticity magnitude threshold of 6. In both Figures 3 and 4, critical points are grouped into saddle (cross) and non-saddle (cube) classes to simplify the visualization.
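The decimation step can be sketched for a horizontal slice of the flow under several stated assumptions: each critical point carries a 2 x 2 velocity-gradient (Jacobian) matrix, relative vorticity is taken as ∂v/∂x − ∂u/∂y, points whose vorticity magnitude falls below the threshold (6, in the report's unstated units) are dropped, and a point is classed as a saddle when the Jacobian determinant is negative. The `CriticalPoint` class and `decimate` function are hypothetical; the report works in three dimensions and does not describe its classifier.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class CriticalPoint:
    x: float
    y: float
    # 2x2 velocity gradient [[du/dx, du/dy], [dv/dx, dv/dy]] at the point.
    jacobian: np.ndarray

    @property
    def vorticity(self) -> float:
        # Relative vorticity in the plane: dv/dx - du/dy.
        return float(self.jacobian[1, 0] - self.jacobian[0, 1])

    @property
    def is_saddle(self) -> bool:
        # For a first-order 2D critical point, det(J) < 0 means the
        # eigenvalues are real with opposite signs, i.e. a saddle.
        return float(np.linalg.det(self.jacobian)) < 0.0

def decimate(points, threshold=6.0):
    """Drop points with |vorticity| < threshold (an assumed filter direction)
    and split the survivors into saddle and non-saddle groups."""
    kept = [p for p in points if abs(p.vorticity) >= threshold]
    saddles = [p for p in kept if p.is_saddle]
    others = [p for p in kept if not p.is_saddle]
    return saddles, others

# Illustrative points; vorticity and determinant noted for each.
pts = [
    CriticalPoint(0.0, 0.0, np.array([[1.0, 0.0], [0.0, -1.0]])),    # saddle, vorticity 0
    CriticalPoint(1.0, 0.0, np.array([[0.0, -4.0], [4.0, 0.0]])),    # center, vorticity 8
    CriticalPoint(2.0, 0.0, np.array([[5.0, -3.0], [4.0, -5.0]])),   # saddle, vorticity 7
    CriticalPoint(3.0, 0.0, np.array([[-1.0, -1.0], [1.0, -1.0]])),  # spiral sink, vorticity 2
]
saddles, others = decimate(pts)  # one saddle (x=2) and one non-saddle (x=1) survive
```

Grouping by the sign of the determinant mirrors the saddle/non-saddle split in Figures 3 and 4; a full three-dimensional version would instead inspect the signs of the real parts of the 3 x 3 Jacobian's eigenvalues.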

Figure 3. An atmospheric data volume with all critical points and field flow lines

Summary and Conclusions

Data signatures allow scientists with limited resources to analyze very large data sets at a higher level of abstraction. Their novel design is flexible enough to merge different data types into a quantitative data vector
