In Situ Data Provenance Capture in Spreadsheets

In Situ - UW Bothell

comparison to the implemented tool. Finally, we conclude with potential directions for future work.

II. TECHNIQUE

In this section, we present a framework for supporting the in situ capture of data provenance. First, we discuss the rationale for building our framework on top of MS Excel, and then we discuss the details of recording user interactions. Additional capabilities of our technique, such as undo/redo functionality, dynamic version creation, and change dependency tracking, are also discussed.

A. Building Provenance Support Within Scientific Tools

The main rationale for building provenance support on top of Excel is that Excel is already a key data analysis tool used by many scientists and researchers. In addition, since provenance capture resides within a familiar environment, the captured provenance log is at the same level of discourse as the tool, which enhances understandability. For instance, Excel users can understand that the label “$A$1” corresponds to a specific cell location within the spreadsheet.
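
To make the “$A$1” example concrete, the following minimal sketch shows how a log entry's label could be resolved back to a row and column position. It is only an illustration: the function name `parse_absolute_ref` is hypothetical, it handles only fully absolute single-cell labels, and it is not taken from the tool described here.

```python
import re

# Matches fully absolute A1-style labels such as "$A$1" or "$AA$10".
_ABS_REF = re.compile(r"^\$([A-Z]+)\$([1-9][0-9]*)$")


def parse_absolute_ref(label: str) -> tuple[int, int]:
    """Return (row, column) as 1-based integers for a label like "$A$1".

    Hypothetical helper for illustration; ranges and relative
    references are deliberately not supported.
    """
    match = _ABS_REF.match(label)
    if match is None:
        raise ValueError(f"not an absolute A1-style reference: {label!r}")
    letters, digits = match.groups()
    column = 0
    for ch in letters:
        # Column letters form a bijective base-26 number: A=1 ... Z=26, AA=27.
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return int(digits), column
```

For example, `parse_absolute_ref("$A$1")` yields row 1, column 1, and `parse_absolute_ref("$AA$10")` yields row 10, column 27.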

In general, there are several ways provenance support can be built within the context of an existing tool. The first is to build a provenance user interface within the work environment using an adapter or software plug-in. Similar to the PReP approach [12], one can also build a wrapper around the tools used by the scientist so that user interaction events can be intercepted. Another approach is to leverage any customization or extensibility support provided by the tool. If the tool supports logging of user activities, this log can be fed into a provenance component to be analyzed and rendered visually.
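
The wrapper idea can be sketched in a few lines. The example below is an assumption-laden illustration, not the PReP design or any real tool's API: `Sheet` is a toy stand-in for the wrapped tool, and `RecordingSheet` and `ChangeEvent` are hypothetical names. The wrapper forwards each edit to the underlying object and records what changed.

```python
from dataclasses import dataclass
from typing import Any


class Sheet:
    """Toy stand-in for the wrapped tool's data model (hypothetical)."""

    def __init__(self) -> None:
        self.cells: dict[str, Any] = {}

    def set_cell(self, address: str, value: Any) -> None:
        self.cells[address] = value


@dataclass
class ChangeEvent:
    """One intercepted edit: where it happened and the before/after values."""
    address: str
    old_value: Any
    new_value: Any


class RecordingSheet:
    """Wrapper that intercepts edits, forwards them, and logs each one."""

    def __init__(self, inner: Sheet) -> None:
        self._inner = inner
        self.log: list[ChangeEvent] = []

    def set_cell(self, address: str, value: Any) -> None:
        # Read the old value before forwarding so the log keeps both sides.
        old = self._inner.cells.get(address)
        self._inner.set_cell(address, value)
        self.log.append(ChangeEvent(address, old, value))
```

Because callers talk only to `RecordingSheet`, the wrapped `Sheet` needs no modification; the cost is that any edit made directly on the inner object bypasses the log.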

Our approach uses Excel’s public application programming interface (API) to support the following key functions: start/stop recording, enter user task, show log, undo/redo data processing, and analyze change dependencies. For the analysis of provenance, we also take advantage of built-in Excel functionality that is familiar to casual Excel users, such as grouping, filtering, and sorting data.
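
How the key functions listed above might fit together can be sketched with a small in-memory model. This is a hedged illustration only: it does not use Excel's API, the class `ProvenanceRecorder` and all of its method names are hypothetical, and it assumes that an empty cell is represented by a missing dictionary key. Change dependency analysis is left out.

```python
from typing import Any, Optional

# A log entry: (task description, cell address, old value, new value).
Entry = tuple[Optional[str], str, Any, Any]


class ProvenanceRecorder:
    """Toy recorder with start/stop, task entry, log display, and undo/redo."""

    def __init__(self) -> None:
        self.cells: dict[str, Any] = {}
        self.recording = False
        self.task: Optional[str] = None
        self._done: list[Entry] = []    # applied changes, oldest first
        self._undone: list[Entry] = []  # changes available for redo

    def start(self) -> None:
        self.recording = True

    def stop(self) -> None:
        self.recording = False

    def enter_task(self, description: str) -> None:
        self.task = description

    def edit(self, address: str, value: Any) -> None:
        old = self.cells.get(address)
        self.cells[address] = value
        if self.recording:  # edits made while stopped are not logged
            self._done.append((self.task, address, old, value))
            self._undone.clear()  # a new edit invalidates the redo history

    def undo(self) -> bool:
        if not self._done:
            return False
        entry = self._done.pop()
        _, address, old, _ = entry
        if old is None:  # assumption: None means the cell was empty before
            self.cells.pop(address, None)
        else:
            self.cells[address] = old
        self._undone.append(entry)
        return True

    def redo(self) -> bool:
        if not self._undone:
            return False
        entry = self._undone.pop()
        _, address, _, new = entry
        self.cells[address] = new
        self._done.append(entry)
        return True

    def show_log(self) -> list[Entry]:
        return list(self._done)
```

Keeping the task description in every entry means the log can later be grouped, filtered, or sorted by task, in the same spirit as the built-in spreadsheet operations mentioned above.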

B. Capturing Meaningful User Interactions in the Background

An analysis of existing provenance systems [5, 6] suggests that the level of event capture determines how well the provenance can be understood. The level of automatic semantic capture increases with the increased awareness of semantics in the framework of the provenance tool. For example, since the CAVES project is built on top of an existing data analysis tool [5], it is easy to infer the semantics of its generated provenance log. Meanwhile, the original PASS project was built to only capture operating system level events, and thus the events were difficult to map to high-level tasks [6].

Figure 1. The top Excel screenshot shows the changes made to the dataset. The bottom screenshot shows the same dataset reverted to the previous version via the Undo button (which is accessible from our tool’s ribbon menu shown above). Note that highlighted cells mark the entries that have been reverted.
