24.04.2013 Views

Sudarshan Chawathe, Associate Professor, Computer Science ...


Accelerating Scientific Dataflows<br />

<strong>Sudarshan</strong> S. <strong>Chawathe</strong><br />

<strong>Associate</strong> <strong>Professor</strong> of <strong>Computer</strong> <strong>Science</strong><br />

& Cooperating <strong>Associate</strong> <strong>Professor</strong> of Climate Change Institute<br />

University of Maine


A Data-Centric View<br />

Project 301<br />

P301dx Features<br />

Tambora & SO4<br />

Map: Icereader Data<br />

Web App Challenges<br />

RFDE<br />

RFDE Client Upgrades<br />

Web Mapping Service<br />

WMS Parameters<br />

Handheld Data<br />

Summary<br />

A Data-Centric View<br />

■ What are the primary and supplemental datasets?<br />

■ How are different datasets acquired?<br />

■ What are the key transformations, interpretations, and<br />

visualizations?<br />

■ What may be automated? What requires human<br />

interpretation?<br />

■ What are effective and efficient modes of interaction with<br />

data?<br />

<strong>Sudarshan</strong> S. <strong>Chawathe</strong> Accelerating Scientific Dataflows – p. 2



Project 301<br />

■ Cyber-Infrastructure for Climate-Change Research.<br />

■ Goal: Accelerate scientific discoveries by enabling more<br />

effective management of large and diverse datasets.<br />

■ Approach: Develop domain-specific adaptations of data<br />

management methods. Implement and evaluate the methods<br />

on real data.<br />

■ Research topics (<strong>Computer</strong> Sci.):<br />

◆ Data importation: “ETL” for scientific data.<br />

◆ Data integration: instruments, documents, Web services, ...<br />

◆ Interactive data exploration and visualization.<br />

◆ Visual programming.<br />

◆ Data mining.<br />

◆ Provenance of data.<br />

◆ Workflows.<br />

◆ Systems issues: performance, scalability, reliability, ...<br />


P301dx Features<br />

■ Integrated view of large, diverse datasets: ice-core data,<br />

volcanic records, data extracted from documents, ...<br />

■ Interactive data exploration based on charts plotting<br />

time-series and related data, maps, ...<br />

■ Palette of tools for data processing, plotting, and other<br />

manipulations. Built-in tools for resampling, smoothing, ...<br />

■ Tools that operate on, and produce, objects in the<br />

working-object store, simplifying multi-step data manipulation<br />

and plotting.<br />

■ Interactive generation of new tools by composition and other<br />

higher-level operations: tool-generating tools.<br />

■ Chart exportation in high-quality vector and raster formats.<br />

■ A door to the larger cyber-infrastructure effort, P301.<br />
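The idea of tools that consume and produce working objects, and of tool-generating tools built by composition, can be illustrated with a minimal Python sketch. Everything here is hypothetical: the names `resample`, `smooth`, and `compose`, the list-of-pairs representation of a time series, and the specific algorithms (decimation and a trailing moving average) are assumptions for illustration, not the P301dx implementation.

```python
from functools import reduce
from typing import Callable, List, Tuple

# Assumed representation: a time series is a list of (time, value) pairs.
Series = List[Tuple[float, float]]
# A "tool" maps one working object (here, a series) to another.
Tool = Callable[[Series], Series]


def resample(step: int) -> Tool:
    """Return a tool that keeps every `step`-th sample (simple decimation)."""
    if step < 1:
        raise ValueError("step must be >= 1")

    def tool(series: Series) -> Series:
        return series[::step]

    return tool


def smooth(window: int) -> Tool:
    """Return a tool that applies a trailing moving average of `window` samples."""
    if window < 1:
        raise ValueError("window must be >= 1")

    def tool(series: Series) -> Series:
        out: Series = []
        for i, (t, _) in enumerate(series):
            chunk = [v for _, v in series[max(0, i - window + 1): i + 1]]
            out.append((t, sum(chunk) / len(chunk)))
        return out

    return tool


def compose(*tools: Tool) -> Tool:
    """Tool-generating tool: chain existing tools, left to right, into a new one."""

    def tool(series: Series) -> Series:
        return reduce(lambda acc, f: f(acc), tools, series)

    return tool


# Build a new tool from two existing ones, then apply it to a toy series.
smooth_then_thin = compose(smooth(3), resample(2))
data = [(float(t), float(v)) for t, v in enumerate([1, 2, 3, 4, 5, 6])]
result = smooth_then_thin(data)
print(result)
```

Because every tool has the same input and output type, any composed tool can itself be passed to `compose`, which is what makes multi-step manipulation easy to chain.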


Tambora and SO4<br />


Map: Icereader Data<br />


Web Application Challenges<br />

1. REST: Representational State Transfer.<br />

■ Robust and scalable Web applications.<br />

■ Standards-based, wide availability.<br />

■ Broadly accessible.<br />

2. Modern Web interfaces: JavaScript, HTML5, ...<br />

■ High interactivity.<br />

■ Client-side optimizations.<br />

■ Glamor.<br />

3. How to consolidate 1 and 2?<br />


RFDE: Robust Web Applications<br />

■ REST Framework for Dynamic Environments<br />


RFDE Client Upgrades<br />


Web Mapping Service<br />

[Figure: architecture diagram. Component labels: Clients (Desktop Applications, Web Applications, Mobile Applications), Load Balancer, TMS Servers (Cached & Static, Tile Renderer), Interpolation Module, Database Servers. Data labels: Tiles (x,y,z), Grids.]<br />

■ Arbitrary geocoded point and grid data, backgrounds, ...<br />

■ Web interface similar to Google Maps; de-facto standard.<br />

■ REST-based design; easily re-targetable: android, iOS, ...<br />

■ Challenges: 10<sup>13</sup> tiles, 10<sup>4</sup> Terabytes.<br />

■ Fast in-database dynamic tile generation from numeric data.<br />

■ Easy to replicate, map on to cloud services.
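The tile addressing by x, y, z mentioned on this slide can be sketched as follows. The slide does not state which tiling scheme is used, so this example assumes the common Web Mercator "XYZ" convention of Google-Maps-style interfaces (origin at the top-left, 4^z tiles at zoom z). The function names, the layer name, and the URL template are hypothetical and only illustrate how a REST-style service might expose each tile as a resource.

```python
import math


def lonlat_to_tile(lon_deg: float, lat_deg: float, zoom: int) -> tuple:
    """Map a longitude/latitude to (x, y) tile indices at a zoom level.

    Assumes the Web Mercator XYZ scheme: y counts down from the top edge.
    """
    n = 2 ** zoom  # tiles per axis at this zoom level
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edges still fall inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def xyz_to_tms_y(y: int, zoom: int) -> int:
    """Flip a top-origin (XYZ) row index to a bottom-origin (TMS) row index."""
    return 2 ** zoom - 1 - y


def tile_path(layer: str, z: int, x: int, y: int) -> str:
    """Hypothetical REST-style resource path for one tile."""
    return f"/tiles/{layer}/{z}/{x}/{y}.png"


def tiles_through_zoom(max_zoom: int) -> int:
    """Total number of tiles for one layer over zoom levels 0..max_zoom."""
    return sum(4 ** z for z in range(max_zoom + 1))


x, y = lonlat_to_tile(0.0, 0.0, 1)
print(tile_path("so4", 1, x, y))
print(tiles_through_zoom(2))
```

Since the tile count quadruples with each zoom level and is multiplied again by the number of layers and time steps, pre-rendering every tile quickly becomes impractical, which is consistent with the slide's emphasis on dynamic tile generation.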



WMS Descriptive Parameters<br />

data parameters: 115<br />

period: 32 years<br />

tiles: 23×10<sup>12</sup><br />

rendered tile size: 10,000 Terabytes<br />

database size: 0.42 Terabytes<br />

avg static response time: 0.2 seconds<br />

avg dynamic response time: 0.5 seconds<br />
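A few derived quantities follow from these figures by simple arithmetic. The short Python sketch below takes the slide's numbers at face value and assumes decimal terabytes (10^12 bytes), which the slide does not specify; the variable names are illustrative only.

```python
# Figures from the slide, taken at face value.
TB = 10 ** 12                   # assumption: 1 terabyte = 10^12 bytes
parameters = 115                # data parameters
tiles = 23 * 10 ** 12           # total number of tiles
rendered_bytes = 10_000 * TB    # size if every tile were rendered
database_bytes = 0.42 * TB      # size of the underlying database

# Derived quantities.
tiles_per_parameter = tiles // parameters
bytes_per_tile = rendered_bytes / tiles
storage_ratio = rendered_bytes / database_bytes

print(f"tiles per parameter:   {tiles_per_parameter:.1e}")
print(f"average bytes per tile: {bytes_per_tile:.0f}")
print(f"rendered / database:    {storage_ratio:.0f}x")
```

Under these assumptions the fully rendered tile set would be roughly four orders of magnitude larger than the database itself, which suggests why generating tiles on demand from the numeric data is attractive even though dynamic responses average 0.5 seconds versus 0.2 seconds for static ones.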


Handheld Data Analysis<br />

■ Test data; do not use!<br />

■ HCDX: handheld<br />

chronological data explorer.<br />

■ Android, iOS, Maemo, Web, ...<br />

■ Very high-level end-user<br />

programming.<br />

■ Interactive analysis of<br />

time-series datasets.<br />

■ In-field data collection and<br />

analysis.<br />

■ Handheld interfaces,<br />

functional programming,<br />

database optimizations, ...<br />


Summary<br />

■ Scientific dataflows: from raw data to insights.<br />

◆ Explication, documentation, optimization, ...<br />

◆ Durability, traceability, analyses, visualizations, ...<br />

◆ Platforms: desktop/laptop, Web, mobile, ...<br />

◆ Bottleneck in the research process?<br />

■ Investments in improving dataflow have a multiplier effect on<br />

other research investments.<br />

■ Acknowledgments:<br />

◆ Faculty: Shaleen Jain, Andrei Kurbatov, Paul Mayewski.<br />

◆ Graduate students: Erik Albert, Mark Royer.<br />

◆ Undergraduate students: Will Lamond, Joe Petrakovich.<br />

◆ Project teams: P301, 10green, RFDE/SSI.<br />

◆ Funding: NSF, U.Maine.<br />

■ Data management collaborations? chaw@cs.umaine.edu<br />
