Sudarshan Chawathe, Associate Professor, Computer Science ...
Accelerating Scientific Dataflows<br />
Sudarshan S. Chawathe<br />
Associate Professor of Computer Science<br />
& Cooperating Associate Professor of Climate Change Institute<br />
University of Maine
A Data-Centric View<br />
Project 301<br />
P301dx Features<br />
Tambora & SO4<br />
Map: Icereader Data<br />
Web App Challenges<br />
RFDE<br />
RFDE Client Upgrades<br />
Web Mapping Service<br />
WMS Parameters<br />
Handheld Data<br />
Summary<br />
A Data-Centric View<br />
■ What are the primary and supplemental datasets?<br />
■ How are different datasets acquired?<br />
■ What are the key transformations, interpretations, and<br />
visualizations?<br />
■ What may be automated? What requires human<br />
interpretation?<br />
■ What are effective and efficient modes of interaction with<br />
data?<br />
<strong>Sudarshan</strong> S. <strong>Chawathe</strong> Accelerating Scientific Dataflows – p. 2
Project 301<br />
■ Cyber-Infrastructure for Climate-Change Research.<br />
■ Goal: Accelerate scientific discoveries by enabling more<br />
effective management of large and diverse datasets.<br />
■ Approach: Develop domain-specific adaptations of data<br />
management methods. Implement and evaluate the methods<br />
on real data.<br />
■ Research topics (Computer Sci.):<br />
◆ Data importation: “ETL” for scientific data.<br />
◆ Data integration: instruments, documents, Web services, ...<br />
◆ Interactive data exploration and visualization.<br />
◆ Visual programming.<br />
◆ Data mining.<br />
◆ Provenance of data.<br />
◆ Workflows.<br />
◆ Systems issues: performance, scalability, reliability, ...<br />
P301dx Features<br />
■ Integrated view of large, diverse datasets: ice-core data,<br />
volcanic records, data extracted from documents, ...<br />
■ Interactive data exploration based on charts plotting<br />
time-series and related data, maps, ...<br />
■ Palette of tools for data processing, plotting, and other<br />
manipulations. Built-in tools for resampling, smoothing, ...<br />
■ Tools that operate on, and produce, objects in the<br />
working-object store, simplifying multi-step data manipulation<br />
and plotting.<br />
■ Interactive generation of new tools by composition and other<br />
higher-level operations: tool-generating tools.<br />
■ Chart exportation in high-quality vector and raster formats.<br />
■ A door to the larger cyber-infrastructure effort, P301.<br />
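The idea of tool-generating tools in the list above can be sketched as plain function composition. This is a minimal illustration, not P301dx code: the names `smooth`, `resample`, and `compose` are hypothetical, and the trailing moving average and every-k-th-sample behaviors are assumed stand-ins for the built-in smoothing and resampling tools.

```python
from functools import reduce
from typing import Callable, List

Series = List[float]
Tool = Callable[[Series], Series]


def smooth(window: int) -> Tool:
    """Build a tool that applies a trailing moving average (assumed behavior)."""
    def tool(xs: Series) -> Series:
        out = []
        for i in range(len(xs)):
            # Average the current value with up to window-1 preceding values.
            chunk = xs[max(0, i - window + 1): i + 1]
            out.append(sum(chunk) / len(chunk))
        return out
    return tool


def resample(step: int) -> Tool:
    """Build a tool that keeps every step-th sample (assumed behavior)."""
    def tool(xs: Series) -> Series:
        return xs[::step]
    return tool


def compose(*tools: Tool) -> Tool:
    """A tool-generating tool: chain existing tools, applied left to right."""
    def composed(xs: Series) -> Series:
        return reduce(lambda acc, t: t(acc), tools, xs)
    return composed


# A new tool made from two existing ones, then applied to a small series.
smooth_then_thin = compose(smooth(2), resample(2))
print(smooth_then_thin([1.0, 3.0, 5.0, 7.0, 9.0]))
```

Because every tool takes a series and returns a series, a composed tool is itself a tool and can be composed again or kept alongside the built-in ones.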
Tambora and SO4<br />
Map: Icereader Data<br />
Web Application Challenges<br />
1. REST: Representational State Transfer.<br />
■ Robust and scalable Web applications.<br />
■ Standards-based, wide availability.<br />
■ Broadly accessible.<br />
2. Modern Web interfaces: JavaScript, HTML5, ...<br />
■ High interactivity.<br />
■ Client-side optimizations.<br />
■ Glamor.<br />
3. How to consolidate 1 and 2?<br />
RFDE: Robust Web Applications<br />
■ REST Framework for Dynamic Environments<br />
RFDE Client Upgrades<br />
Web Mapping Service<br />
[Architecture diagram. Components: clients (desktop, Web, and mobile applications), load balancer, cached and static tiles, tile renderer, TMS servers, interpolation module, database servers. Data labels: tiles (x, y, z), grids.]<br />
■ Arbitrary geocoded point and grid data, backgrounds, ...<br />
■ Web interface similar to Google Maps; de-facto standard.<br />
■ REST-based design; easily re-targetable: Android, iOS, ...<br />
■ Challenges: 10¹³ tiles, 10⁴ Terabytes.<br />
■ Fast in-database dynamic tile generation from numeric data.<br />
■ Easy to replicate, map on to cloud services.
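The x, y, z tile addressing and REST-based design mentioned above can be illustrated with a short sketch. The slides do not specify the tiling scheme, so this assumes the common Web Mercator pyramid used by Google-Maps-style interfaces. The URL layout in `tile_path` and the layer name `so4` are hypothetical, not the service's actual API.

```python
import math


def lonlat_to_tile(lon_deg: float, lat_deg: float, zoom: int) -> tuple:
    """Return the (x, y) tile containing a point, with y counted from the top.

    Assumes the standard Web Mercator tile pyramid: 2**zoom tiles per axis.
    """
    n = 2 ** zoom
    lat = math.radians(lat_deg)
    x = math.floor((lon_deg + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    # Clamp so that points on the far edges still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def xyz_to_tms_row(y: int, zoom: int) -> int:
    """TMS counts rows from the bottom, so flip the top-origin row index."""
    return 2 ** zoom - 1 - y


def tile_path(layer: str, zoom: int, x: int, y: int) -> str:
    """Hypothetical REST-style resource path for one tile."""
    return f"/tiles/{layer}/{zoom}/{x}/{y}.png"


def pyramid_tile_count(max_zoom: int) -> int:
    """Tiles in a full pyramid: the sum of 4**z for z in 0..max_zoom."""
    return (4 ** (max_zoom + 1) - 1) // 3


x, y = lonlat_to_tile(0.0, 0.0, 1)
print(tile_path("so4", 1, x, y), xyz_to_tms_row(y, 1), pyramid_tile_count(2))
```

Since each tile is addressed by a stable path, a cache or load balancer can serve repeated requests without involving the renderer. The tile count grows by a factor of four per zoom level, which is one reason rendering every tile ahead of time becomes impractical.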
WMS Descriptive Parameters<br />
Parameter                     Value
data parameters               115
period                        32 years
tiles                         23×10¹²
rendered tile size            10,000 Terabytes
database size                 0.42 Terabytes
avg static response time      0.2 seconds
avg dynamic response time     0.5 seconds
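A few derived quantities help put the table above in perspective. The sketch below assumes decimal terabytes (10¹² bytes) and reads "rendered tile size" as the total size of all tiles if every one were rendered in advance. Both are assumptions, and the derived figures are rough estimates rather than reported measurements.

```python
TB = 10 ** 12  # assumption: decimal terabytes

data_parameters = 115
tiles = 23 * 10 ** 12
rendered_bytes = 10_000 * TB   # assumed to be the total for all rendered tiles
database_bytes = 0.42 * TB

# Tiles per data parameter (covers all zoom levels and time steps).
tiles_per_parameter = tiles // data_parameters

# Average size of a single rendered tile, in bytes.
bytes_per_tile = rendered_bytes / tiles

# How much larger the fully rendered tile set is than the source database.
storage_ratio = rendered_bytes / database_bytes

print(f"tiles per parameter: {tiles_per_parameter:.2e}")
print(f"average bytes per tile: {bytes_per_tile:.0f}")
print(f"rendered-to-database size ratio: {storage_ratio:.0f}")
```

Under these assumptions the rendered tile set is on the order of 20,000 times larger than the numeric data it is derived from, which is consistent with the emphasis on generating tiles dynamically in the database rather than storing them all.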
Handheld Data Analysis<br />
■ Test data; do not use!<br />
■ HCDX: handheld<br />
chronological data explorer.<br />
■ Android, iOS, Maemo, Web, ...<br />
■ Very high-level end-user<br />
programming.<br />
■ Interactive analysis of<br />
time-series datasets.<br />
■ In-field data collection and<br />
analysis.<br />
■ Handheld interfaces,<br />
functional programming,<br />
database optimizations, ...<br />
Summary<br />
■ Scientific dataflows: from raw data to insights.<br />
◆ Explication, documentation, optimization, ...<br />
◆ Durability, traceability, analyses, visualizations, ...<br />
◆ Platforms: desktop/laptop, Web, mobile, ...<br />
◆ Bottleneck in the research process?<br />
■ Investments in improving dataflow have a multiplier effect on<br />
other research investments.<br />
■ Acknowledgments:<br />
◆ Faculty: Shaleen Jain, Andrei Kurbatov, Paul Mayewski.<br />
◆ Graduate students: Erik Albert, Mark Royer.<br />
◆ Undergraduate students: Will Lamond, Joe Petrakovich.<br />
◆ Project teams: P301, 10green, RFDE/SSI.<br />
◆ Funding: NSF, U.Maine.<br />
■ Data management collaborations? chaw@cs.umaine.edu<br />