Extracting Value from the Data Mountain - OpenJaw Technologies

Extracting Value from the Data Mountain

Mark Roantree

OpenJaw t-Retailing Summit and User Conference

25th May 2012


Outline

Overview

What is Value?

The Vendors

The Issues

The Future

The Message


Data, Data, Everywhere ...

Participation in NSF’s EarthCube

NSF is now actively organising and designing for EarthCube, which will be a cyberinfrastructure for the Earth sciences (solid earth science/geology, oceanography, atmospheric science). There will be many data management issues in EarthCube . . .


What is fueling this data growth?

International Data Corporation

According to research conducted by the IDC, the size of the digital

universe in 2010 surpassed one Zettabyte (ZB) for the first time in

history and it now stands at about 1.8ZB.

◮ The ubiquitous availability of smartphones, tablets, laptops and other easily portable devices

◮ The adoption of social networking sites, which make it possible for enterprises, service providers, individuals . . . to contribute continuously to this massively distributed information publishing process


What is fueling this data growth?

◮ Reports on real-world events (Japan's earthquake and tsunami, the Arab Spring uprisings, riots in the UK)

◮ are propagated far more quickly within the network of social sensors (e.g. Twitter)

◮ than by traditional means (e.g. seismic sensor reading analysis, police emergency reports, news media coverage).


The Data Mountain

We are all creating our own mountain of data . . .

◮ Does it have any value?

◮ Can the value be extracted?

◮ Can we exploit value to differentiate, be competitive?


Focus

We will discuss why this happens

◮ application areas

◮ current approaches

◮ the issues preventing easy solutions

◮ look at what the future might hold


The Sensor Web

Extracting Value

Bridging the Physical - Digital Divide

Extending the Web with Sensors

◮ The Physical World: no connection to the web?

◮ Between 10 and 15 million sensor readings for every "event"

◮ Close to real time: DropBox it!

◮ Analysis & Extraction starts immediately


The Smart City

Extracting Value

Aggregating partial observations, mined from large amounts of disparate data sources: traffic sensors, CCTV streams, event information, tweets from citizens

Value for city planners, emergency support teams and citizens

◮ Develop innovative visualisation and analysis tools for linking events that occur in a city

◮ Understanding events, their evolution over time and their cause-effect on other resources and activities in the city

◮ Analysis using the Web, e.g. Citybikes


The Travel Industry

◮ Problems are repeated

◮ Knowns: flights, seats, cars, hotels,

. . . every transaction contains value

◮ Unknowns: Are there other applications

you need to think about?

◮ We never delete data . . .


Current Situation

We seem to be living in a world of buzzwords

Data mining has taken a back seat to the new kids on the block, namely Big Data and Data Analytics, both of which incorporate data mining.

◮ All of the big companies - Microsoft, Oracle, IBM, Amazon -

are in the rush to provide platforms to facilitate (host) Big

Data

◮ You will be hard-pressed to find publicised information online

detailing real-world XML data mining use-cases.

◮ Ironically, the need is there, particularly in the healthcare, financial and legal domains.


Predictions

Predicting the Future

Analytics will become pervasive

◮ By 2014, 30% of analytic applications will use proactive,

predictive and forecasting capabilities: Gartner report (2011)

◮ Analytics is moving from departments to the Enterprise level


Amazon

Analytics-as-a-Service

Amazon see the value in migrating from outmoded BI to the

Analytics-as-a-service model.

◮ Current approaches are unable to meet the demands of

increasingly complex KPIs, metrics and dashboards.

◮ Analytics: departmental solutions fail at the enterprise level

◮ Benefits: why use Elastic MapReduce (EMR)?

◮ Hadoop is challenging to configure and manage

◮ No upfront investment in hardware or staff, no hardware

procurement delay


Technology

◮ Amazon EMR: enables businesses, researchers, data analysts,

and developers to cost-effectively process vast amounts of

data

◮ Uses a Hadoop framework running on Amazon Elastic

Compute Cloud (Amazon EC2) and using Amazon Simple

Storage Service (Amazon S3)

◮ Apache Hive is built on top of Hadoop for data

summarisation, query, and analysis

◮ HiveQL is the SQL-like query language
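To make the processing model behind Hadoop more concrete, here is a minimal single-machine sketch of the map, shuffle and reduce steps in plain Python. It is not Hadoop or Amazon EMR code: the booking records, the field layout and the function names are all hypothetical, chosen only to echo the travel examples in this talk.

```python
from collections import defaultdict

# Hypothetical booking records: (product_type, price). Illustrative values only.
BOOKINGS = [
    ("flight", 120.0),
    ("hotel", 80.0),
    ("flight", 200.0),
    ("car", 45.0),
    ("hotel", 95.0),
]


def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for product, price in records:
        yield product, price


def shuffle(pairs):
    """Group values by key, the step a MapReduce framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups):
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def revenue_by_product(records):
    """Run the three steps in sequence on one machine."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    print(revenue_by_product(BOOKINGS))
```

On a real cluster the map and reduce functions run in parallel across nodes and the framework handles the grouping. A query language such as HiveQL expresses the same summarisation declaratively, roughly as a SUM over price grouped by product type.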


Technology Frameworks


Enterprise Frameworks


XML

Amazon for us?

3GB XML processing in 50s on a 16-node Amazon EMR cluster

www.globenewswire.com/newsroom/news.html?d=256413

◮ Success stories using 3rd party tools

◮ XMap. Enterprise-class XML mapping: uses standard XPath language

◮ HParser. Big data processing of ACORD XML: takes advantage of Hadoop performance for formats such as XML and EDI.
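As a small illustration of what XPath-based extraction from XML looks like, the sketch below uses Python's standard-library ElementTree, which supports a limited XPath subset. It does not use XMap or HParser, and the document structure (bookings, booking, item, price) is a hypothetical schema invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical booking document; element and attribute names are assumptions.
BOOKING_XML = """
<bookings>
  <booking id="b1">
    <item type="flight"><price currency="EUR">120.00</price></item>
    <item type="hotel"><price currency="EUR">80.00</price></item>
  </booking>
  <booking id="b2">
    <item type="flight"><price currency="EUR">200.00</price></item>
  </booking>
</bookings>
"""


def prices_for(xml_text, item_type):
    """Return the prices of all items of the given type, in document order."""
    root = ET.fromstring(xml_text)
    # Path with an attribute predicate: every <price> under an <item>
    # whose type attribute matches, at any depth below the root.
    path = f".//item[@type='{item_type}']/price"
    return [float(node.text) for node in root.findall(path)]


if __name__ == "__main__":
    print(prices_for(BOOKING_XML, "flight"))
```

The point of the path expression is that the selection logic lives in one declarative string rather than in nested loops, which is what makes mapping tools built on XPath attractive for varied XML formats.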


Genericity

Are solutions suitably generic for my organisation?

◮ No? Is this because they are ignoring the strength of XML?

◮ Not hard-wired but . . . a database-style, fixed-schema approach may not be extensible

◮ Perhaps they get us 70-90% of the way there and we do the rest?


Development Technologies

The technologies may present a challenge

Executing your goal requires developing state-of-the-art capabilities

around three facets: algorithms, platform building blocks, and

infrastructure

Extracting Value requires new technologies: Hadoop, HBase,

HiveQL

◮ . . . then the infrastructure

◮ . . . then the applications


Development Strategy

Migration may be costly

Given the many in-house BI projects based on vendor stacks, it will be difficult in most large corporations to rip-and-replace

Options

◮ 1. Buy (eg. Amazon EMR + 3rd party)

◮ 2. Buy+Build (No 3rd Party)

◮ 3. Build your own cluster and utilities

◮ 4. One toe in the water: Outsource?


Developing Skills

Vendors Can Disseminate Knowledge

IBM build products but also provide a lot of information

◮ Survey several approaches to XML data mining

◮ Mining XML association rules

◮ Clustering XML documents for improved data mining

◮ Taming big data

◮ Solve cloud-related big data problems with MapReduce

◮ Hadoop: the Big Answer to Big Data! (IBM Big Data)
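The "mining XML association rules" item above can be illustrated with a very small sketch: read transactions from XML, then compute support and confidence for rules of the form "A implies B" between single items. This is a brute-force toy, not the method from the IBM articles, and the XML layout, item names and thresholds are all hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import permutations

# Hypothetical transaction log; the element names are assumptions.
TRANSACTIONS_XML = """
<transactions>
  <transaction><item>flight</item><item>hotel</item></transaction>
  <transaction><item>flight</item><item>hotel</item><item>car</item></transaction>
  <transaction><item>flight</item></transaction>
  <transaction><item>hotel</item><item>car</item></transaction>
</transactions>
"""


def load_itemsets(xml_text):
    """Turn each <transaction> into a frozenset of its item names."""
    root = ET.fromstring(xml_text)
    return [
        frozenset(item.text for item in tx.findall("item"))
        for tx in root.findall("transaction")
    ]


def support(itemsets, items):
    """Fraction of transactions that contain every item in `items`."""
    items = frozenset(items)
    return sum(1 for s in itemsets if items <= s) / len(itemsets)


def pair_rules(itemsets, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs."""
    all_items = sorted(set().union(*itemsets))
    rules = []
    for lhs, rhs in permutations(all_items, 2):
        both = support(itemsets, {lhs, rhs})
        if both < min_support:
            continue
        # Confidence: of the transactions containing lhs, the share that also contain rhs.
        confidence = both / support(itemsets, {lhs})
        if confidence >= min_confidence:
            rules.append((lhs, rhs, both, confidence))
    return rules


if __name__ == "__main__":
    for rule in pair_rules(load_itemsets(TRANSACTIONS_XML), 0.5, 0.7):
        print(rule)
```

With this toy data and these thresholds, the only rule that survives is "car implies hotel": every transaction with a car also has a hotel. Real rule miners such as Apriori prune the candidate space instead of enumerating every pair, which matters once the item catalogue is large.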


What Does the Future hold?

Is XML an Option?

Current Approaches use Relational Technology. Why not XML . . . ?

◮ Knowledge is generated in XML

◮ XML offers greater ease of interoperability

◮ XML contains semantics

◮ BUT . . . XML is slow; tools not readily available; less

functionality


What Price Value?

Obstacles or Opportunities

◮ XML Optimisation

◮ XML Multidimensional Modelling

◮ XML Data Mining Primitives

◮ BUT . . . some solutions already

available


Help!

Collaboration

Increasingly we are seeing industry-academic research partnerships

◮ International Workshop on Multimodal Crowd Sensing

◮ Very Large Data Search: International Workshop on Searching

and Integrating New Web Data Sources

◮ Participation in NSF’s EarthCube

◮ International Conference on Big Data Analytics

◮ BIGMINE-12: International Workshop on Big Data, Streams and Heterogeneous Source Mining


Building Your Own Solution

Going Alone

Is XML the right way to go?

◮ Researchers want data, want the data mountain, want the real-world problems

◮ Outsource: an opportunity for companies to try something different


Reasons for Optimism

Good Business Intelligence is not the preserve of elite

organisations

◮ Within the reach of organisations possessing the minimum requirements: the vision and the determination to see it through

◮ Key analyst prediction: "We expect this enterprise analytics transformation trend will take a decade to play out (innovation to maturity cycle)"


Deliver on the Optimism

Getting to the Core of Your Data Mountain

◮ So you have time

◮ Think longer term

◮ Manage the problem in stages

Data Management - Query Processing - Analytics


Audience Questions?

Data Mining the Web

◮ International Data Corporation (idc.com)

◮ Does Business Intelligence Require Intelligent Business?

(peterjamesthomas.com)

◮ Big Data Will Help Shape Your Market's Next Big Winners

(blogs.forrester.com)

◮ NSF’s EarthCube (earthcube.ning.com)

◮ Smart City

(www.solarfeeds.com/top-10-smart-and-sustainable-cities)

◮ Amazon Analytics

(practicalanalytics.wordpress.com/2011/08/13)

◮ Amazon MapReduce (aws.amazon.com/elasticmapreduce)

◮ IBM Mining XML (www.ibm.com/developerworks/library)
