10.03.2016 Views

What's New and Cooking in the KNIME Big Data Labs

Tobias Kötter and Björn Lohrmann

KNIME

© 2016 KNIME.com AG. All Rights Reserved.


Recap

Database Integration and KNIME Big Data Connectors


Database Integration - Recap

• Visually assemble complex SQL statements
• Connect to almost all JDBC-compliant databases
• Preconfigured nodes to connect to various databases
• Harness the power of your database within KNIME
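
One way to picture how chained database nodes can assemble a single SQL statement is to let each step wrap the incoming query as a subquery. The sketch below illustrates that idea in plain Python with the standard-library sqlite3 module. It is not KNIME code: the helper names db_filter and db_group_by, the readings table, and its columns are hypothetical.

```python
import sqlite3


def db_filter(query, condition):
    # Hypothetical helper: wrap the incoming query and add a WHERE clause.
    return f"SELECT * FROM ({query}) AS t WHERE {condition}"


def db_group_by(query, key, aggregate):
    # Hypothetical helper: wrap the incoming query and aggregate per key.
    return f"SELECT {key}, {aggregate} FROM ({query}) AS t GROUP BY {key}"


# Small in-memory table standing in for a real database (assumed data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (meter TEXT, kwh REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 0.5), ("b", 4.0)],
)

# Each step only rewrites the SQL text; nothing is executed yet.
sql = "SELECT * FROM readings"
sql = db_filter(sql, "kwh >= 1.0")
sql = db_group_by(sql, "meter", "SUM(kwh) AS total_kwh")

# The database does the work once the final statement is run.
rows = conn.execute(sql + " ORDER BY meter").fetchall()
print(sql)
print(rows)
```

Because every step only edits the statement text, the data stays in the database until the final query is executed.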


KNIME Big Data Connectors - Recap

• Package the required drivers and libraries for HDFS, Hive, and Impala access
• Perform operations on Hadoop
• Extend the open source database integration
• Preconfigured connectors
– Hive
– Cloudera Impala


What’s New

Database Integration and KNIME Big Data Connectors


New Database Nodes

• Python Script (DB)/(Hive)
• Database Numeric-Binner, Auto-Binner, and Apply-Binner
• Database Sampling with support for stratified sampling
• Database Pivot
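
Stratified sampling draws the same fraction from every group, so the class proportions of the full table are preserved in the sample. The following is a minimal in-memory sketch of that idea in plain Python, not the node's implementation (which would presumably run inside the database). The function name stratified_sample, the label column, and the example rows are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(rows, key, fraction, seed=0):
    """Draw `fraction` of the rows from each group defined by `key`."""
    rng = random.Random(seed)

    # Split the rows into strata, one list per distinct key value.
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)

    # Sample each stratum separately; keep at least one row per stratum.
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


# Assumed example data: 20 "high" rows and 80 "low" rows.
rows = [{"id": i, "label": "high" if i % 5 == 0 else "low"} for i in range(100)]

sample = stratified_sample(rows, key="label", fraction=0.1)
counts = Counter(row["label"] for row in sample)
print(counts)
```

With a 10% fraction the sample keeps the 20:80 ratio of the input, which a simple random sample of the same size would not guarantee.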


What’s New

KNIME Spark Executor


KNIME Spark Executor

• Based on Spark MLlib
• Scalable machine learning library
• Runs on Hadoop
• Algorithms for
– Classification (decision tree, naïve Bayes, …)
– Regression (logistic regression, linear regression, …)
– Clustering (k-means)
– Collaborative filtering (ALS)
– Dimensionality reduction (SVD, PCA)
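
To make one of the listed algorithms concrete, here is a small sketch of k-means (Lloyd's algorithm) in plain Python. It runs on a single machine and does not use Spark or MLlib, which distribute the same assign-and-update steps across a cluster. The function names, the example points, and the fixed starting centroids are assumptions chosen for illustration.

```python
def squared_distance(p, q):
    # Squared Euclidean distance between two points of equal dimension.
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]),
            )
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Assumed example data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]

centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

Starting from fixed centroids keeps the run deterministic; practical implementations usually choose the starting centroids randomly or with a seeding heuristic.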


Familiar Usage Model

• Usage model and dialogs similar to existing nodes
• Spark nodes start and manage Spark jobs
• No coding required


In-Hadoop Processing

• Spark RDDs as input/output format
• Data stays within your cluster
• No unnecessary data movement
• Several input/output nodes, e.g. Hive, HDFS files, …


Combine with Existing KNIME Nodes


Let KNIME Control Your Spark Jobs


47 Spark Nodes and Counting


DEMO

• More than 170 million rows of energy usage data from smart meters
• Uses KNIME Analytics Platform, Big Data Connectors, and Spark Executor to forecast energy consumption


What's Cooking

Database Integration and KNIME Big Data Connectors


Database Integration

• Improved …
– connection handling
– schema support
– database driver handling
– type handling with support for arrays
• New node:
– Table Creator with support for unique/primary keys


KNIME Big Data Connectors

• Improved secured cluster support
– Full-scale Kerberos support
– Apache Knox integration
• New nodes:
– Amazon S3
– Amazon Redshift
– Phoenix
– HBase
– Drill


What's Cooking

KNIME Spark Executor


KNIME Spark Executor Development

• Extended Spark version support
• Public API to write your own Spark nodes
• Enhanced support for multiple Spark contexts
• Support for secured clusters
• New nodes:
– GroupBy
– Remote File Reader/Writer
– Database to Spark/Spark to Database


Virtual Data Warehouse


The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME.com AG under license from KNIME GmbH, and are registered in the United States. KNIME® is also registered in Germany.
