What's New and Cooking in the KNIME Big Data Labs
Tobias Kötter and Björn Lohrmann
KNIME
© 2016 <strong>KNIME</strong>.com AG. All Rights Reserved.
Recap
Database Integration and KNIME Big Data Connectors
Database Integration - Recap
• Visually assemble complex SQL statements
• Connect to almost all JDBC-compliant databases
• Preconfigured nodes to connect to various databases
• Harness the power of your database within KNIME
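To illustrate how visually assembled steps can end up as a single SQL statement that the database itself executes, here is a minimal Python sketch in which each step wraps the previous step's query as a subquery. It assumes the standard-library sqlite3 module as a stand-in database; the helper functions, the table, and the wrapping scheme are hypothetical and not taken from KNIME, whose generated SQL may look different.

```python
import sqlite3


def select_table(table: str) -> str:
    """Start a chain of steps from a database table (hypothetical helper)."""
    return f"SELECT * FROM {table}"


def row_filter(query: str, condition: str) -> str:
    """Wrap the previous query as a subquery and keep only matching rows."""
    return f"SELECT * FROM ({query}) AS t WHERE {condition}"


def group_by(query: str, key: str, agg: str) -> str:
    """Wrap the previous query as a subquery and aggregate it per group."""
    return f"SELECT {key}, {agg} FROM ({query}) AS t GROUP BY {key}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("north", 5.0), ("south", 7.0), ("south", 1.0)],
)

# Each step only builds SQL text; nothing is fetched until the final statement runs.
query = group_by(
    row_filter(select_table("sales"), "amount > 2"),
    "region",
    "SUM(amount) AS total",
)
print(query)

rows = conn.execute(query + " ORDER BY region").fetchall()
print(rows)  # [('north', 15.0), ('south', 7.0)]
```

Because only SQL text is passed between steps, the filtering and aggregation run inside the database and only the final result rows are transferred. The string formatting assumes trusted table and column names.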
KNIME Big Data Connectors - Recap
• Package the drivers/libraries required for HDFS, Hive, and Impala access
• Perform operations on Hadoop
• Extend the open source database integration
• Preconfigured connectors
– Hive
– Cloudera Impala
What's New
Database Integration and KNIME Big Data Connectors
New Database Nodes
• Python Script (DB)/(Hive)
• Database Numeric-/Auto-Binner and Apply-Binner
• Database Sampling with support for stratified sampling
• Database Pivot
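Binning and pivoting can both be expressed in plain SQL, which is what makes it possible to run them inside the database. The sketch below is hand-written for illustration and assumes sqlite3 as the database, made-up bin boundaries, and a made-up smart-meter table; it is not the SQL generated by the Database Numeric-Binner or Database Pivot nodes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (meter TEXT, kwh REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("m1", 0.4), ("m1", 1.6), ("m2", 0.2), ("m2", 2.5), ("m2", 0.9)],
)

# Numeric binning: map each value to a labeled interval with CASE.
binned = """
SELECT meter,
       CASE WHEN kwh < 1.0 THEN 'low'
            WHEN kwh < 2.0 THEN 'medium'
            ELSE 'high' END AS kwh_bin
FROM readings
"""

# Pivot: one output column per bin label, counting readings per meter.
pivot = f"""
SELECT meter,
       SUM(CASE WHEN kwh_bin = 'low' THEN 1 ELSE 0 END) AS low,
       SUM(CASE WHEN kwh_bin = 'medium' THEN 1 ELSE 0 END) AS medium,
       SUM(CASE WHEN kwh_bin = 'high' THEN 1 ELSE 0 END) AS high
FROM ({binned}) AS b
GROUP BY meter
ORDER BY meter
"""

rows = conn.execute(pivot).fetchall()
print(rows)  # [('m1', 1, 1, 0), ('m2', 2, 0, 1)]
```

The pivot columns are fixed here; a generic pivot would first query the distinct bin labels and build the column list from them.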
What's New
KNIME Spark Executor
KNIME Spark Executor
• Based on Spark MLlib
• Scalable machine learning library
• Runs on Hadoop
• Algorithms for
– Classification (decision tree, naïve Bayes, logistic regression, …)
– Regression (linear regression, …)
– Clustering (k-means)
– Collaborative filtering (ALS)
– Dimensionality reduction (SVD, PCA)
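For intuition on what the k-means clustering step computes, here is Lloyd's algorithm in plain Python on a single machine. This is only a sketch: Spark MLlib's implementation distributes the computation over the cluster and, by default, initializes the centers with k-means|| instead of taking them from the caller as done here. The function name and sample points are hypothetical.

```python
from math import dist


def kmeans(points, centers, iterations=10):
    """Lloyd's algorithm: assign points to the nearest center, then move centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        # Assignment step: put each point into the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # centers unchanged: converged
            break
        centers = new_centers
    return centers


points = [(1.0, 1.0), (1.5, 2.0), (0.5, 1.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centers)
```

Each iteration only needs per-cluster sums and counts, which is why the update step parallelizes well when the points are spread across a cluster.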
Familiar Usage Model
• Usage model and dialogs similar to existing nodes
• Spark nodes start and manage Spark jobs
• No coding required
In-Hadoop Processing
• Spark RDDs as input/output format
• Data stays within your cluster
• No unnecessary data movement
• Several input/output nodes, e.g. Hive, HDFS files, …
Combine with Existing KNIME Nodes
Let KNIME Control Your Spark Jobs
47 Spark Nodes and Counting
DEMO
• More than 170 million rows of energy usage data from smart meters
• Uses KNIME Analytics Platform, Big Data Connectors, and Spark Executor to forecast energy consumption
What's Cooking
Database Integration and KNIME Big Data Connectors
Database Integration
• Improved …
– connection handling
– schema support
– database driver handling
– type handling with support for arrays
• New node:
– Table Creator with support for unique/primary keys
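Unique and primary keys are table constraints that the database enforces on every insert. The following sketch shows their effect using sqlite3; the table and column names are hypothetical, and the DDL is an illustration, not the statement the Table Creator node emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Primary key on customer_id plus an additional unique constraint on email.
conn.execute("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE,
    name        TEXT
)
""")


def try_insert(row):
    """Insert a row and report whether the key constraints accepted it."""
    try:
        conn.execute("INSERT INTO customers VALUES (?, ?, ?)", row)
        return "inserted"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"


results = [
    try_insert((1, "ann@example.com", "Ann")),  # new key, new email
    try_insert((2, "bob@example.com", "Bob")),  # new key, new email
    try_insert((1, "cy@example.com", "Cy")),    # duplicate primary key
    try_insert((3, "ann@example.com", "Al")),   # duplicate unique email
]
for r in results:
    print(r)
```

Declaring the keys when the table is created lets the database reject duplicates itself instead of relying on checks in the workflow.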
KNIME Big Data Connectors
• Improved support for secured clusters
– Full-scale Kerberos support
– Apache Knox integration
• New nodes:
– Amazon S3
– Amazon Redshift
– Phoenix
– HBase
– Drill
What's Cooking
KNIME Spark Executor
KNIME Spark Executor Development
• Extended Spark version support
• Public API to write your own Spark nodes
• Enhanced support for multiple Spark contexts
• Support for secured clusters
• New nodes:
– GroupBy
– Remote File Reader/Writer
– Database to Spark/Spark to Database
Virtual Data Warehouse
The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME.com AG under license from KNIME GmbH, and are registered in the United States. KNIME® is also registered in Germany.