22.08.2019 Views

cloudera-spark

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Developing Spark Applications<br />

• If restrictions on HDFS encryption zones prevent files from being moved to the HDFS trashcan. This restriction<br />

primarily applies to CDH 5.7 and lower. With CDH 5.8 and higher, each HDFS encryption zone has its own HDFS<br />

trashcan, so the normal DROP TABLE behavior works correctly without the PURGE clause.<br />

Using Spark MLlib<br />

CDH 5.5 supports MLlib, Spark's machine learning library. For information on MLlib, see the Machine Learning Library<br />

(MLlib) Guide.<br />

Running a Spark MLlib Example<br />

To try Spark MLlib using one of the Spark example applications, do the following:<br />

1. Download MovieLens sample data and copy it to HDFS:<br />

$ wget --no-check-certificate \<br />

https://raw.githubusercontent.com/apache/<strong>spark</strong>/branch-1.6/data/mllib/sample_movielens_data.txt<br />

$ hdfs dfs -copyFromLocal sample_movielens_data.txt /user/hdfs<br />

2. Run the Spark MLlib MovieLens example application, which calculates recommendations based on movie reviews:<br />

$ <strong>spark</strong>-submit --master local --class org.apache.<strong>spark</strong>.examples.mllib.MovieLensALS \<br />

SPARK_HOME/lib/<strong>spark</strong>-examples.jar \<br />

--rank 5 --numIterations 5 --lambda 1.0 --kryo sample_movielens_data.txt<br />

Enabling Native Acceleration For MLlib<br />

MLlib algorithms are compute intensive and benefit from hardware acceleration. To enable native acceleration for<br />

MLlib, perform the following tasks.<br />

Install Required Software<br />

• Install the appropriate libgfortran 4.6+ package for your operating system. No compatible version is available<br />

for RHEL 5 or 6.<br />

OS<br />

RHEL 7.1<br />

SLES 11 SP3<br />

Ubuntu 12.04<br />

Ubuntu 14.04<br />

Debian 7.1<br />

Package Name<br />

libgfortran<br />

libgfortran3<br />

libgfortran3<br />

libgfortran3<br />

libgfortran3<br />

Package Version<br />

4.8.x<br />

4.7.2<br />

4.6.3<br />

4.8.4<br />

4.7.2<br />

• Install the GPL Extras parcel or package.<br />

Verify Native Acceleration<br />

You can verify that native acceleration is working by examining logs after running an application. To verify native<br />

acceleration with an MLlib example application:<br />

1. Do the steps in Running a Spark MLlib Example on page 20.<br />

2. Check the logs. If native libraries are not loaded successfully, you see the following four warnings before the final<br />

line, where the RMSE is printed:<br />

15/07/12 12:33:01 WARN BLAS: Failed to load implementation from:<br />

com.github.fommil.netlib.NativeSystemBLAS<br />

20 | Spark Guide

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!