cloudera-spark
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
Developing Spark Applications<br />
• If restrictions on HDFS encryption zones prevent files from being moved to the HDFS trashcan. This restriction<br />
primarily applies to CDH 5.7 and lower. With CDH 5.8 and higher, each HDFS encryption zone has its own HDFS<br />
trashcan, so the normal DROP TABLE behavior works correctly without the PURGE clause.<br />
Using Spark MLlib<br />
CDH 5.5 supports MLlib, Spark's machine learning library. For information on MLlib, see the Machine Learning Library<br />
(MLlib) Guide.<br />
Running a Spark MLlib Example<br />
To try Spark MLlib using one of the Spark example applications, do the following:<br />
1. Download MovieLens sample data and copy it to HDFS:<br />
$ wget --no-check-certificate \<br />
https://raw.githubusercontent.com/apache/<strong>spark</strong>/branch-1.6/data/mllib/sample_movielens_data.txt<br />
$ hdfs dfs -copyFromLocal sample_movielens_data.txt /user/hdfs<br />
2. Run the Spark MLlib MovieLens example application, which calculates recommendations based on movie reviews:<br />
$ <strong>spark</strong>-submit --master local --class org.apache.<strong>spark</strong>.examples.mllib.MovieLensALS \<br />
SPARK_HOME/lib/<strong>spark</strong>-examples.jar \<br />
--rank 5 --numIterations 5 --lambda 1.0 --kryo sample_movielens_data.txt<br />
Enabling Native Acceleration For MLlib<br />
MLlib algorithms are compute intensive and benefit from hardware acceleration. To enable native acceleration for<br />
MLlib, perform the following tasks.<br />
Install Required Software<br />
• Install the appropriate libgfortran 4.6+ package for your operating system. No compatible version is available<br />
for RHEL 5 or 6.<br />
OS<br />
RHEL 7.1<br />
SLES 11 SP3<br />
Ubuntu 12.04<br />
Ubuntu 14.04<br />
Debian 7.1<br />
Package Name<br />
libgfortran<br />
libgfortran3<br />
libgfortran3<br />
libgfortran3<br />
libgfortran3<br />
Package Version<br />
4.8.x<br />
4.7.2<br />
4.6.3<br />
4.8.4<br />
4.7.2<br />
• Install the GPL Extras parcel or package.<br />
Verify Native Acceleration<br />
You can verify that native acceleration is working by examining logs after running an application. To verify native<br />
acceleration with an MLlib example application:<br />
1. Do the steps in Running a Spark MLlib Example on page 20.<br />
2. Check the logs. If native libraries are not loaded successfully, you see the following four warnings before the final<br />
line, where the RMSE is printed:<br />
15/07/12 12:33:01 WARN BLAS: Failed to load implementation from:<br />
com.github.fommil.netlib.NativeSystemBLAS<br />
20 | Spark Guide