
Developing Spark Applications

To compile Scala, include the Scala tools plug-in:

<plugin>
  <groupId>org.scala-tools</groupId>
  <artifactId>maven-scala-plugin</artifactId>
  <executions>
    <execution>
      <goals>
        <goal>compile</goal>
        <goal>testCompile</goal>
      </goals>
    </execution>
  </executions>
</plugin>

which requires the scala-tools plug-in repository:

<pluginRepositories>
  <pluginRepository>
    <id>scala-tools.org</id>
    <name>Scala-tools Maven2 Repository</name>
    <url>http://scala-tools.org/repo-releases</url>
  </pluginRepository>
</pluginRepositories>

Also, include Scala and Spark as dependencies:

<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>2.10.2</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.6.0-cdh5.7.0</version>
  <scope>provided</scope>
</dependency>
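The jar-with-dependencies suffix on the packaged JAR implies an assembly step that the snippets on this page do not show. A minimal sketch of that configuration, assuming the stock maven-assembly-plugin with its built-in jar-with-dependencies descriptor (not confirmed by this page), would be:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <!-- Built-in descriptor that bundles all runtime dependencies
         into a single "fat" JAR with the -jar-with-dependencies suffix -->
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
  <executions>
    <execution>
      <!-- Run the assembly automatically during mvn package -->
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

Note that the Scala and Spark dependencies above are marked provided, so they are excluded from the assembled JAR; the cluster supplies them at runtime.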

To generate the application JAR, run:

$ mvn package

to create sparkwordcount-1.0-SNAPSHOT-jar-with-dependencies.jar in the target directory.

Running the Application

1. The input to the application is a large text file in which each line contains all the words in a document, stripped
of punctuation. Put an input file in a directory on HDFS. You can use the tutorial example input file:

$ wget --no-check-certificate .../inputfile.txt
$ hdfs dfs -put inputfile.txt

2. Run one of the applications using spark-submit:

• Scala - Run in a local process with threshold 2:

$ spark-submit --class com.cloudera.sparkwordcount.SparkWordCount \
  --master local --deploy-mode client --executor-memory 1g \
  --name wordcount --conf "spark.app.id=wordcount" \

12 | Spark Guide
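The SparkWordCount class itself is not reproduced on this page, but the counting-and-threshold logic the command above exercises can be sketched in plain Scala. This is a hypothetical standalone object, not the tutorial's actual class, and the Spark RDD plumbing is omitted; the threshold plays the same role as the final argument to spark-submit:

```scala
object WordCountSketch {
  // Count word occurrences across all lines and keep only words that
  // occur at least `threshold` times. The real Spark application
  // distributes the same flatMap/count/filter steps over an RDD.
  def countAboveThreshold(lines: Seq[String], threshold: Int): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))              // split each line into words
      .filter(_.nonEmpty)                     // drop empty tokens
      .groupBy(identity)                      // group identical words
      .map { case (word, occ) => (word, occ.size) } // word -> count
      .filter { case (_, count) => count >= threshold }

  def main(args: Array[String]): Unit = {
    val counts = countAboveThreshold(Seq("a b a", "b c"), 2)
    println(counts) // "c" is dropped: it appears only once
  }
}
```

With threshold 2, words appearing only once are filtered out, which is why the example command passes 2 as the final application argument.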
