22.08.2019 Views

cloudera-spark

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Running Spark Applications<br />

Table 3: Deployment Mode Summary<br />

Mode<br />

Driver runs in<br />

Requests resources<br />

Starts executor processes<br />

Persistent services<br />

Supports Spark Shell<br />

YARN Client Mode<br />

Client<br />

ApplicationMaster<br />

YARN NodeManager<br />

YARN ResourceManager and<br />

NodeManagers<br />

Yes<br />

YARN Cluster Mode<br />

ApplicationMaster<br />

ApplicationMaster<br />

YARN NodeManager<br />

YARN ResourceManager and<br />

NodeManagers<br />

No<br />

Configuring the Environment<br />

Spark requires that the HADOOP_CONF_DIR or YARN_CONF_DIR environment variable point to the directory containing<br />

the client-side configuration files for the cluster. These configurations are used to write to HDFS and connect to the<br />

YARN ResourceManager. If you are using a Cloudera Manager deployment, these variables are configured automatically.<br />

If you are using an unmanaged deployment, ensure that you set the variables as described in Running Spark on YARN.<br />

Running a Spark Shell Application on YARN<br />

To run the <strong>spark</strong>-shell or py<strong>spark</strong> client on YARN, use the --master yarn --deploy-mode client flags when<br />

you start the application.<br />

Spark Guide | 37

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!