---
layout: "guides"
page_title: "Apache Spark Integration - Getting Started"
sidebar_current: "guides-analytical-workloads-spark-pre"
description: |-
  Get started with the Nomad/Spark integration.
---

# Getting Started

To get started, you can use Nomad's example Terraform configuration to
automatically provision an environment in AWS, or you can manually provision a
cluster.
## Provision a Cluster in AWS

Nomad's [Terraform configuration](https://github.com/hashicorp/nomad/tree/master/terraform)
can be used to quickly provision a Spark-enabled Nomad environment in
AWS. The embedded [Spark example](https://github.com/hashicorp/nomad/tree/master/terraform/examples/spark)
provides a quickstart experience that can be used in conjunction with
this guide. When you have a cluster up and running, you can proceed to
[Submitting applications](/guides/analytical-workloads/spark/submit.html).

## Manually Provision a Cluster

To manually provision a cluster, see the Nomad
[Getting Started](/intro/getting-started/install.html) guide. There are two
basic prerequisites to using the Spark integration once you have a cluster up
and running:

- Access to a [Spark distribution](https://nomad-spark.s3.amazonaws.com/spark-2.1.0-bin-nomad.tgz)
built with Nomad support. This is required for the machine that will submit
applications as well as for the Nomad tasks that will run the Spark executors.

- A Java runtime environment (JRE) for the submitting machine and the executors.

The subsections below explain these prerequisites in more detail.

### Configure the Submitting Machine

To run Spark applications on Nomad, the submitting machine must have access to
the cluster and have the Nomad-enabled Spark distribution installed. The code
snippets below walk through installing Java and Spark on Ubuntu:

Install Java:

```shell
$ sudo add-apt-repository -y ppa:openjdk-r/ppa
$ sudo apt-get update
$ sudo apt-get install -y openjdk-8-jdk
$ JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
```
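
To confirm the runtime is available and `JAVA_HOME` is set, you can run:

```shell
$ java -version
$ echo $JAVA_HOME
```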

Install Spark:

```shell
$ wget -O - https://nomad-spark.s3.amazonaws.com/spark-2.1.0-bin-nomad.tgz \
  | sudo tar xz -C /usr/local
$ export PATH=$PATH:/usr/local/spark-2.1.0-bin-nomad/bin
```
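
Once the distribution is on your `PATH`, `spark-submit` should resolve; a quick
check:

```shell
$ spark-submit --version
```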

Export `NOMAD_ADDR` to point Spark to your Nomad cluster:

```shell
$ export NOMAD_ADDR=http://NOMAD_SERVER_IP:4646
```
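
To verify that the submitting machine can reach the cluster, you can query the
Nomad HTTP API at that address (assuming `curl` is installed; this simply lists
the agent's known server members):

```shell
$ curl $NOMAD_ADDR/v1/agent/members
```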

### Executor Access to the Spark Distribution

When running on Nomad, Spark creates Nomad tasks to run executors for use by the
application's driver program. The executor tasks need access to a JRE, a Spark
distribution built with Nomad support, and (in cluster mode) the Spark
application itself. By default, Nomad will only place Spark executors on client
nodes that have the Java runtime installed (version 7 or higher).
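
You can check which client nodes advertise a Java runtime by inspecting their
attributes; the Java driver fingerprint reports the detected version (the node
ID below is a placeholder for one of your clients):

```shell
$ nomad node status -verbose <node-id> | grep driver.java
```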

In this example, the Spark distribution and the Spark application JAR file are
being pulled from Amazon S3:

```shell
$ spark-submit \
    --class org.apache.spark.examples.JavaSparkPi \
    --master nomad \
    --deploy-mode cluster \
    --conf spark.executor.instances=4 \
    --conf spark.nomad.sparkDistribution=https://nomad-spark.s3.amazonaws.com/spark-2.1.0-bin-nomad.tgz \
    https://nomad-spark.s3.amazonaws.com/spark-examples_2.11-2.1.0-SNAPSHOT.jar 100
```
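
In cluster mode, the driver and executors run as tasks in a Nomad job, so you
can follow the submission with the standard Nomad CLI (the generated job name
will vary):

```shell
# List jobs; the job created for the Spark application should appear here
$ nomad status

# Inspect it by name (substitute the job name printed above)
$ nomad status <job-name>
```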

### Using a Docker Image

An alternative to installing the JRE on every client node is to set the
[spark.nomad.dockerImage](/guides/analytical-workloads/spark/configuration.html#spark-nomad-dockerimage)
configuration property to the URL of a Docker image that has the Java runtime
installed. If set, Nomad will use the `docker` driver to run Spark executors in
a container created from the image. The
[spark.nomad.dockerAuth](/guides/analytical-workloads/spark/configuration.html#spark-nomad-dockerauth)
configuration property can be set to a JSON object to provide authentication
for the Docker registry.
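
If the image lives in a private registry, that JSON object can be passed on the
command line. The key names (`username`, `password`) and the image URL below are
illustrative assumptions, not taken from this guide; consult the configuration
reference linked above for the exact format:

```shell
# Hypothetical sketch: the JSON key names and image URL are assumptions
$ spark-submit \
    --class org.apache.spark.examples.JavaSparkPi \
    --master nomad \
    --deploy-mode cluster \
    --conf spark.nomad.dockerImage=registry.example.com/spark:latest \
    --conf 'spark.nomad.dockerAuth={"username": "myuser", "password": "mypassword"}' \
    --conf spark.nomad.sparkDistribution=/spark-2.1.0-bin-nomad.tgz \
    /spark-examples_2.11-2.1.0-SNAPSHOT.jar 100
```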

When using a Docker image, both the Spark distribution and the application
itself can be included in the image (in which case local URLs can be used for
`spark-submit`).

Here, we include [spark.nomad.dockerImage](/guides/analytical-workloads/spark/configuration.html#spark-nomad-dockerimage)
and use local paths for
[spark.nomad.sparkDistribution](/guides/analytical-workloads/spark/configuration.html#spark-nomad-sparkdistribution)
and the application JAR file:

```shell
$ spark-submit \
    --class org.apache.spark.examples.JavaSparkPi \
    --master nomad \
    --deploy-mode cluster \
    --conf spark.nomad.dockerImage=rcgenova/spark \
    --conf spark.executor.instances=4 \
    --conf spark.nomad.sparkDistribution=/spark-2.1.0-bin-nomad.tgz \
    /spark-examples_2.11-2.1.0-SNAPSHOT.jar 100
```

## Next Steps

Learn how to [submit applications](/guides/analytical-workloads/spark/submit.html).