---
layout: "guides"
page_title: "Preemption (Service and Batch Jobs)"
sidebar_current: "guides-operating-a-job-preemption-service-batch"
description: |-
  The following guide walks the user through enabling and using preemption on
  service and batch jobs in Nomad Enterprise (0.9.3 and above).
---

# Preemption for Service and Batch Jobs

~> **Enterprise Only!** This functionality only exists in Nomad Enterprise. This
is not present in the open source version of Nomad.

Prior to Nomad 0.9, job [priority][priority] was used to process scheduling
requests in priority order. Preemption, introduced in Nomad 0.9, allows Nomad to
evict running allocations so that allocations of a higher-priority job can be
placed. Allocations of a job that are blocked temporarily go into "pending"
status until the cluster has additional capacity to run them. This is useful
when operators need to run relatively higher-priority tasks sooner, even under
resource contention across the cluster.

While Nomad 0.9 introduced preemption for [system][system-job] jobs, Nomad 0.9.3
[Enterprise][enterprise] additionally allows preemption for
[service][service-job] and [batch][batch-job] jobs. You can enable this
functionality by sending a [payload][payload-preemption-config] with the
appropriate options to the [scheduler configuration][update-scheduler] API
endpoint.

## Reference Material

- [Preemption][preemption]
- [Nomad Enterprise Preemption][enterprise-preemption]

## Estimated Time to Complete

20 minutes

## Prerequisites

To perform the tasks described in this guide, you need a Nomad environment with
Consul installed. You can use this
[repo](https://github.com/hashicorp/nomad/tree/master/terraform#provision-a-nomad-cluster-in-the-cloud)
to provision a sandbox environment. This guide assumes a cluster with one server
node and three client nodes. To simulate resource contention, each node in this
environment has 1 GB of RAM (on AWS, you can choose the [t2.micro][t2-micro]
instance type). Remember that service and batch job preemption requires Nomad
[Enterprise][enterprise] 0.9.3 or later.

-> **Please Note:** This guide is for demo purposes and uses only a single
server node. In a production cluster, 3 or 5 server nodes are recommended.

## Steps

### Step 1: Create a Job with Low Priority

Start by creating a relatively low-priority job in your Nomad cluster. One of
the allocations from this job will be preempted in a subsequent deployment when
there is resource contention in the cluster. Copy the following job into a file
named `webserver.nomad`.

```hcl
job "webserver" {
  datacenters = ["dc1"]
  type        = "service"
  priority    = 40

  group "webserver" {
    count = 3

    task "apache" {
      driver = "docker"

      config {
        image = "httpd:latest"

        port_map {
          http = 80
        }
      }

      resources {
        network {
          mbits = 10
          port "http" {}
        }

        memory = 600
      }

      service {
        name = "apache-webserver"
        port = "http"

        check {
          name     = "alive"
          type     = "http"
          path     = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
```

Note that the [count][count] is 3 and that each allocation requests 600 MB of
[memory][memory]. Because each node has only 1 GB of RAM, at most one
`webserver` allocation fits on each node.

### Step 2: Run the Low Priority Job

Register `webserver.nomad`:

```shell
$ nomad run webserver.nomad
==> Monitoring evaluation "1596bfc8"
    Evaluation triggered by job "webserver"
    Allocation "725d3b49" created: node "16653ac1", group "webserver"
    Allocation "e2f9cb3d" created: node "f765c6e8", group "webserver"
    Allocation "e9d8df1b" created: node "b0700ec0", group "webserver"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "1596bfc8" finished with status "complete"
```

At this point, check the status of the `webserver` job. You should see that an
allocation has been placed on each client node in the cluster:

```shell
$ nomad status webserver
ID            = webserver
Name          = webserver
Submit Date   = 2019-06-19T04:20:32Z
Type          = service
Priority      = 40
...
Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
725d3b49  16653ac1  webserver   0        run      running  1m18s ago  59s ago
e2f9cb3d  f765c6e8  webserver   0        run      running  1m18s ago  1m2s ago
e9d8df1b  b0700ec0  webserver   0        run      running  1m18s ago  59s ago
```

### Step 3: Create a Job with High Priority

Create another job with a [priority][priority] greater than that of the job you
just deployed. Copy the following into a file named `redis.nomad`:

```hcl
job "redis" {
  datacenters = ["dc1"]
  type        = "service"
  priority    = 80

  group "cache1" {
    count = 1

    task "redis" {
      driver = "docker"

      config {
        image = "redis:latest"

        port_map {
          db = 6379
        }
      }

      resources {
        network {
          port "db" {}
        }

        memory = 700
      }

      service {
        name = "redis-cache"
        port = "db"

        check {
          name     = "alive"
          type     = "tcp"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
```

Note that this job has a priority of 80 (greater than the priority of the job
from [Step 1][step-1]) and requires 700 MB of memory. This allocation will
create resource contention in the cluster: each node has only 1 GB of memory and
already runs a 600 MB `webserver` allocation, so no node has 700 MB free.

### Step 4: Try to Run `redis.nomad`

Remember that preemption for service and batch jobs is [disabled by
default][preemption-config]. This means that the `redis` job will be queued due
to resource contention in the cluster. You can confirm the resource contention
before registering the job by running the [`plan`][plan] command:

```shell
$ nomad plan redis.nomad
+ Job: "redis"
+ Task Group: "cache1" (1 create)
  + Task: "redis" (forces create)

Scheduler dry-run:
- WARNING: Failed to place all allocations.
  Task Group "cache1" (failed to place 1 allocation):
    * Resources exhausted on 3 nodes
    * Dimension "memory" exhausted on 3 nodes
```

Run the job and observe that the allocation is queued:

```shell
$ nomad run redis.nomad
==> Monitoring evaluation "1e54e283"
    Evaluation triggered by job "redis"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "1e54e283" finished with status "complete" but failed to place all allocations:
    Task Group "cache1" (failed to place 1 allocation):
      * Resources exhausted on 3 nodes
      * Dimension "memory" exhausted on 3 nodes
    Evaluation "1512251a" waiting for additional capacity to place remainder
```

You can also confirm that the allocation has been queued by checking the status
of the job:

```shell
$ nomad status redis
ID            = redis
Name          = redis
Submit Date   = 2019-06-19T03:33:17Z
Type          = service
Priority      = 80
...
Placement Failure
Task Group "cache1":
  * Resources exhausted on 3 nodes
  * Dimension "memory" exhausted on 3 nodes

Allocations
No allocations placed
```

Stop and purge this job now. In the next steps, we will enable service job
preemption and redeploy it:

```shell
$ nomad stop -purge redis
==> Monitoring evaluation "153db6c0"
    Evaluation triggered by job "redis"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "153db6c0" finished with status "complete"
```

### Step 5: Enable Service Job Preemption

Check the current [scheduler configuration][scheduler-configuration] with the
following command:

```shell
$ curl -s localhost:4646/v1/operator/scheduler/configuration | jq
{
  "SchedulerConfig": {
    "PreemptionConfig": {
      "SystemSchedulerEnabled": true,
      "BatchSchedulerEnabled": false,
      "ServiceSchedulerEnabled": false
    },
    "CreateIndex": 5,
    "ModifyIndex": 506
  },
  "Index": 506,
  "LastContact": 0,
  "KnownLeader": true
}
```

Note that [BatchSchedulerEnabled][batch-enabled] and
[ServiceSchedulerEnabled][service-enabled] are both set to `false` by default.
Since we are preempting service jobs in this guide, we need to set
`ServiceSchedulerEnabled` to `true`. We will do this by directly interacting
with the [API][update-scheduler].

Create the following JSON payload and place it in a file named `scheduler.json`:

```json
{
  "PreemptionConfig": {
    "SystemSchedulerEnabled": true,
    "BatchSchedulerEnabled": false,
    "ServiceSchedulerEnabled": true
  }
}
```

Note that [ServiceSchedulerEnabled][service-enabled] has been set to `true`.

Run the following command to update the scheduler configuration:

```shell
$ curl -XPOST localhost:4646/v1/operator/scheduler/configuration -d @scheduler.json
```

You should now be able to check the scheduler configuration again and see that
preemption has been enabled for service jobs (output below is abbreviated):

```shell
$ curl -s localhost:4646/v1/operator/scheduler/configuration | jq
{
  "SchedulerConfig": {
    "PreemptionConfig": {
      "SystemSchedulerEnabled": true,
      "BatchSchedulerEnabled": false,
      "ServiceSchedulerEnabled": true
    },
...
}
```

### Step 6: Try Running `redis.nomad` Again

Now that you have enabled preemption on service jobs, deploying your `redis` job
should evict one of the lower-priority `webserver` allocations. Nomad then queues
a replacement for the evicted allocation until capacity becomes available. You
can run `nomad plan` to preview what will happen:

```shell
$ nomad plan redis.nomad
+ Job: "redis"
+ Task Group: "cache1" (1 create)
  + Task: "redis" (forces create)

Scheduler dry-run:
- All tasks successfully allocated.

Preemptions:

Alloc ID                              Job ID     Task Group
725d3b49-d5cf-6ba2-be3d-cb441c10a8b3  webserver  webserver
...
```

Note that Nomad is indicating that one of the `webserver` allocations will be
evicted.

Now run the `redis` job:

```shell
$ nomad run redis.nomad
==> Monitoring evaluation "7ada9d9f"
    Evaluation triggered by job "redis"
    Allocation "8bfcdda3" created: node "16653ac1", group "cache1"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "7ada9d9f" finished with status "complete"
```

Check the status of the `webserver` job and verify that one of the allocations
has been evicted:

```shell
$ nomad status webserver
ID            = webserver
Name          = webserver
Submit Date   = 2019-06-19T04:20:32Z
Type          = service
Priority      = 40
...
Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
webserver   1       0         2        0       1         0

Placement Failure
Task Group "webserver":
  * Resources exhausted on 3 nodes
  * Dimension "memory" exhausted on 3 nodes

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created    Modified
725d3b49  16653ac1  webserver   0        evict    complete  4m10s ago  33s ago
e2f9cb3d  f765c6e8  webserver   0        run      running   4m10s ago  3m54s ago
e9d8df1b  b0700ec0  webserver   0        run      running   4m10s ago  3m51s ago
```

### Step 7: Stop the Redis Job

Stop the `redis` job and verify that the queued replacement for the evicted
`webserver` allocation starts running:

```shell
$ nomad stop redis
==> Monitoring evaluation "670922e9"
    Evaluation triggered by job "redis"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "670922e9" finished with status "complete"
```

You should now see from the `webserver` status that a replacement for the
previously preempted allocation (`f623eb81` in this example) has been placed and
is running:

```shell
$ nomad status webserver
ID            = webserver
Name          = webserver
Submit Date   = 2019-06-19T04:20:32Z
Type          = service
Priority      = 40
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
webserver   0       0         3        0       1         0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created    Modified
f623eb81  16653ac1  webserver   0        run      running   13s ago    7s ago
725d3b49  16653ac1  webserver   0        evict    complete  6m44s ago  3m7s ago
e2f9cb3d  f765c6e8  webserver   0        run      running   6m44s ago  6m28s ago
e9d8df1b  b0700ec0  webserver   0        run      running   6m44s ago  6m25s ago
```

## Next Steps

You can apply the same process to [batch][batch-enabled] jobs by setting
`BatchSchedulerEnabled` to `true`. Read more about
[preemption in Nomad Enterprise][enterprise-preemption].

[batch-enabled]: /api/operator.html#batchschedulerenabled-1
[batch-job]: /docs/schedulers.html#batch
[count]: /docs/job-specification/group.html#count
[enterprise]: /docs/enterprise/index.html
[enterprise-preemption]: /docs/enterprise/index.html#preemption
[memory]: /docs/job-specification/resources.html#memory
[payload-preemption-config]: /api/operator.html#sample-payload-1
[plan]: /docs/commands/job/plan.html
[preemption]: /docs/internals/scheduling/preemption.html
[preemption-config]: /api/operator.html#preemptionconfig-1
[priority]: /docs/job-specification/job.html#priority
[scheduler-configuration]: /api/operator.html#read-scheduler-configuration
[service-enabled]: /api/operator.html#serviceschedulerenabled-1
[service-job]: /docs/schedulers.html#service
[step-1]: #step-1-create-a-job-with-low-priority
[system-job]: /docs/schedulers.html#system
[t2-micro]: https://aws.amazon.com/ec2/instance-types/
[update-scheduler]: /api/operator.html#update-scheduler-configuration