---
layout: "guides"
page_title: "Preemption (Service and Batch Jobs)"
sidebar_current: "guides-operating-a-job-preemption-service-batch"
description: |-
  The following guide walks the user through enabling and using preemption on
  service and batch jobs in Nomad Enterprise (0.9.3 and above).
---

# Preemption for Service and Batch Jobs

~> **Enterprise Only!** This functionality only exists in Nomad Enterprise. This
is not present in the open source version of Nomad.

Prior to Nomad 0.9, job [priority][priority] was used to process
scheduling requests in priority order. Preemption, implemented in Nomad 0.9,
allows Nomad to evict running allocations to place allocations of a higher
priority. Allocations of a job that are blocked temporarily go into "pending"
status until the cluster has additional capacity to run them. This is useful
when operators need to run relatively higher priority tasks sooner even under
resource contention across the cluster.

While Nomad 0.9 introduced preemption for [system][system-job] jobs, Nomad 0.9.3
[Enterprise][enterprise] additionally allows preemption for
[service][service-job] and [batch][batch-job] jobs. This functionality can
easily be enabled by sending a [payload][payload-preemption-config] with the
appropriate options specified to the [scheduler
configuration][update-scheduler] API endpoint.

## Reference Material

- [Preemption][preemption]
- [Nomad Enterprise Preemption][enterprise-preemption]

## Estimated Time to Complete

20 minutes

## Prerequisites

To perform the tasks described in this guide, you need to have a Nomad
environment with Consul installed. You can use this
[repo](https://github.com/hashicorp/nomad/tree/master/terraform#provision-a-nomad-cluster-in-the-cloud)
to easily provision a sandbox environment. This guide will assume a cluster with
one server node and three client nodes.
To simulate resource contention, the
nodes in this environment will each have 1 GB RAM (on AWS, you can choose the
[t2.micro][t2-micro] instance type). Remember that service and batch job
preemption requires Nomad 0.9.3 [Enterprise][enterprise].

-> **Please Note:** This guide is for demo purposes and is only using a single
server node. In a production cluster, 3 or 5 server nodes are recommended.

## Steps

### Step 1: Create a Job with Low Priority

Start by creating a job with a relatively low priority in your Nomad cluster.
One of the allocations from this job will be preempted in a subsequent
deployment when there is resource contention in the cluster. Copy the
following job into a file and name it `webserver.nomad`.

```hcl
job "webserver" {
  datacenters = ["dc1"]
  type        = "service"
  priority    = 40

  group "webserver" {
    count = 3

    task "apache" {
      driver = "docker"

      config {
        image = "httpd:latest"

        port_map {
          http = 80
        }
      }

      resources {
        network {
          mbits = 10
          port "http" {}
        }

        memory = 600
      }

      service {
        name = "apache-webserver"
        port = "http"

        check {
          name     = "alive"
          type     = "http"
          path     = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
```
Note that the [count][count] is 3 and that each allocation is specifying 600 MB
of [memory][memory]. Remember that each node only has 1 GB of RAM.
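
A quick back-of-the-envelope check shows why these numbers force one allocation
per node. The sketch below is only illustrative arithmetic: the 1024 MB figure
is a nominal assumption for a 1 GB node, and the memory a client actually
reports to Nomad may be lower.

```shell
# Assumed figures (not read from a real cluster):
node_memory_mb=1024    # nominal memory of a 1 GB client node
alloc_memory_mb=600    # memory requested by each webserver allocation

# Integer division: how many 600 MB allocations fit on a single node?
allocs_per_node=$(( node_memory_mb / alloc_memory_mb ))

# Nominal memory left on a node once those allocations are placed.
free_after_mb=$(( node_memory_mb - allocs_per_node * alloc_memory_mb ))

echo "allocations per node: ${allocs_per_node}"
echo "nominal memory left per node: ${free_after_mb} MB"
```

With these assumptions only one `webserver` allocation fits on each node, so a
count of 3 occupies all three client nodes.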

### Step 2: Run the Low Priority Job

Register `webserver.nomad`:

```shell
$ nomad run webserver.nomad
==> Monitoring evaluation "1596bfc8"
    Evaluation triggered by job "webserver"
    Allocation "725d3b49" created: node "16653ac1", group "webserver"
    Allocation "e2f9cb3d" created: node "f765c6e8", group "webserver"
    Allocation "e9d8df1b" created: node "b0700ec0", group "webserver"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "1596bfc8" finished with status "complete"
```
You should be able to check the status of the `webserver` job at this point and see that an allocation has been placed on each client node in the cluster:

```shell
$ nomad status webserver
ID            = webserver
Name          = webserver
Submit Date   = 2019-06-19T04:20:32Z
Type          = service
Priority      = 40
...
Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
725d3b49  16653ac1  webserver   0        run      running  1m18s ago  59s ago
e2f9cb3d  f765c6e8  webserver   0        run      running  1m18s ago  1m2s ago
e9d8df1b  b0700ec0  webserver   0        run      running  1m18s ago  59s ago
```

### Step 3: Create a Job with High Priority

Create another job with a [priority][priority] greater than the job you just deployed.
Copy the following into a file named `redis.nomad`:

```hcl
job "redis" {
  datacenters = ["dc1"]
  type        = "service"
  priority    = 80

  group "cache1" {
    count = 1

    task "redis" {
      driver = "docker"

      config {
        image = "redis:latest"

        port_map {
          db = 6379
        }
      }

      resources {
        network {
          port "db" {}
        }

        memory = 700
      }

      service {
        name = "redis-cache"
        port = "db"

        check {
          name     = "alive"
          type     = "tcp"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
```
Note that this job has a priority of 80 (greater than the priority of the job
from [Step 1][step-1]) and requires 700 MB of memory. This allocation will
create resource contention in the cluster since each node only has 1 GB of
memory with a 600 MB allocation already placed on it.

### Step 4: Try to Run `redis.nomad`

Remember that preemption for service and batch jobs is [disabled by
default][preemption-config]. This means that the `redis` job will be queued due
to resource contention in the cluster. You can verify the resource contention before actually registering your job by running the [`plan`][plan] command:

```shell
$ nomad plan redis.nomad
+ Job: "redis"
+ Task Group: "cache1" (1 create)
  + Task: "redis" (forces create)

Scheduler dry-run:
- WARNING: Failed to place all allocations.
  Task Group "cache1" (failed to place 1 allocation):
    * Resources exhausted on 3 nodes
    * Dimension "memory" exhausted on 3 nodes
```
Run the job to see that the allocation will be queued:

```shell
$ nomad run redis.nomad
==> Monitoring evaluation "1e54e283"
    Evaluation triggered by job "redis"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "1e54e283" finished with status "complete" but failed to place all allocations:
    Task Group "cache1" (failed to place 1 allocation):
      * Resources exhausted on 3 nodes
      * Dimension "memory" exhausted on 3 nodes
    Evaluation "1512251a" waiting for additional capacity to place remainder
```

You may also verify that the allocation has been queued by checking the status of the job:

```shell
$ nomad status redis
ID            = redis
Name          = redis
Submit Date   = 2019-06-19T03:33:17Z
Type          = service
Priority      = 80
...
Placement Failure
Task Group "cache1":
  * Resources exhausted on 3 nodes
  * Dimension "memory" exhausted on 3 nodes

Allocations
No allocations placed
```
You may remove this job now.
In the next steps, we will enable service job preemption and re-deploy:

```shell
$ nomad stop -purge redis
==> Monitoring evaluation "153db6c0"
    Evaluation triggered by job "redis"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "153db6c0" finished with status "complete"
```

### Step 5: Enable Service Job Preemption

Verify the [scheduler configuration][scheduler-configuration] with the following
command:

```shell
$ curl -s localhost:4646/v1/operator/scheduler/configuration | jq
{
  "SchedulerConfig": {
    "PreemptionConfig": {
      "SystemSchedulerEnabled": true,
      "BatchSchedulerEnabled": false,
      "ServiceSchedulerEnabled": false
    },
    "CreateIndex": 5,
    "ModifyIndex": 506
  },
  "Index": 506,
  "LastContact": 0,
  "KnownLeader": true
}
```

Note that [BatchSchedulerEnabled][batch-enabled] and
[ServiceSchedulerEnabled][service-enabled] are both set to `false` by default.
Since we are preempting service jobs in this guide, we need to set
`ServiceSchedulerEnabled` to `true`. We will do this by directly interacting
with the [API][update-scheduler].

Create the following JSON payload and place it in a file named `scheduler.json`:

```json
{
  "PreemptionConfig": {
    "SystemSchedulerEnabled": true,
    "BatchSchedulerEnabled": false,
    "ServiceSchedulerEnabled": true
  }
}
```
Note that [ServiceSchedulerEnabled][service-enabled] has been set to `true`.
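
If you expect to toggle these settings more than once, the payload can be
generated instead of edited by hand. The following is a minimal sketch, not
part of Nomad: the function name `preemption_payload` is hypothetical, and it
only prints JSON in the same shape as the payload above without contacting the
cluster.

```shell
# Hypothetical helper (not a Nomad command): prints a PreemptionConfig payload.
# Arguments are the literal strings "true" or "false", in this order:
#   $1 = SystemSchedulerEnabled
#   $2 = BatchSchedulerEnabled
#   $3 = ServiceSchedulerEnabled
preemption_payload() {
  # Require exactly three arguments.
  [ "$#" -eq 3 ] || { echo "expected 3 arguments" >&2; return 1; }

  # Reject anything that is not a JSON boolean literal.
  for flag in "$@"; do
    case "$flag" in
      true|false) ;;
      *) echo "expected true or false, got: $flag" >&2; return 1 ;;
    esac
  done

  printf '{"PreemptionConfig":{"SystemSchedulerEnabled":%s,"BatchSchedulerEnabled":%s,"ServiceSchedulerEnabled":%s}}\n' "$1" "$2" "$3"
}

# Same settings as the scheduler.json payload in this step.
preemption_payload true false true
```

Redirecting the output to a file (for example
`preemption_payload true false true > scheduler.json`) should yield a payload
equivalent to the hand-written one.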

Run the following command to update the scheduler configuration:

```shell
$ curl -XPOST localhost:4646/v1/operator/scheduler/configuration -d @scheduler.json
```
You should now be able to check the scheduler configuration again and see that
preemption has been enabled for service jobs (output below is abbreviated):

```shell
$ curl -s localhost:4646/v1/operator/scheduler/configuration | jq
{
  "SchedulerConfig": {
    "PreemptionConfig": {
      "SystemSchedulerEnabled": true,
      "BatchSchedulerEnabled": false,
      "ServiceSchedulerEnabled": true
    },
...
}
```

### Step 6: Try Running `redis.nomad` Again

Now that you have enabled preemption on service jobs, deploying your `redis` job
should evict one of the lower priority `webserver` allocations and place it into
a queue. You can run `nomad plan` to see a preview of what will happen:

```shell
$ nomad plan redis.nomad
+ Job: "redis"
+ Task Group: "cache1" (1 create)
  + Task: "redis" (forces create)

Scheduler dry-run:
- All tasks successfully allocated.

Preemptions:

Alloc ID                              Job ID     Task Group
725d3b49-d5cf-6ba2-be3d-cb441c10a8b3  webserver  webserver
...
```

Note that Nomad is indicating one of the `webserver` allocations will be
evicted.

Now run the `redis` job:

```shell
$ nomad run redis.nomad
==> Monitoring evaluation "7ada9d9f"
    Evaluation triggered by job "redis"
    Allocation "8bfcdda3" created: node "16653ac1", group "cache1"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "7ada9d9f" finished with status "complete"
```
You can check the status of the `webserver` job and verify one of the allocations has been evicted:

```shell
$ nomad status webserver
ID            = webserver
Name          = webserver
Submit Date   = 2019-06-19T04:20:32Z
Type          = service
Priority      = 40
...
Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
webserver   1       0         2        0       1         0

Placement Failure
Task Group "webserver":
  * Resources exhausted on 3 nodes
  * Dimension "memory" exhausted on 3 nodes

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created    Modified
725d3b49  16653ac1  webserver   0        evict    complete  4m10s ago  33s ago
e2f9cb3d  f765c6e8  webserver   0        run      running   4m10s ago  3m54s ago
e9d8df1b  b0700ec0  webserver   0        run      running   4m10s ago  3m51s ago
```

### Step 7: Stop the Redis Job

Stop the `redis` job and verify that the evicted/queued `webserver` allocation
starts running again:

```shell
$ nomad stop redis
==> Monitoring evaluation "670922e9"
    Evaluation triggered by job "redis"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "670922e9" finished with status "complete"
```
You should now be able to see from the `webserver` status that a third allocation is running again, replacing the one that was preempted:

```shell
$ nomad status webserver
ID            = webserver
Name          = webserver
Submit Date   = 2019-06-19T04:20:32Z
Type          = service
Priority      = 40
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
webserver   0       0         3        0       1         0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created    Modified
f623eb81  16653ac1  webserver   0        run      running   13s ago    7s ago
725d3b49  16653ac1  webserver   0        evict    complete  6m44s ago  3m7s ago
e2f9cb3d  f765c6e8  webserver   0        run      running   6m44s ago  6m28s ago
e9d8df1b  b0700ec0  webserver   0        run      running   6m44s ago  6m25s ago
```

## Next Steps

The process you learned in this guide can also be applied to
[batch][batch-enabled] jobs. Read more about preemption in Nomad
Enterprise [here][enterprise-preemption].

[batch-enabled]: /api/operator.html#batchschedulerenabled-1
[batch-job]: /docs/schedulers.html#batch
[count]: /docs/job-specification/group.html#count
[enterprise]: /docs/enterprise/index.html
[enterprise-preemption]: /docs/enterprise/index.html#preemption
[memory]: /docs/job-specification/resources.html#memory
[payload-preemption-config]: /api/operator.html#sample-payload-1
[plan]: /docs/commands/job/plan.html
[preemption]: /docs/internals/scheduling/preemption.html
[preemption-config]: /api/operator.html#preemptionconfig-1
[priority]: /docs/job-specification/job.html#priority
[service-enabled]: /api/operator.html#serviceschedulerenabled-1
[service-job]: /docs/schedulers.html#service
[step-1]: #step-1-create-a-job-with-low-priority
[system-job]: /docs/schedulers.html#system
[t2-micro]: https://aws.amazon.com/ec2/instance-types/
[update-scheduler]: /api/operator.html#update-scheduler-configuration
[scheduler-configuration]: /api/operator.html#read-scheduler-configuration