operating-a-job/advanced-scheduling/preemption-service-batch.html.md

---
layout: "guides"
page_title: "Preemption (Service and Batch Jobs)"
sidebar_current: "guides-operating-a-job-preemption-service-batch"
description: |-
  The following guide walks the user through enabling and using preemption on
  service and batch jobs in Nomad Enterprise (0.9.3 and above).
---

# Preemption for Service and Batch Jobs

~> **Enterprise Only!** This functionality only exists in Nomad Enterprise. This
is not present in the open source version of Nomad.

Prior to Nomad 0.9, job [priority][priority] in Nomad was used to process
scheduling requests in priority order. Preemption, implemented in Nomad 0.9
allows Nomad to evict running allocations to place allocations of a higher
priority. Allocations of a job that are blocked temporarily go into "pending"
status until the cluster has additional capacity to run them. This is useful
when operators need to run relatively higher priority tasks sooner even under
resource contention across the cluster.

While Nomad 0.9 introduced preemption for [system][system-job] jobs, Nomad 0.9.3
[Enterprise][enterprise] additionally allows preemption for
[service][service-job] and [batch][batch-job] jobs. This functionality can
easily be enabled by sending a [payload][payload-preemption-config] with the
appropriate options specified to the [scheduler
configuration][update-scheduler] API endpoint.

## Reference Material

- [Preemption][preemption]
- [Nomad Enterprise Preemption][enterprise-preemption]

## Estimated Time to Complete

20 minutes

## Prerequisites

To perform the tasks described in this guide, you need to have a Nomad
environment with Consul installed. You can use this
[repo](https://github.com/hashicorp/nomad/tree/master/terraform#provision-a-nomad-cluster-in-the-cloud)
to easily provision a sandbox environment. This guide will assume a cluster with
one server node and three client nodes. To simulate resource contention, the
nodes in this environment will each have 1 GB RAM (For AWS, you can choose the
[t2.micro][t2-micro] instance type). Remember that service and batch job
preemption require Nomad 0.9.3 [Enterprise][enterprise].

-> **Please Note:** This guide is for demo purposes and is only using a single
server node. In a production cluster, 3 or 5 server nodes are recommended.

## Steps

### Step 1: Create a Job with Low Priority

Start by creating a job with relatively lower priority into your Nomad cluster.
One of the allocations from this job will be preempted in a subsequent
deployment when there is a resource contention in the cluster. Copy the
following job into a file and name it `webserver.nomad`.

```hcl
job "webserver" {
  datacenters = ["dc1"]
  type        = "service"
  priority    = 40

  group "webserver" {
    count = 3

    task "apache" {
      driver = "docker"

      config {
        image = "httpd:latest"

        port_map {
          http = 80
        }
      }

      resources {
        network {
          mbits = 10
          port  "http"{}
        }

        memory = 600
      }

      service {
        name = "apache-webserver"
        port = "http"

        check {
          name     = "alive"
          type     = "http"
          path     = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
```
Note that the [count][count] is 3 and that each allocation is specifying 600 MB
of [memory][memory]. Remember that each node only has 1 GB of RAM.

### Step 2: Run the Low Priority Job

Register `webserver.nomad`:

```shell
$ nomad run webserver.nomad
==> Monitoring evaluation "1596bfc8"
    Evaluation triggered by job "webserver"
    Allocation "725d3b49" created: node "16653ac1", group "webserver"
    Allocation "e2f9cb3d" created: node "f765c6e8", group "webserver"
    Allocation "e9d8df1b" created: node "b0700ec0", group "webserver"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "1596bfc8" finished with status "complete"
```
You should be able to check the status of the `webserver` job at this point and see that an allocation has been placed on each client node in the cluster:

```shell
$ nomad status webserver
ID            = webserver
Name          = webserver
Submit Date   = 2019-06-19T04:20:32Z
Type          = service
Priority      = 40
...
Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
725d3b49  16653ac1  webserver   0        run      running  1m18s ago  59s ago
e2f9cb3d  f765c6e8  webserver   0        run      running  1m18s ago  1m2s ago
e9d8df1b  b0700ec0  webserver   0        run      running  1m18s ago  59s ago
```

### Step 3: Create a Job with High Priority

Create another job with a [priority][priority] greater than the job you just deployed. Copy the following into a file named `redis.nomad`:

```hcl
job "redis" {
  datacenters = ["dc1"]
  type        = "service"
  priority    = 80

  group "cache1" {
    count = 1

    task "redis" {
      driver = "docker"

      config {
        image = "redis:latest"

        port_map {
          db = 6379
        }
      }

      resources {
        network {
          port "db" {}
        }

        memory = 700
      }

      service {
        name = "redis-cache"
        port = "db"

        check {
          name     = "alive"
          type     = "tcp"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
```
Note that this job has a priority of 80 (greater than the priority of the job
from [Step 1][step-1]) and requires 700 MB of memory. This allocation will
create a resource contention in the cluster since each node only has 1 GB of
memory with a 600 MB allocation already placed on it.

### Step 4: Try to Run `redis.nomad`

Remember that preemption for service and batch jobs are [disabled by
default][preemption-config]. This means that the `redis` job will be queued due
to resource contention in the cluster. You can verify the resource contention before actually registering your job by running the [`plan`][plan] command:

```shell
$ nomad plan redis.nomad
+ Job: "redis"
+ Task Group: "cache1" (1 create)
  + Task: "redis" (forces create)

Scheduler dry-run:
- WARNING: Failed to place all allocations.
  Task Group "cache1" (failed to place 1 allocation):
    * Resources exhausted on 3 nodes
    * Dimension "memory" exhausted on 3 nodes
```
Run the job to see that the allocation will be queued:

```shell
$ nomad run redis.nomad
==> Monitoring evaluation "1e54e283"
    Evaluation triggered by job "redis"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "1e54e283" finished with status "complete" but failed to place all allocations:
    Task Group "cache1" (failed to place 1 allocation):
      * Resources exhausted on 3 nodes
      * Dimension "memory" exhausted on 3 nodes
    Evaluation "1512251a" waiting for additional capacity to place remainder
```

You may also verify the allocation has been queued by now checking the status of the job:

```shell
$ nomad status redis
ID            = redis
Name          = redis
Submit Date   = 2019-06-19T03:33:17Z
Type          = service
Priority      = 80
...
Placement Failure
Task Group "cache1":
  * Resources exhausted on 3 nodes
  * Dimension "memory" exhausted on 3 nodes

Allocations
No allocations placed
```
You may remove this job now. In the next steps, we will enable service job preemption and re-deploy:

```shell
$ nomad stop -purge redis
==> Monitoring evaluation "153db6c0"
    Evaluation triggered by job "redis"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "153db6c0" finished with status "complete"
```

### Step 5: Enable Service Job Preemption

Verify the [scheduler configuration][scheduler-configuration] with the following
command:

```shell
$ curl -s localhost:4646/v1/operator/scheduler/configuration | jq
{
  "SchedulerConfig": {
    "PreemptionConfig": {
      "SystemSchedulerEnabled": true,
      "BatchSchedulerEnabled": false,
      "ServiceSchedulerEnabled": false
    },
    "CreateIndex": 5,
    "ModifyIndex": 506
  },
  "Index": 506,
  "LastContact": 0,
  "KnownLeader": true
}
```

Note that [BatchSchedulerEnabled][batch-enabled] and
[ServiceSchedulerEnabled][service-enabled] are both set to `false` by default.
Since we are preempting service jobs in this guide, we need to set
`ServiceSchedulerEnabled` to `true`. We will do this by directly interacting
with the [API][update-scheduler].

Create the following JSON payload and place it in a file named `scheduler.json`:

```json
{
  "PreemptionConfig": {
    "SystemSchedulerEnabled": true,
    "BatchSchedulerEnabled": false,
    "ServiceSchedulerEnabled": true
  }
}
```
Note that [ServiceSchedulerEnabled][service-enabled] has been set to `true`.

Run the following command to update the scheduler configuration:

```shell
$ curl -XPOST localhost:4646/v1/operator/scheduler/configuration -d @scheduler.json
```
You should now be able to check the scheduler configuration again and see that
preemption has been enabled for service jobs (output below is abbreviated):

```shell
$ curl -s localhost:4646/v1/operator/scheduler/configuration | jq
{
  "SchedulerConfig": {
    "PreemptionConfig": {
      "SystemSchedulerEnabled": true,
      "BatchSchedulerEnabled": false,
      "ServiceSchedulerEnabled": true
    },
...
}
```

### Step 6: Try Running `redis.nomad` Again

Now that you have enabled preemption on service jobs, deploying your `redis` job
should evict one of the lower priority `webserver` allocations and place it into
a queue. You can run `nomad plan` to see a preview of what will happen:

```shell
$ nomad plan redis.nomad
+ Job: "redis"
+ Task Group: "cache1" (1 create)
  + Task: "redis" (forces create)

Scheduler dry-run:
- All tasks successfully allocated.

Preemptions:

Alloc ID                              Job ID     Task Group
725d3b49-d5cf-6ba2-be3d-cb441c10a8b3  webserver  webserver
...
```

Note that Nomad is indicating one of the `webserver` allocations will be
evicted.

Now run the `redis` job:

```shell
$ nomad run redis.nomad
==> Monitoring evaluation "7ada9d9f"
    Evaluation triggered by job "redis"
    Allocation "8bfcdda3" created: node "16653ac1", group "cache1"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "7ada9d9f" finished with status "complete"
```
You can check the status of the `webserver` job and verify one of the allocations has been evicted:

```shell
$ nomad status webserver
ID            = webserver
Name          = webserver
Submit Date   = 2019-06-19T04:20:32Z
Type          = service
Priority      = 40
...
Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
webserver   1       0         2        0       1         0

Placement Failure
Task Group "webserver":
  * Resources exhausted on 3 nodes
  * Dimension "memory" exhausted on 3 nodes

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created    Modified
725d3b49  16653ac1  webserver   0        evict    complete  4m10s ago  33s ago
e2f9cb3d  f765c6e8  webserver   0        run      running   4m10s ago  3m54s ago
e9d8df1b  b0700ec0  webserver   0        run      running   4m10s ago  3m51s ago
```

### Step 7: Stop the Redis Job

Stop the `redis` job and verify that evicted/queued `webserver` allocation
starts running again:

```shell
$ nomad stop redis
==> Monitoring evaluation "670922e9"
    Evaluation triggered by job "redis"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "670922e9" finished with status "complete"
```
You should now be able to see from the `webserver` status that the third allocation that was previously preempted is running again:

```shell
$ nomad status webserver
ID            = webserver
Name          = webserver
Submit Date   = 2019-06-19T04:20:32Z
Type          = service
Priority      = 40
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
webserver   0       0         3        0       1         0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created    Modified
f623eb81  16653ac1  webserver   0        run      running   13s ago    7s ago
725d3b49  16653ac1  webserver   0        evict    complete  6m44s ago  3m7s ago
e2f9cb3d  f765c6e8  webserver   0        run      running   6m44s ago  6m28s ago
e9d8df1b  b0700ec0  webserver   0        run      running   6m44s ago  6m25s ago
```

## Next Steps

The process you learned in this guide can also be applied to
[batch][batch-enabled] jobs as well. Read more about preemption in Nomad
Enterprise [here][enterprise-preemption].

[batch-enabled]: /api/operator.html#batchschedulerenabled-1
[batch-job]: /docs/schedulers.html#batch
[count]: /docs/job-specification/group.html#count
[enterprise]: /docs/enterprise/index.html
[enterprise-preemption]: /docs/enterprise/index.html#preemption
[memory]: /docs/job-specification/resources.html#memory
[payload-preemption-config]: /api/operator.html#sample-payload-1
[plan]: /docs/commands/job/plan.html
[preemption]: /docs/internals/scheduling/preemption.html
[preemption-config]: /api/operator.html#preemptionconfig-1
[priority]: /docs/job-specification/job.html#priority
[service-enabled]: /api/operator.html#serviceschedulerenabled-1
[service-job]: /docs/schedulers.html#service
[step-1]: #step-1-create-a-job-with-low-priority
[system-job]: /docs/schedulers.html#system
[t2-micro]: https://aws.amazon.com/ec2/instance-types/
[update-scheduler]: /api/operator.html#update-scheduler-configuration
[scheduler-configuration]: /api/operator.html#read-scheduler-configuration