
# [SwarmKit](https://github.com/docker/swarmkit)

[![GoDoc](https://godoc.org/github.com/docker/swarmkit?status.svg)](https://godoc.org/github.com/docker/swarmkit)
[![Circle CI](https://circleci.com/gh/docker/swarmkit.svg?style=shield&circle-token=a7bf494e28963703a59de71cf19b73ad546058a7)](https://circleci.com/gh/docker/swarmkit)
[![codecov.io](https://codecov.io/github/docker/swarmkit/coverage.svg?branch=master&token=LqD1dzTjsN)](https://codecov.io/github/docker/swarmkit?branch=master)
[![Badge Badge](http://doyouevenbadge.com/github.com/docker/swarmkit)](http://doyouevenbadge.com/report/github.com/docker/swarmkit)

*SwarmKit* is a toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more.

Its main benefits are:

-   **Distributed**: *SwarmKit* uses the [Raft Consensus Algorithm](https://raft.github.io/) to coordinate and does not rely on a single point of failure to make decisions.
-   **Secure**: Node communication and membership within a *Swarm* are secure out of the box. *SwarmKit* uses mutual TLS for node *authentication*, *role authorization* and *transport encryption*, automating both certificate issuance and rotation.
-   **Simple**: *SwarmKit* is operationally simple and minimizes infrastructure dependencies. It does not need an external database to operate.

## Overview

Machines running *SwarmKit* can be grouped together to form a *Swarm*, coordinating tasks with each other.
Once a machine joins, it becomes a *Swarm Node*. Nodes can be either *worker* nodes or *manager* nodes.

-   **Worker Nodes** are responsible for running Tasks using an *Executor*. *SwarmKit* comes with a default *Docker Container Executor* that can easily be swapped out.
-   **Manager Nodes**, on the other hand, accept specifications from the user and are responsible for reconciling the desired state with the actual cluster state.

An operator can dynamically update a Node's role by promoting a Worker to Manager or demoting a Manager to Worker.
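
With a running cluster, a role change is a single command using the `swarmctl` CLI introduced below. This is a sketch: it assumes `swarmctl node promote` and `swarmctl node demote` subcommands (names may vary across versions) and a worker named `node-2`:

```sh
$ swarmctl node promote node-2   # Worker -> Manager
$ swarmctl node demote node-2    # Manager -> Worker
```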

*Tasks* are organized in *Services*. A service is a higher-level abstraction that allows the user to declare the desired state of a group of tasks.
Services define what type of task should be created, as well as how to execute them (e.g. run this many replicas at all times) and how to update them (e.g. rolling updates).

## Features

Some of *SwarmKit*'s main features are:

-   **Orchestration**

    -   **Desired State Reconciliation**: *SwarmKit* constantly compares the desired state against the current cluster state and reconciles the two if necessary. For instance, if a node fails, *SwarmKit* reschedules its tasks onto a different node.

    -   **Service Types**: There are different types of services. The project currently ships with two of them out of the box:

        -   **Replicated Services** are scaled to the desired number of replicas.
        -   **Global Services** run one task on every available node in the cluster.

    -   **Configurable Updates**: At any time, you can change the value of one or more fields for a service. After you make the update, *SwarmKit* reconciles the desired state by ensuring all tasks are using the desired settings. By default, it performs a lockstep update, that is, it updates all tasks at the same time. This behavior can be configured through different knobs:

        -   **Parallelism** defines how many updates can be performed at the same time.
        -   **Delay** sets the minimum delay between updates. *SwarmKit* starts by shutting down the previous task, brings up a new one, waits for it to transition to the *RUNNING* state, *then* waits for the additional configured delay. Finally, it moves on to other tasks.

    -   **Restart Policies**: The orchestration layer monitors tasks and reacts to failures based on the specified policy. The operator can define restart conditions, delays and limits (maximum number of attempts in a given time window). *SwarmKit* can decide to restart a task on a different machine. This means that faulty nodes will gradually be drained of their tasks.

-   **Scheduling**

    -   **Resource Awareness**: *SwarmKit* is aware of resources available on nodes and will place tasks accordingly.
    -   **Constraints**: Operators can limit the set of nodes where a task can be scheduled by defining constraint expressions. Multiple constraints find nodes that satisfy every expression, i.e., an `AND` match. Constraints can match the node attributes in the following table. Note that `engine.labels` are collected from Docker Engine with information like operating system, drivers, etc., while `node.labels` are added by cluster administrators for operational purposes. For example, some nodes may carry security-compliance labels so that tasks with compliance requirements run only on them.

        | node attribute | matches | example |
        |:------------- |:-------------| :-------------|
        | node.id | node's ID | `node.id == 2ivku8v2gvtg4`|
        | node.hostname | node's hostname | `node.hostname != node-2`|
        | node.ip | node's IP address | `node.ip != 172.19.17.0/24`|
        | node.role | node's manager or worker role | `node.role == manager`|
        | node.platform.os | node's operating system | `node.platform.os == linux`|
        | node.platform.arch | node's architecture | `node.platform.arch == x86_64`|
        | node.labels | node's labels added by cluster admins | `node.labels.security == high`|
        | engine.labels | Docker Engine's labels | `engine.labels.operatingsystem == ubuntu 14.04`|

    -   **Strategies**: The project currently ships with a *spread strategy* which will attempt to schedule tasks on the least loaded nodes, provided they meet the constraints and resource requirements.

-   **Cluster Management**

    -   **State Store**: Manager nodes maintain a strongly consistent, replicated (Raft-based) and extremely fast (in-memory reads) view of the cluster which allows them to make quick scheduling decisions while tolerating failures.
    -   **Topology Management**: Node roles (*Worker* / *Manager*) can be dynamically changed through API/CLI calls.
    -   **Node Management**: An operator can alter the desired availability of a node: setting it to *Paused* will prevent any further tasks from being scheduled on it, while *Drained* will have the same effect and also re-schedule its tasks somewhere else (mostly for maintenance scenarios).

-   **Security**

    -   **Mutual TLS**: All nodes communicate with each other using mutual *TLS*. Swarm managers act as a *Root Certificate Authority*, issuing certificates to new nodes.
    -   **Token-based Join**: All nodes require a cryptographic token to join the swarm, which defines that node's role. Tokens can be rotated as often as desired without affecting already-joined nodes.
    -   **Certificate Rotation**: TLS certificates are rotated and reloaded transparently on every node, allowing a user to set how frequently rotation should happen (the current default is 3 months, the minimum is 30 minutes).
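
To illustrate the constraint expressions above, a service could be pinned to labeled nodes at creation time. This is a sketch: it assumes `swarmctl service create` accepts a `--constraint` flag and that a cluster admin has set `node.labels.security` on the target nodes; the `secure-app` service name is hypothetical:

```sh
$ swarmctl service create --name secure-app --image redis:3.0.5 \
    --constraint "node.labels.security == high"
```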

## Build

Requirements:

-   Go 1.6 or higher
-   A [working golang](https://golang.org/doc/code.html) environment
-   [Protobuf 3.x or higher](https://developers.google.com/protocol-buffers/docs/downloads) to regenerate protocol buffer files (e.g. using `make generate`)

*SwarmKit* is built in Go and leverages a standard project structure to work well with Go tooling.
If you are new to Go, please see [BUILDING.md](BUILDING.md) for a more detailed guide.

Once you have *SwarmKit* checked out in your `$GOPATH`, the `Makefile` can be used for common tasks.

From the project root directory, run the following to build `swarmd` and `swarmctl`:

```sh
$ make binaries
```

## Test

Before running tests for the first time, set up the tooling:

```sh
$ make setup
```

Then run:

```sh
$ make all
```

## Usage Examples

### Setting up a Swarm

These instructions assume that `swarmd` and `swarmctl` are in your `PATH`.

(Before starting, make sure the `/tmp/node-N` directories do not exist.)

Initialize the first node:

```sh
$ swarmd -d /tmp/node-1 --listen-control-api /tmp/node-1/swarm.sock --hostname node-1
```

Before joining the cluster, fetch the join token:

```
$ export SWARM_SOCKET=/tmp/node-1/swarm.sock
$ swarmctl cluster inspect default
ID          : 87d2ecpg12dfonxp3g562fru1
Name        : default
Orchestration settings:
  Task history entries: 5
Dispatcher settings:
  Dispatcher heartbeat period: 5s
Certificate Authority settings:
  Certificate Validity Duration: 2160h0m0s
  Join Tokens:
    Worker: SWMTKN-1-3vi7ajem0jed8guusgvyl98nfg18ibg4pclify6wzac6ucrhg3-0117z3s2ytr6egmmnlr6gd37n
    Manager: SWMTKN-1-3vi7ajem0jed8guusgvyl98nfg18ibg4pclify6wzac6ucrhg3-d1ohk84br3ph0njyexw0wdagx
```

In two additional terminals, join two worker nodes. In the example below, replace `127.0.0.1:4242`
with the address of the first node, and use the `<Worker Token>` acquired above.
In this example, the `<Worker Token>` is `SWMTKN-1-3vi7ajem0jed8guusgvyl98nfg18ibg4pclify6wzac6ucrhg3-0117z3s2ytr6egmmnlr6gd37n`.
If the joining nodes run on the same host as `node-1`, choose a different remote
listening port, e.g., `--listen-remote-api 127.0.0.1:4343`.

```sh
$ swarmd -d /tmp/node-2 --hostname node-2 --join-addr 127.0.0.1:4242 --join-token <Worker Token>
$ swarmd -d /tmp/node-3 --hostname node-3 --join-addr 127.0.0.1:4242 --join-token <Worker Token>
```

If joining as a manager, also specify `--listen-control-api`:

```sh
$ swarmd -d /tmp/node-4 --hostname node-4 --join-addr 127.0.0.1:4242 --join-token <Manager Token> --listen-control-api /tmp/node-4/swarm.sock --listen-remote-api 127.0.0.1:4245
```

In a fourth terminal, use `swarmctl` to explore and control the cluster. Before
running `swarmctl`, set the `SWARM_SOCKET` environment variable to the path of the
manager socket that was specified in `--listen-control-api` when starting the
manager.

To list nodes:

```
$ export SWARM_SOCKET=/tmp/node-1/swarm.sock
$ swarmctl node ls
ID                         Name    Membership  Status  Availability  Manager Status
--                         ----    ----------  ------  ------------  --------------
3x12fpoi36eujbdkgdnbvbi6r  node-2  ACCEPTED    READY   ACTIVE
4spl3tyipofoa2iwqgabsdcve  node-1  ACCEPTED    READY   ACTIVE        REACHABLE *
dknwk1uqxhnyyujq66ho0h54t  node-3  ACCEPTED    READY   ACTIVE
zw3rwfawdasdewfq66ho34eaw  node-4  ACCEPTED    READY   ACTIVE        REACHABLE
```

### Creating Services

Start a *redis* service:

```
$ swarmctl service create --name redis --image redis:3.0.5
08ecg7vc7cbf9k57qs722n2le
```

List the running services:

```
$ swarmctl service ls
ID                         Name   Image        Replicas
--                         ----   -----        --------
08ecg7vc7cbf9k57qs722n2le  redis  redis:3.0.5  1/1
```

Inspect the service:

```
$ swarmctl service inspect redis
ID                : 08ecg7vc7cbf9k57qs722n2le
Name              : redis
Replicas          : 1/1
Template
 Container
  Image           : redis:3.0.5

Task ID                      Service    Slot    Image          Desired State    Last State                Node
-------                      -------    ----    -----          -------------    ----------                ----
0xk1ir8wr85lbs8sqg0ug03vr    redis      1       redis:3.0.5    RUNNING          RUNNING 1 minutes ago    node-1
```

### Updating Services

You can update any attribute of a service.

For example, you can scale the service by changing the instance count:

```
$ swarmctl service update redis --replicas 6
08ecg7vc7cbf9k57qs722n2le

$ swarmctl service inspect redis
ID                : 08ecg7vc7cbf9k57qs722n2le
Name              : redis
Replicas          : 6/6
Template
 Container
  Image           : redis:3.0.5

Task ID                      Service    Slot    Image          Desired State    Last State                Node
-------                      -------    ----    -----          -------------    ----------                ----
0xk1ir8wr85lbs8sqg0ug03vr    redis      1       redis:3.0.5    RUNNING          RUNNING 3 minutes ago    node-1
25m48y9fevrnh77til1d09vqq    redis      2       redis:3.0.5    RUNNING          RUNNING 28 seconds ago    node-3
42vwc8z93c884anjgpkiatnx6    redis      3       redis:3.0.5    RUNNING          RUNNING 28 seconds ago    node-2
d41f3wnf9dex3mk6jfqp4tdjw    redis      4       redis:3.0.5    RUNNING          RUNNING 28 seconds ago    node-2
66lefnooz63met6yfrsk6myvg    redis      5       redis:3.0.5    RUNNING          RUNNING 28 seconds ago    node-1
3a2sawtoyk19wqhmtuiq7z9pt    redis      6       redis:3.0.5    RUNNING          RUNNING 28 seconds ago    node-3
```

Changing *replicas* from *1* to *6* forced *SwarmKit* to create *5* additional Tasks in order to
comply with the desired state.

Most other fields can be changed as well, such as the image, args, and environment variables.

Let's change the image from *redis:3.0.5* to *redis:3.0.6* (i.e., an upgrade):

```
$ swarmctl service update redis --image redis:3.0.6
08ecg7vc7cbf9k57qs722n2le

$ swarmctl service inspect redis
ID                   : 08ecg7vc7cbf9k57qs722n2le
Name                 : redis
Replicas             : 6/6
Update Status
 State               : COMPLETED
 Started             : 3 minutes ago
 Completed           : 1 minute ago
 Message             : update completed
Template
 Container
  Image              : redis:3.0.6

Task ID                      Service    Slot    Image          Desired State    Last State              Node
-------                      -------    ----    -----          -------------    ----------              ----
0udsjss61lmwz52pke5hd107g    redis      1       redis:3.0.6    RUNNING          RUNNING 1 minute ago    node-3
b8o394v840thk10tamfqlwztb    redis      2       redis:3.0.6    RUNNING          RUNNING 1 minute ago    node-1
efw7j66xqpoj3cn3zjkdrwff7    redis      3       redis:3.0.6    RUNNING          RUNNING 1 minute ago    node-3
8ajeipzvxucs3776e4z8gemey    redis      4       redis:3.0.6    RUNNING          RUNNING 1 minute ago    node-2
f05f2lbqzk9fh4kstwpulygvu    redis      5       redis:3.0.6    RUNNING          RUNNING 1 minute ago    node-2
7sbpoy82deq7hu3q9cnucfin6    redis      6       redis:3.0.6    RUNNING          RUNNING 1 minute ago    node-1
```

By default, all tasks are updated at the same time.

This behavior can be changed by defining update options.

For instance, in order to update tasks 2 at a time and wait at least 10 seconds between updates:

```
$ swarmctl service update redis --image redis:3.0.7 --update-parallelism 2 --update-delay 10s
$ watch -n1 "swarmctl service inspect redis"  # watch the update
```

This will update 2 tasks, wait for them to become *RUNNING*, then wait an additional 10 seconds before moving on to other tasks.

Update options can be set at service creation and updated later on. If an update command doesn't specify update options, the last set of options will be used.
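
For example, the same knobs could be supplied up front when a service is created. This sketch assumes `swarmctl service create` accepts the same `--update-parallelism` and `--update-delay` flags as `service update`; the `cache` service name is hypothetical:

```sh
$ swarmctl service create --name cache --image redis:3.0.7 --replicas 6 \
    --update-parallelism 2 --update-delay 10s
```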

### Node Management

*SwarmKit* monitors node health. In the case of node failures, it re-schedules tasks to other nodes.

An operator can manually define the *Availability* of a node and can *Pause* and *Drain* nodes.

Let's put `node-1` into maintenance mode:

```
$ swarmctl node drain node-1

$ swarmctl node ls
ID                         Name    Membership  Status  Availability  Manager Status
--                         ----    ----------  ------  ------------  --------------
3x12fpoi36eujbdkgdnbvbi6r  node-2  ACCEPTED    READY   ACTIVE
4spl3tyipofoa2iwqgabsdcve  node-1  ACCEPTED    READY   DRAIN         REACHABLE *
dknwk1uqxhnyyujq66ho0h54t  node-3  ACCEPTED    READY   ACTIVE

$ swarmctl service inspect redis
ID                   : 08ecg7vc7cbf9k57qs722n2le
Name                 : redis
Replicas             : 6/6
Update Status
 State               : COMPLETED
 Started             : 2 minutes ago
 Completed           : 1 minute ago
 Message             : update completed
Template
 Container
  Image              : redis:3.0.7

Task ID                      Service    Slot    Image          Desired State    Last State                Node
-------                      -------    ----    -----          -------------    ----------                ----
8uy2fy8dqbwmlvw5iya802tj0    redis      1       redis:3.0.7    RUNNING          RUNNING 23 seconds ago    node-2
7h9lgvidypcr7q1k3lfgohb42    redis      2       redis:3.0.7    RUNNING          RUNNING 2 minutes ago     node-3
ae4dl0chk3gtwm1100t5yeged    redis      3       redis:3.0.7    RUNNING          RUNNING 23 seconds ago    node-3
9fz7fxbg0igypstwliyameobs    redis      4       redis:3.0.7    RUNNING          RUNNING 2 minutes ago     node-3
drzndxnjz3c8iujdewzaplgr6    redis      5       redis:3.0.7    RUNNING          RUNNING 23 seconds ago    node-2
7rcgciqhs4239quraw7evttyf    redis      6       redis:3.0.7    RUNNING          RUNNING 2 minutes ago     node-2
```

As you can see, every Task running on `node-1` was rebalanced to either `node-2` or `node-3` by the reconciliation loop.
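
Once maintenance is complete, the node can be returned to the scheduling pool. This sketch assumes a `swarmctl node activate` subcommand:

```sh
$ swarmctl node activate node-1
$ swarmctl node ls   # node-1 should report ACTIVE availability again
```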