# Runtime v2

Runtime v2 introduces a first-class shim API for runtime authors to integrate with containerd.
The shim API is minimal and scoped to the execution lifecycle of a container.

## Binary Naming

Users specify the runtime they wish to use when creating a container.
The runtime can also be changed via a container update.

```bash
> ctr run --runtime io.containerd.runc.v1
```

A runtime name such as `io.containerd.runc.v1` specifies both the name and the version of the runtime.
containerd translates this name into a binary name for the shim.

`io.containerd.runc.v1` -> `containerd-shim-runc-v1`

containerd keeps the `containerd-shim-*` prefix so that users can run `ps aux | grep containerd-shim` to see running shims on their system.
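The mapping above can be sketched in a few lines of Go. This is a minimal illustration, not containerd's own code: the helper name `shimBinaryName` is hypothetical, and it assumes that the last two dot-separated fields of the runtime name are the runtime's name and version.

```go
package main

import (
	"fmt"
	"strings"
)

// shimBinaryName is a hypothetical helper that mirrors the mapping shown
// above. Assumption: the last two dot-separated fields of the runtime name
// are the runtime's name and version.
func shimBinaryName(runtime string) (string, error) {
	parts := strings.Split(runtime, ".")
	if len(parts) < 2 {
		return "", fmt.Errorf("runtime %q is not of the form <prefix>.<name>.<version>", runtime)
	}
	name, version := parts[len(parts)-2], parts[len(parts)-1]
	if name == "" || version == "" {
		return "", fmt.Errorf("runtime %q has an empty name or version", runtime)
	}
	return fmt.Sprintf("containerd-shim-%s-%s", name, version), nil
}

func main() {
	bin, err := shimBinaryName("io.containerd.runc.v1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(bin) // containerd-shim-runc-v1
}
```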

## Shim Authoring

This section is dedicated to runtime authors wishing to build a shim.
It details how the API works and the considerations to keep in mind when building a shim.

### Commands

Container information is provided to a shim in two ways:
through the OCI Runtime Bundle and on the `Create` rpc request.

#### `start`

Each shim MUST implement a `start` subcommand.
This command will launch new shims.
The start command MUST accept the following flags:

* `-namespace` the namespace for the container
* `-address` the address of containerd's main socket
* `-publish-binary` the binary path to publish events back to containerd
* `-id` the id of the container

The start command, as well as all binary calls to the shim, has the bundle for the container set as the `cwd`.

The start command MUST return an address to a shim for containerd to issue API requests for container operations.

The start command can either start a new shim or return an address to an existing shim based on the shim's logic.
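As a rough sketch, the required flags could be parsed with Go's standard `flag` package as shown below. The type `shimFlags`, the helper `parseShimArgs`, the argument layout (flags before the subcommand), and the sample values are all assumptions made for illustration; launching the shim and reporting its address are left out.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// shimFlags holds the flags that the start command MUST accept.
// The type itself is hypothetical.
type shimFlags struct {
	Namespace     string
	Address       string
	PublishBinary string
	ID            string
}

// parseShimArgs parses the flags and returns them together with the first
// positional argument, which this sketch assumes to be the subcommand.
func parseShimArgs(args []string) (shimFlags, string, error) {
	var f shimFlags
	fs := flag.NewFlagSet("shim", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of stderr in this sketch
	fs.StringVar(&f.Namespace, "namespace", "", "namespace for the container")
	fs.StringVar(&f.Address, "address", "", "address of containerd's main socket")
	fs.StringVar(&f.PublishBinary, "publish-binary", "", "binary path to publish events back to containerd")
	fs.StringVar(&f.ID, "id", "", "id of the container")
	if err := fs.Parse(args); err != nil {
		return f, "", err
	}
	return f, fs.Arg(0), nil
}

func main() {
	// Sample values are placeholders, not defaults of any real installation.
	f, cmd, err := parseShimArgs([]string{
		"-namespace", "default",
		"-address", "/run/containerd/containerd.sock",
		"-publish-binary", "/usr/bin/containerd",
		"-id", "redis",
		"start",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if cmd == "start" {
		// A real shim would start or locate a shim here and return its address.
		fmt.Printf("start requested for %s/%s\n", f.Namespace, f.ID)
	}
}
```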

#### `delete`

Each shim MUST implement a `delete` subcommand.
This command allows containerd to delete any container resources created, mounted, and/or run by a shim when containerd can no longer communicate over rpc.
This happens if a shim is SIGKILL'd with a running container.
These resources will need to be cleaned up when containerd loses the connection to a shim.
This is also used when containerd boots and reconnects to shims.
If a bundle is still on disk but containerd cannot connect to a shim, the delete command is invoked.

The delete command MUST accept the following flags:

* `-namespace` the namespace for the container
* `-address` the address of containerd's main socket
* `-publish-binary` the binary path to publish events back to containerd
* `-id` the id of the container
* `-bundle` the path to the bundle to delete. On non-Windows platforms this will match `cwd`

The delete command will be executed with the container's bundle as its `cwd`, except on Windows.
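One way a `delete` implementation might decide what to clean up is sketched below. The helper `deleteTargets` and the sample bundle path are hypothetical, and falling back to the working directory when `-bundle` is empty is an assumption of this sketch rather than documented behavior.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// deleteTargets is a hypothetical helper that returns the bundle directory
// to clean up and the rootfs directory inside it that may need unmounting.
func deleteTargets(bundleFlag string) (bundle, rootfs string, err error) {
	bundle = bundleFlag
	if bundle == "" {
		// Assumption: without -bundle, use the working directory, which
		// matches the bundle on non-Windows platforms.
		if bundle, err = os.Getwd(); err != nil {
			return "", "", err
		}
	}
	return bundle, filepath.Join(bundle, "rootfs"), nil
}

func main() {
	// The path below is a placeholder for illustration only.
	bundle, rootfs, err := deleteTargets("/run/containerd/bundles/redis")
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println("bundle to remove:", bundle)
	fmt.Println("rootfs to unmount:", rootfs)
}
```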

### Host Level Shim Configuration

containerd does not provide any host level configuration for shims via the API.
If a shim needs host level configuration from the user that applies across all instances, a shim specific configuration file can be set up.

### Container Level Shim Configuration

On the create request, there is a generic `*protobuf.Any` that allows a user to specify container level configuration for the shim.

```proto
message CreateTaskRequest {
	string id = 1;
	...
	google.protobuf.Any options = 10;
}
```

A shim author can create their own protobuf message for configuration, and clients can import it and provide this information as needed.

### I/O

I/O for a container is provided by the client to the shim via fifos on Linux, named pipes on Windows, or log files on disk.
The paths to these files are provided on the `Create` rpc for the initial creation and on the `Exec` rpc for additional processes.

```proto
message CreateTaskRequest {
	string id = 1;
	bool terminal = 4;
	string stdin = 5;
	string stdout = 6;
	string stderr = 7;
}
```

```proto
message ExecProcessRequest {
	string id = 1;
	string exec_id = 2;
	bool terminal = 3;
	string stdin = 4;
	string stdout = 5;
	string stderr = 6;
}
```

Containers that are to be launched with an interactive terminal will have the `terminal` field set to `true`. Data is still copied over the files (fifos, pipes) in the same way as for non-interactive containers.

### Root Filesystems

The root filesystem for a container is provided on the `Create` rpc.
Shims are responsible for managing the lifecycle of the filesystem mount during the lifecycle of a container.

```proto
message CreateTaskRequest {
	string id = 1;
	string bundle = 2;
	repeated containerd.types.Mount rootfs = 3;
	...
}
```

The mount protobuf message is:

```proto
message Mount {
	// Type defines the nature of the mount.
	string type = 1;
	// Source specifies the name of the mount. Depending on mount type, this
	// may be a volume name or a host path, or even ignored.
	string source = 2;
	// Target path in container
	string target = 3;
	// Options specifies zero or more fstab style mount options.
	repeated string options = 4;
}
```

Shims are responsible for mounting the filesystem into the `rootfs/` directory of the bundle.
Shims are also responsible for unmounting the filesystem.
During a `delete` binary call, the shim MUST ensure that the filesystem is also unmounted.
Filesystems are provided by the containerd snapshotters.
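To make the relationship between a `Mount` message and the bundle's `rootfs/` directory concrete, the sketch below turns a mount description into arguments for the `mount(8)` command. It is only an illustration: the Go `Mount` struct is a local stand-in for the protobuf message, `mountArgs` is a hypothetical helper, the overlay paths are placeholders, and a real shim would more likely use mount syscalls than shell out.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Mount is a local stand-in for the protobuf message above, limited to the
// fields this sketch uses.
type Mount struct {
	Type    string
	Source  string
	Options []string
}

// mountArgs builds "mount -t <type> -o <options> <source> <target>" style
// arguments, with the target fixed to the bundle's rootfs directory.
func mountArgs(m Mount, bundle string) []string {
	var args []string
	if m.Type != "" {
		args = append(args, "-t", m.Type)
	}
	if len(m.Options) > 0 {
		args = append(args, "-o", strings.Join(m.Options, ","))
	}
	return append(args, m.Source, filepath.Join(bundle, "rootfs"))
}

func main() {
	// Placeholder values; a snapshotter would supply the real ones.
	m := Mount{
		Type:    "overlay",
		Source:  "overlay",
		Options: []string{"lowerdir=/lower", "upperdir=/upper", "workdir=/work"},
	}
	fmt.Println("mount", strings.Join(mountArgs(m, "/run/bundle/redis"), " "))
}
```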

### Events

Runtime v2 supports an async event model. In order for an upstream caller (such as Docker) to get these events in the correct order, a Runtime v2 shim MUST implement the following events where `Compliance=MUST`. This avoids race conditions between the shim and shim client where, for example, a call to `Start` can signal a `TaskExitEventTopic` before even returning the results from the `Start` call. With these guarantees, a call to `Start` on a Runtime v2 shim is required to have published the async event `TaskStartEventTopic` before the shim can publish the `TaskExitEventTopic`.

#### Tasks

| Topic | Compliance | Description |
| ----- | ---------- | ----------- |
| `runtime.TaskCreateEventTopic`       | MUST                                                                          | When a task is successfully created |
| `runtime.TaskStartEventTopic`        | MUST (follow `TaskCreateEventTopic`)                                          | When a task is successfully started |
| `runtime.TaskExitEventTopic`         | MUST (follow `TaskStartEventTopic`)                                           | When a task exits, expectedly or unexpectedly |
| `runtime.TaskDeleteEventTopic`       | MUST (follow `TaskExitEventTopic` or `TaskCreateEventTopic` if never started) | When a task is removed from a shim |
| `runtime.TaskPausedEventTopic`       | SHOULD                                                                        | When a task is successfully paused |
| `runtime.TaskResumedEventTopic`      | SHOULD (follow `TaskPausedEventTopic`)                                        | When a task is successfully resumed |
| `runtime.TaskCheckpointedEventTopic` | SHOULD                                                                        | When a task is checkpointed |
| `runtime.TaskOOMEventTopic`          | SHOULD                                                                        | If the shim collects Out of Memory events |
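The ordering rules for the `MUST` task events can be expressed as a small checker, which may be handy in a shim's tests. This is a sketch under stated assumptions: the constants reuse the topic identifiers as placeholder strings rather than containerd's real topic values, `checkOrder` is a hypothetical helper, and `SHOULD` events are accepted without any ordering check.

```go
package main

import "fmt"

// Placeholder topic values named after the identifiers in the table above.
// They are not the actual topic strings used by containerd.
const (
	taskCreate = "TaskCreateEventTopic"
	taskStart  = "TaskStartEventTopic"
	taskExit   = "TaskExitEventTopic"
	taskDelete = "TaskDeleteEventTopic"
)

// checkOrder reports the first event of a single task that was published
// before the event it is required to follow.
func checkOrder(events []string) error {
	seen := map[string]bool{}
	for i, e := range events {
		ok := true
		switch e {
		case taskStart:
			ok = seen[taskCreate]
		case taskExit:
			ok = seen[taskStart]
		case taskDelete:
			// Delete follows Exit, or Create when the task never started.
			ok = seen[taskExit] || (seen[taskCreate] && !seen[taskStart])
		}
		if !ok {
			return fmt.Errorf("event %d (%s) published out of order", i, e)
		}
		seen[e] = true
	}
	return nil
}

func main() {
	fmt.Println(checkOrder([]string{taskCreate, taskStart, taskExit, taskDelete}))
	fmt.Println(checkOrder([]string{taskCreate, taskExit}))
}
```

The first call in `main` passes a valid sequence and the second passes an exit that was never preceded by a start, so only the second returns an error.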

#### Execs

| Topic | Compliance | Description |
| ----- | ---------- | ----------- |
| `runtime.TaskExecAddedEventTopic`   | MUST (follow `TaskCreateEventTopic`)                                               | When an exec is successfully added |
| `runtime.TaskExecStartedEventTopic` | MUST (follow `TaskExecAddedEventTopic`)                                            | When an exec is successfully started |
| `runtime.TaskExitEventTopic`        | MUST (follow `TaskExecStartedEventTopic`)                                          | When an exec (other than the init exec) exits, expectedly or unexpectedly |
| `runtime.TaskDeleteEventTopic`      | SHOULD (follow `TaskExitEventTopic` or `TaskExecAddedEventTopic` if never started) | When an exec is removed from a shim |

#### Logging

Shims may support pluggable logging via STDIO URIs.
Currently supported schemes for logging are:

* fifo - Linux
* binary - Linux & Windows
* file - Linux & Windows
* npipe - Windows
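A shim could pick a logging backend by inspecting the scheme of each STDIO URI. The sketch below uses Go's `net/url` package; the helper `stdioScheme` and the sample URIs are hypothetical, and treating a URI without a scheme as `fifo` is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"net/url"
)

// stdioScheme returns the logging scheme of a STDIO URI.
// Assumption: a plain path without a scheme is treated as a fifo.
func stdioScheme(uri string) (string, error) {
	u, err := url.Parse(uri)
	if err != nil {
		return "", err
	}
	switch u.Scheme {
	case "":
		return "fifo", nil
	case "fifo", "binary", "file", "npipe":
		return u.Scheme, nil
	default:
		return "", fmt.Errorf("unsupported STDIO scheme %q", u.Scheme)
	}
}

func main() {
	// Sample URIs are placeholders for illustration.
	for _, uri := range []string{
		"binary:///usr/bin/journal-logger?id=redis",
		"file:///var/log/redis.log",
		"/run/containerd/fifo/stdout",
	} {
		scheme, err := stdioScheme(uri)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", uri, scheme)
	}
}
```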

Binary logging has the ability to forward a container's STDIO to an external binary for consumption.
A sample logging driver that forwards the container's STDOUT and STDERR to `journald` is:

```go
package main

import (
	"bufio"
	"context"
	"fmt"
	"io"
	"sync"

	"github.com/containerd/containerd/runtime/v2/logging"
	"github.com/coreos/go-systemd/journal"
)

func main() {
	logging.Run(log)
}

func log(ctx context.Context, config *logging.Config, ready func() error) error {
	// construct any log metadata for the container
	vars := map[string]string{
		"SYSLOG_IDENTIFIER": fmt.Sprintf("%s:%s", config.Namespace, config.ID),
	}
	var wg sync.WaitGroup
	wg.Add(2)
	// forward both stdout and stderr to the journal
	go copy(&wg, config.Stdout, journal.PriInfo, vars)
	go copy(&wg, config.Stderr, journal.PriErr, vars)

	// signal that we are ready and setup for the container to be started
	if err := ready(); err != nil {
		return err
	}
	wg.Wait()
	return nil
}

// copy sends each line read from r to the journal at the given priority
func copy(wg *sync.WaitGroup, r io.Reader, pri journal.Priority, vars map[string]string) {
	defer wg.Done()
	s := bufio.NewScanner(r)
	for s.Scan() {
		journal.Send(s.Text(), pri, vars)
	}
}
```

### Other

#### Unsupported rpcs

If a shim does not or cannot implement an rpc call, it MUST return a `github.com/containerd/containerd/errdefs.ErrNotImplemented` error.

#### Debugging and Shim Logs

A fifo on unix or named pipe on Windows will be provided to the shim.
It is located inside the `cwd` of the shim and is named "log".
The shims can use the existing `github.com/containerd/containerd/log` package to log debug messages.
Messages will automatically be output in the containerd daemon's logs with the correct fields and runtime set.

#### ttrpc

[ttrpc](https://github.com/containerd/ttrpc) is the only currently supported protocol for shims.
It works with standard protobufs and gRPC services, and it can generate clients.
The only difference between gRPC and ttrpc is the wire protocol.
ttrpc removes the HTTP stack to reduce memory use and binary size, keeping shims small.
It is recommended to use ttrpc in your shim, but gRPC support is also in development.