## DOCKER CP DESIGN DOC

### Initial Rundown of docker cp

Docker cp has many possible scenarios that we will have to account for; they are described in this section.

We will be supporting docker cp in its entirety. This means we will need to copy files to and from a container regardless of whether it is on or off. This leads to 4 scenarios:

1. The container is *on*, and we want to copy data *from* it.

2. The container is *on*, and we want to copy data *to* it.

3. The container is *off*, and we want to copy data *from* it. (This will constitute multiple calls from the personality.)

4. The container is *off*, and we want to copy data *to* it. (This will constitute multiple calls from the personality.)
According to the docker remote API, the target for a copy must be encoded as one of `identity (no compression), gzip, bzip2, xz`. As the docker CLI handles packaging a *to* request, we can expect one of these formats. We must package our *from* response in one of these formats as well.
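
Since the CLI may send any of these encodings, incoming *to* streams can be normalized before further processing. A minimal sketch, assuming `github.com/docker/docker/pkg/archive` (whose `DecompressStream` sniffs the compression format from the stream header):

```
package personality

import (
	"io"
	"net/http"

	"github.com/docker/docker/pkg/archive"
)

// normalizeArchive wraps the request body so downstream code always sees an
// uncompressed (identity) tar stream, regardless of which of the allowed
// encodings the docker CLI chose.
func normalizeArchive(r *http.Request) (io.ReadCloser, error) {
	return archive.DecompressStream(r.Body)
}
```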

_NOTE_: docker has discussed copying between containers in the past. That would constitute yet another set of 4 scenarios and endpoints, something we can add to this plan later if it is needed.

Here is a more detailed look at the call operation/state flow.

| Operation | State | Volumes | Scenario |
|---|---|---|---|
| Export | ContainerRunning | No | Single call from personality to portlayer. Guest tools will be used on the target rather than `Read`. |
| Export | ContainerRunning | Yes | Single call from personality to portlayer. Guest tools will be used on the target rather than `Read`. |
| Export | ContainerStopped | No | Single call from personality to portlayer. `Read` is called based on the supplied filespec. |
| Export | ContainerStopped | Yes | Multiple calls from personality to portlayer: one for the r/w layer, and n more calls where n is the number of volumes. `Read` is invoked n+1 times based on each call. Power state likely won't matter here because we mount non-persistent disks. |
| Import | ContainerRunning | No | Single call from personality to portlayer. Guest tools will be used on the target rather than a `Write`. |
| Import | ContainerRunning | Yes | Single call from personality to portlayer. Guest tools will be used on the target rather than a `Write`. |
| Import | ContainerStopped | No | Single call from personality to portlayer. `Write` will be used to mount the r/w layer and then write the contents based on the supplied filespec. |
| Import | ContainerStopped | Yes | Multiple calls from personality to portlayer: one for the r/w layer, and n more calls where n is the number of volumes. `Write` is invoked n+1 times based on each call from the personality. If the container is started during this time we cannot mount the volumes or the r/w layer, and we will report a failure asking the user to try again. Alternatively, we block start events until operation completion. |
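
A rough sketch of how the personality could fan out export calls per the table above; `ContainerInfo` and the `portlayerExport` parameter are hypothetical stand-ins for the real plumbing:

```
package personality

import (
	"fmt"
	"io"
)

// ContainerInfo is a placeholder for the metadata the personality holds.
type ContainerInfo struct {
	ID      string
	Running bool
	Volumes []string // device IDs of mounted volumes
}

// export makes a single call when the container is running (the portlayer
// uses guest tools) or has no volumes, and n+1 calls otherwise: one for the
// r/w layer plus one per volume.
func export(c ContainerInfo, portlayerExport func(deviceID string) (io.ReadCloser, error)) ([]io.ReadCloser, error) {
	targets := []string{c.ID}
	if !c.Running {
		targets = append(targets, c.Volumes...)
	}

	streams := make([]io.ReadCloser, 0, len(targets))
	for _, dev := range targets {
		r, err := portlayerExport(dev)
		if err != nil {
			return nil, fmt.Errorf("export of %s failed: %s", dev, err)
		}
		streams = append(streams, r)
	}
	return streams, nil
}
```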


### Personality to Portlayer Communication

Since all 4 scenarios must be supported, we need to distinguish both the power state of the container and the direction of the operation, the latter determining which endpoint to call for a *to* versus a *from*.

The proposed solution is to have two endpoints designed for the portlayer API swagger server. Both endpoints will exist along the same request path as two different verbs.
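
As an illustration only (the actual path is not fixed by this document), the pair might look like:

```
GET  /storage/{store}/{deviceID}/archive   -> ExportArchive (copy *from*)
PUT  /storage/{store}/{deviceID}/archive   -> ImportArchive (copy *to*)
```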

The portlayer functional targets for `ExportArchive` and `ImportArchive` will need to be called multiple times from the personality, once for each device that constitutes the full filesystem of the container. A `PathSpec` struct will be used to mark inclusion and exclusion paths.

We will also need to know when to strip the paths provided to the portlayer, since the view of a volume would be `/`, while in a running container the target path could be `/path/to/volume/`, where the path before the final `/` exists on the container r/w layer (or another volume, but we are not worrying about that now).

Because there will be multiple calls for a potential import of the archive (write calls), we will need a way to pass or reference the stream of information to the portlayer multiple times. This may constitute another input in the function header in order to identify the stream that we care about.
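
A minimal sketch of what such a `PathSpec` could look like; the field names are assumptions mirroring the examples later in this document:

```
package portlayer

// PathSpec marks inclusion/exclusion paths for a single Read/Write call and
// carries the rebase/strip prefixes needed to translate between the
// container's view of a path and the device's view of it.
type PathSpec struct {
	// Inclusions and Exclusions are keyed by path; an empty key means "everything".
	Inclusions map[string]struct{}
	Exclusions map[string]struct{}

	// Rebase is prepended to entry names on the way out (reads);
	// Strip is removed from entry names on the way in (writes).
	Rebase string
	Strip  string
}
```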

The portlayer functional targets will look as follows:

__ExportArchive__

```
// id : container/vm target ID, mainly for power state checking in the portlayer to determine Export's logical behavior
// deviceID : target device ID for copying
// fileSpec : a map[string]string of keyed paths to exclusion and strip operations, determined by the desired behavior of the read
//
func Read(id, deviceID string, fileSpec map[string]string) (io.Reader, error)

// No need for a data bool since we know that this operation involves data; we can set that as always true on the portlayer side of things.
```


__ImportArchive__

```
// storeID : ID of the store where the target device resides, be it image store, volume store, or something else in the future
// id : target container/vm for the portlayer to check power state on, to determine the logical approach of the ImportArchive function
// deviceID : device ID found in the targeted store that is to receive the archive
// fileSpec : a map[string]string of keyed paths to exclusion and strip operations, determined by the desired behavior of the write
// tarStream : the actual tar stream to be imported to the target
func Write(storeID, id, deviceID string, fileSpec map[string]string, tarStream io.Reader) error
```

Note: this call will involve multiple docker endpoints; Stat is needed if we plan to support the `-L` functionality. We will also need the interaction piece for streaming tar archives to the portlayer, with the same tar stream from docker being streamed to multiple calls (hopefully we can find a way to avoid streaming the same data several times over the network). One option is sketched below.
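
One way to avoid shipping the full stream n+1 times is to split it once in the personality, routing each tar entry to the call that owns its mount point. A sketch using only the standard library; `writerFor` is a hypothetical lookup from entry path to the matching per-device stream:

```
package personality

import (
	"archive/tar"
	"io"
)

// splitArchive reads the single tar stream from the docker CLI once and
// copies each entry to the writer responsible for its mount point, so the
// same data is not streamed to the portlayer several times over. Callers
// remain responsible for Close()ing each tar.Writer.
func splitArchive(in io.Reader, writerFor func(path string) *tar.Writer) error {
	tr := tar.NewReader(in)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}

		tw := writerFor(hdr.Name) // route by path prefix, e.g. "volumes/data/"
		if err := tw.WriteHeader(hdr); err != nil {
			return err
		}
		if _, err := io.Copy(tw, tr); err != nil {
			return err
		}
	}
}
```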

## Personality Design


### Copy Operation

The personality has three endpoints that are called depending on the to/from situation above (five if you count the two deprecated calls that were made previously). The portlayer will divide the to/from behavior across two separate endpoints, which behave as `Export`/`Import` operations. Aside from the Import/Export behavior, the docker CLI also calls `ContainerStatPath` when determining whether to follow symlinks via the `-L` option. We will need this endpoint implemented all the way through to the portlayer if we want to support following symlinks.
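
For reference, the three docker REST endpoints involved all share one path and are distinguished by verb (the stat result travels base64-encoded in the `X-Docker-Container-Path-Stat` response header):

```
HEAD /containers/{id}/archive   -> ContainerStatPath
GET  /containers/{id}/archive   -> ContainerArchivePath (copy *from*)
PUT  /containers/{id}/archive   -> ContainerExtractToDir (copy *to*)
```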

Error codes expected by docker:

```
200

400

403 - in the event of a read-only filesystem

404

409

500
```

Docker does not do much in the way of string checking as far as the path is concerned. This is likely not a big concern for the personality, as the actual copy command will fail if a bad path string is provided.

For the multiple calls to the portlayer we will need to assemble PathSpecs which detail inclusion, exclusion, rebase, and strip configurations for the tar stream on both reads and writes. Below are some scenarios and what the path spec should look like for those scenarios.

We must exclude the tether path, as well as any paths that docker does not allow for a copy. The filespec map will carry a serialized struct or keys for `exclusion` and `strip`.

Examples of a pathspec (please note that these are simple scenarios for the time being):

```
For a CopyTo to path "/volumes":
the container has one volume mounted at /volumes/data

This will invoke two write calls.

First pathspec:

// written against the r/w layer of the containerFS; the starting path is '/'
// of the containerFS, so no stripping of the write headers is necessary.
spec.Rebase = "volumes"
spec.Strip = ""
spec.Inclusions = map[string]struct{}{"": struct{}{}}
spec.Exclusions = map[string]struct{}{"volumes/data": struct{}{}}

Second pathspec:

spec.Rebase = ""
spec.Strip = "volumes/data/" // stripped since the volume will be mounted and the starting path will be "/"
spec.Inclusions = map[string]struct{}{"": struct{}{}}
```

```
For a CopyFrom of path "/volumes/":
the container has one volume mounted at /volumes/data

This will invoke two read calls.

// this pathspec targets the r/w layer of the containerFS; the starting path is '/' of the containerFS
spec.Rebase = "volumes"
spec.Strip = "volumes"
// in this case strip and rebase are the same since this is a 1st level directory,
// e.g. for a target of /volumes/data rebase would be "data" and strip would be "volumes/data"
spec.Inclusions = map[string]struct{}{"volumes": struct{}{}}
spec.Exclusions = map[string]struct{}{"": struct{}{}}

// this pathspec is for the volume filesystem
spec.Rebase = "volumes"
spec.Strip = ""
spec.Inclusions = map[string]struct{}{"": struct{}{}}
```
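
The two examples generalize: given the copy target inside the container and the set of volume mount points beneath it, we emit one pathspec for the r/w layer plus one per volume. A sketch under that assumption (single-level nesting only, matching the simple scenarios above):

```
package personality

import (
	"path"
	"strings"
)

// specsFor emits one PathSpec per storage element for a copy rooted at
// target: the r/w layer spec excludes anything owned by a volume, and each
// volume spec strips nothing since a mounted volume's view starts at "/".
func specsFor(target string, mounts []string) []PathSpec {
	target = strings.Trim(target, "/")

	rw := PathSpec{
		Rebase:     path.Base(target),
		Strip:      target,
		Inclusions: map[string]struct{}{target: {}},
		Exclusions: map[string]struct{}{},
	}
	specs := []PathSpec{rw}

	for _, m := range mounts {
		m = strings.Trim(m, "/")
		if !strings.HasPrefix(m, target+"/") {
			continue // volume not under the copy target
		}
		rw.Exclusions[m] = struct{}{} // the volume's own call handles this subtree

		specs = append(specs, PathSpec{
			Rebase:     path.Base(target),
			Strip:      "",
			Inclusions: map[string]struct{}{"": {}},
		})
	}
	return specs
}
```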


NOTE: use TarAppender in docker/pkg/archive to possibly merge the different tar streams during an ExportArchive; a fallback sketch follows.
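
If TarAppender turns out not to be usable directly, a standard-library merge of the per-device read streams into the single response stream could look like this sketch:

```
package personality

import (
	"archive/tar"
	"io"
)

// mergeArchives concatenates the entries of several tar streams (one per
// storage element) into a single tar stream for the docker client. Any
// rebasing of entry names is assumed to have already happened.
func mergeArchives(out io.Writer, ins ...io.Reader) error {
	tw := tar.NewWriter(out)
	for _, in := range ins {
		tr := tar.NewReader(in)
		for {
			hdr, err := tr.Next()
			if err == io.EOF {
				break
			}
			if err != nil {
				return err
			}
			if err := tw.WriteHeader(hdr); err != nil {
				return err
			}
			if _, err := io.Copy(tw, tr); err != nil {
				return err
			}
		}
	}
	// Close writes the trailing blocks that mark end-of-archive.
	return tw.Close()
}
```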

### Stat Operation

The personality's `ContainerStatPath` will be implemented on top of the portlayer `Stat` endpoint described later in this document.
## Portlayer Design

There will be two situations to be concerned with in the portlayer: when the container is *on* and when the container is *off*.

In the event of the *off* scenario we will convey the requested operations to the `Read`/`Write` calls in the storage portion of the portlayer. The results of those calls should be returned to the user.

This must work regardless of whether the container is on or off; some investigation has shown that copy works in the scenarios of both containers being on, both off, and one on and one off. We will need to architect a way for this to work for us. If the target is a volume that is attached to a container, we will need to understand what is needed in the case of a vmdk volume target. This behavior should not be an issue for the nfs volume store targets.

The portlayer should behave at a basic level as such:

```
1.  look up appropriate storage object
2.  check object usage
3.  mount object or initiate copy based on (2)
3b. check for error in initialization of copy due to power state change, repeat (2) if found
4.  mounted disk prevents incompatible operations, or, tether copy in progress squashes in-guest shutdown
```
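
A sketch of that flow with the retry from step (3b); all of the function parameters here are hypothetical stand-ins for the real storage plumbing:

```
package portlayer

import "errors"

// errPowerStateChanged stands in for the error the mount/copy setup would
// return when the owning VM changes power state mid-operation.
var errPowerStateChanged = errors.New("power state changed")

// dispatch routes an operation to the online (guest tools) or offline
// (direct mount) path and retries the usage check on power state changes.
func dispatch(lookup func() (interface{}, error), inUse func(interface{}) bool, online, offline func(interface{}) error) error {
	obj, err := lookup() // 1. look up appropriate storage object
	if err != nil {
		return err
	}

	for {
		var err error
		if inUse(obj) { // 2. check object usage
			err = online(obj) // 3. initiate copy via guest tools
		} else {
			err = offline(obj) // 3. mount object directly
		}
		if err == errPowerStateChanged {
			continue // 3b. power state changed underneath us; repeat (2)
		}
		return err
	}
}
```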

Some additional notes surrounding the portlayer and swagger endpoint design:

* Use the query string to pass the FilterSpec, so that it does not need to be packaged in a custom format in the body along with the tar stream (a sketch follows this list).
* This assumes that callers have some knowledge of mount topology and correctly populate the FilterSpec to avoid recursing into mounts that will be directly addressed by a later call.
* Because it's possible for the various storage elements of a container to be in different states (e.g. the container is not running, so its r/w layer and volumeA are not in use, but volumeB from the container is now used by container X), we require that separate calls be made to Import/Export for each of the distinct storage elements. This allows those calls to be routed appropriately based on the current owners and usage of the element in question.
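
A sketch of passing the FilterSpec in the query string so the body stays a pure tar stream; the JSON+base64 wire encoding here is an assumption, not a settled format:

```
package portlayer

import (
	"encoding/base64"
	"encoding/json"
	"net/url"
)

// encodeFilterSpec serializes a spec for transport in a query parameter so
// the request body can remain the raw tar stream.
func encodeFilterSpec(spec interface{}) (url.Values, error) {
	raw, err := json.Marshal(spec)
	if err != nil {
		return nil, err
	}

	v := url.Values{}
	v.Set("filterSpec", base64.URLEncoding.EncodeToString(raw))
	return v, nil
}
```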


### Stat Operation

We will need a `stat` endpoint which will be used to target devices with a filesystem that can return file info back to the caller. This will need to be tolerant of the power status of the target: if the target container/vm is on then we can use guest tools; otherwise, mounting the appropriate device for the stat is necessary. Like the Read/Write calls, we should expect a store ID and a device ID in addition to the target of the stat operation. If the compute is not active then we should mount the target device specified for the stat.

```
underlying filesystem stat:
// storeName : the store that the target device resides on
// deviceID : the target device for the stat operation
// spec : filter spec identifying the filesystem target of the stat
//
// Stat will mount the target device and stat the filesystem target in order to obtain the requested filesystem info.
func Stat(storeName string, deviceID string, spec archive.FilterSpec) (FileInfo, error)

// It is the responsibility of the caller to determine the status of this device. If it is already mounted or in use, the caller must determine the action to be taken.
```


### Portlayer Import/Export Behavior for writing and reading to storage devices

When a container is online and a copy is attempted, guest tools will be utilized to move the files onto the container. Below are the defined interfaces that will be used for reading and writing to the devices. Note that there can be multiple data sources and data sinks for the same device backing; this is due to online and offline behavior.

A docker stop should be blocked until the copy completes (currently this is not the case).

The following are the core interfaces that allow hiding of the current usage state of a given storage element, such as a volume:

```
// DataSource defines the methods for exporting data from a specific storage element as a tar stream
type DataSource interface {
	// Close releases all resources associated with this source. Shared resources should be reference counted.
	io.Closer

	// Export performs an export of the specified files, returning the data as a tar stream. This is single use; once
	// the export has completed it should not be assumed that the source remains functional.
	//
	// spec: specifies which files will be included/excluded in the export and allows for path rebasing/stripping
	// data: if true the actual file data is included, if false only the file headers are present
	Export(op trace.Operation, spec *archive.FilterSpec, data bool) (io.ReadCloser, error)

	// Source returns the mechanism by which the data source is accessed
	// Examples:
	//     vmdk mounted locally: *os.File
	//     nfs volume:           XDR-client
	//     via guesttools:       toolbox client
	Source() interface{}
}

// DataSink defines the methods for importing data to a specific storage element from a tar stream
type DataSink interface {
	// Close releases all resources associated with this sink. Shared resources should be reference counted.
	io.Closer

	// Import performs an import of the tar stream to the source held by this DataSink. This is single use; once
	// the import has completed it should not be assumed that the sink remains functional.
	//
	// spec: specifies which files will be included/excluded in the import and allows for path rebasing/stripping
	// tarStream: the tar stream from which to import data
	Import(op trace.Operation, spec *archive.FilterSpec, tarStream io.ReadCloser) error

	// Sink returns the mechanism by which the data sink is accessed
	// Examples:
	//     vmdk mounted locally: *os.File
	//     nfs volume:           XDR-client
	//     via guesttools:       toolbox client
	Sink() interface{}
}

// Importer defines the methods needed to write data into a storage element. This should be implemented by the various
// store types.
type Importer interface {
	// Import allows direct construction and invocation of a data sink for the specified ID.
	Import(op trace.Operation, id string, spec *archive.FilterSpec, tarStream io.ReadCloser) error

	// NewDataSink constructs a data sink for the specified ID within the context of the Importer. This is a single
	// use sink which may hold resources until Closed.
	NewDataSink(op trace.Operation, id string) (DataSink, error)
}

// Exporter defines the methods needed to read data from a storage element, optionally diffed with an ancestor. This
// should be implemented by the various store types.
type Exporter interface {
	// Export allows direct construction and invocation of a data source for the specified ID.
	Export(op trace.Operation, id, ancestor string, spec *archive.FilterSpec, data bool) (io.ReadCloser, error)

	// NewDataSource constructs a data source for the specified ID within the context of the Exporter. This is a single
	// use source which may hold resources until Closed.
	NewDataSource(op trace.Operation, id string) (DataSource, error)
}
```

This provides a pair of helper functions per store, supporting a generalized implementation of the above Import/Export logic:

```
// Resolver defines methods for mapping IDs to URLs, and URLs to owners of that device
type Resolver interface {
	// URL returns a URL to the data source representing `id`.
	// For historic reasons this is not the same URL that other parts of the storage component use, but an actual
	// URL suited for locating the storage element without additional precursor knowledge.
	// Of the form `ds://[datastore]/path/on/datastore/element.vmdk`
	URL(op trace.Operation, id string) (*url.URL, error)

	// Owners returns a list of VMs that are using the resource specified by `url`
	Owners(op trace.Operation, url *url.URL, filter func(vm *mo.VirtualMachine) bool) ([]*vm.VirtualMachine, error)
}
```
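
A sketch of how a store could use `Resolver.Owners` to route sink construction between the online and offline paths; `exampleStore`, `guestToolsSink`, `mountSink`, and the import paths are assumptions, not existing code:

```
package portlayer

import (
	"net/url"

	"github.com/vmware/govmomi/vim25/mo"
	"github.com/vmware/vic/pkg/trace"
	"github.com/vmware/vic/pkg/vsphere/vm"
)

type exampleStore struct {
	resolver Resolver
}

// poweredOn filters owners down to those that currently hold the disk lock.
func poweredOn(machine *mo.VirtualMachine) bool {
	return machine.Runtime.PowerState == "poweredOn"
}

func (s *exampleStore) newDataSink(op trace.Operation, id string) (DataSink, error) {
	u, err := s.resolver.URL(op, id)
	if err != nil {
		return nil, err
	}

	owners, err := s.resolver.Owners(op, u, poweredOn)
	if err != nil {
		return nil, err
	}

	// A powered-on owner holds the vmdk lock, so route through guest tools
	// in that VM; with no live owner we are free to mount the disk directly.
	if len(owners) > 0 {
		return s.guestToolsSink(op, owners[0])
	}
	return s.mountSink(op, u)
}

// Placeholder constructors for the online/offline sinks.
func (s *exampleStore) guestToolsSink(op trace.Operation, owner *vm.VirtualMachine) (DataSink, error) {
	return nil, nil // sketch only
}

func (s *exampleStore) mountSink(op trace.Operation, u *url.URL) (DataSink, error) {
	return nil, nil // sketch only
}
```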

Example usage:

```
func (h *handler) ImportArchive(store, id string, spec *archive.FilterSpec, tar io.ReadCloser) middleware.Responder {
	op := trace.NewOperation(context.Background(), "ImportArchive: %s:%s", store, id)

	s, ok := storage.GetImporter(store)
	if !ok {
		op.Errorf("Failed to locate import capable store %s", store)
		op.Debugf("Available importers are: %+q", storage.GetImporters())

		return storage.NewImportArchiveNotFound()
	}

	err := s.Import(op, id, spec, tar)
	if err != nil {
		// These should be usefully typed errors
		return storage.NewImportArchiveInternalServerError()
	}

	return storage.NewImportArchiveOK()
}
```