## DOCKER CP DESIGN DOC

### Initial Rundown of docker cp

Docker cp has many possible scenarios that we will have to account for; they are described in this section.

We will be supporting docker cp in its entirety. This means we will need to copy files to and from a container regardless of whether it is on or off. This leads to 4 scenarios:

1. The container is *on*, and we want to copy data *from* it.

2. The container is *on*, and we want to copy data *to* it.

3. The container is *off*, and we want to copy data *from* it. (will constitute multiple calls from the personality)

4. The container is *off*, and we want to copy data *to* it. (will constitute multiple calls from the personality)

According to the docker remote api, the target for a copy must be in `identity (no compression), gzip, bzip2, xz`. As the docker cli handles packaging a *to* request, we can expect one of these formats. We must package our *from* response in one of these formats as well.

_NOTE_: docker has discussed copying between containers in the past. That would be yet another set of 4 possibilities and endpoints, something that we can add to this plan later if it is needed.

Here is a more detailed look at the call operation/state flow.

| Operation | State | Volumes | Scenario |
|---|---|---|---|
| Export | ContainerRunning | No | Single call from personality to portlayer. Guest Tools will be used on the target rather than `Read` |
| Export | ContainerRunning | Yes | Single call from personality to portlayer. Guest Tools will be used on the target rather than `Read` |
| Export | ContainerStopped | No | Single call from personality to portlayer. `Read` is called based on the supplied filespec. |
| Export | ContainerStopped | Yes | Multiple calls from personality to portlayer. One for the r/w layer, and n more calls where n is the number of volumes. `Read` is invoked n+1 times based on each call. Power status likely won't matter here because we mount non-persistent disks. |
| Import | ContainerRunning | No | Single call from personality to portlayer. Guest tools will be used on the target rather than a `Write` |
| Import | ContainerRunning | Yes | Single call from personality to portlayer. Guest tools will be used on the target rather than a `Write` |
| Import | ContainerStopped | No | Single call from personality to portlayer. `Write` will be used to mount the r/w layer and then write the contents based on the supplied filespec |
| Import | ContainerStopped | Yes | Multiple calls from personality to portlayer. One for the r/w layer, and n more calls where n is the number of volumes. `Write` is invoked n+1 times based on each call from the personality. If the container is started during this time we cannot mount the volumes or the r/w layer, and we will report a failure requesting that the user try again. Alternatively, we could block start events until operation completion. |


### Personality to Portlayer Communication

Since all 4 of these scenarios must be supported, we must distinguish both the power state of the container and the direction (*to* vs. *from*) of the operation.

The proposed solution is to add two endpoints to the portlayer api swagger server. Both endpoints will exist along the same request path as two different verbs. The request path should be as such :

The portlayer functional targets for `ExportArchive` and `ImportArchive` will need to be called multiple times from the personality, once for each device that constitutes the full filesystem of the container. A `PathSpec` struct will be used to mark inclusion and exclusion paths.
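As a rough, runnable sketch of the idea (the field names `RebasePath`, `StripPath`, `Inclusions`, `Exclusions` and the `Rewrite` helper are illustrative assumptions, not the final API), such a spec and its application to tar header names might look like:

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// FilterSpec is an illustrative sketch of the PathSpec described above;
// the field names and exact semantics are assumptions, not the final API.
type FilterSpec struct {
	RebasePath string              // prefix prepended to tar header names
	StripPath  string              // prefix removed from tar header names
	Inclusions map[string]struct{} // paths to include in the archive
	Exclusions map[string]struct{} // paths to skip entirely
}

// Rewrite maps a single tar header name through the spec, returning false
// when the path is excluded.
func (s *FilterSpec) Rewrite(name string) (string, bool) {
	for ex := range s.Exclusions {
		if ex != "" && (name == ex || strings.HasPrefix(name, ex+"/")) {
			return "", false
		}
	}
	name = strings.TrimPrefix(name, s.StripPath)
	return path.Join(s.RebasePath, name), true
}

func main() {
	// a volume mounted at /volumes/data, viewed from inside the volume as "/":
	// strip the mount prefix so headers are relative to the volume root
	spec := FilterSpec{StripPath: "volumes/data/"}
	fmt.Println(spec.Rewrite("volumes/data/file.txt")) // file.txt true

	// writing to the container r/w layer, skipping the mounted volume
	rw := FilterSpec{Exclusions: map[string]struct{}{"volumes/data": {}}}
	_, ok := rw.Rewrite("volumes/data/file.txt")
	fmt.Println(ok) // false
}
```

The sketch only illustrates the strip/rebase/exclude mechanics discussed here; the real spec also has to carry inclusion paths and be serializable across the swagger API.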
We will also need to know when to strip the paths provided to the portlayer, since the view of a volume would be `/`, while in a running container the target path could be `/path/to/volume/`, where the path before the final `/` exists on the container r/w layer (or another volume, but we are not worrying about that now). Because there will be multiple calls for a potential import of the archive (write calls), we will need a way to pass or reference the stream of information to the portlayer multiple times. This may constitute another input for the function header in order to identify the stream that we care about.

The portlayer functional targets will look as such:

__ExportArchive__

```
// ID : container/vm target ID, mainly for power state checking in the portlayer to determine Export's logical behavior.
// deviceID : target deviceID for copying
// fileSpec : a map[string]string of keyed paths to exclusion and strip operations determined by the desired behavior of the read
//
func Read(ID, deviceID string, fileSpec map[string]string) (io.Reader, error)

// no need for the data bool since we know that this operation involves data; we can set that as always true on the portlayer side of things.
```


__ImportArchive__
```
// StoreID : ID of the store where the target device resides, be it image store, volume store, or something else in the future.
// ID : target container/vm for the portlayer to check power state to determine the logical approach of the ImportArchive function
// deviceID : device id found in the targeted store that is to receive the archive
// fileSpec : a map[string]string of keyed paths to exclusion and strip operations determined by the desired behavior of the write
// tarStream : the actual tar stream to be imported to the target
func Write(storeID, ID, deviceID string, fileSpec map[string]string, tarStream io.Reader) error
```

Note: this call will involve multiple docker endpoints; Stat is needed if we plan to support the `-L` functionality. We will also need the interaction piece for streaming tar archives to the portlayer, with the same tar stream from docker being streamed to multiple calls (hopefully we can find a way to avoid streaming the same data several times over the network).

## Personality design


### Copy Operation

The personality has three endpoints that are called depending on the to/from situation above (5 if you count the two deprecated calls that were made previously). The portlayer will divide the to/from behavior across two separate endpoints, which behave in an Export/Import fashion. Aside from the Import/Export behavior, the docker cli also calls `ContainerStatPath` when determining whether to follow symlinks via the `-L` option. We will need this endpoint implemented all the way through to the portlayer if we want to support following symlinks.

Error codes expected by docker :

```
200

400

403 - in the event of a readonly filesystem

404

409

500

```

Docker does not do much in the way of string checking as far as the path is concerned. This is likely not a big concern for the personality, as the actual copy command will fail if a bad path string is provided.

For the multiple calls to the portlayer we will need to assemble pathspecs which detail inclusion, exclusion, rebase, and strip configurations for the tar stream on both reads and writes. Below are some scenarios and what the pathspec should look like for each.

We must exclude the tether path, as well as any paths that docker does not allow for a copy. The filespec map will carry a serialized struct or keys for `exclusion` and `strip`.

Examples of a pathspec (please note that these are simple scenarios for the time being):
```
For a CopyTo to path "/volumes" :
container has one volume mounted at /volumes/data

this will invoke two write calls.

first pathspec:

// the strip is empty since you would be writing against the r/w layer of the containerFS and the starting path would be '/' of the containerFS. No stripping of the write headers necessary.
spec.Rebase = "volumes"
spec.Strip = ""
spec.Includes = map[string]struct{}{"": struct{}{}}
spec.Excludes = map[string]struct{}{"volumes/data": struct{}{}}

second pathspec:
spec.Strip = "volumes/data/" // stripped since the volume will be mounted and the starting path will be "/"
spec.Rebase = ""
spec.Includes = map[string]struct{}{"": struct{}{}}
```

```
For a CopyFrom of path "/volumes/" :
container has one volume mounted at /volumes/data

this will invoke two read calls

// this pathspec targets the r/w layer of the containerFS, where the starting path is '/' of the containerFS.
spec.Rebase = "volumes"
spec.Strip = "volumes"
// in this case strip and rebase are the same since this is a 1st-level directory, e.g. for a target of /volumes/data rebase would be "data" and strip would be "volumes/data"
spec.Includes = map[string]struct{}{"volumes": struct{}{}}
spec.Excludes = map[string]struct{}{"": struct{}{}}

// this pathspec will be for the volume filesystem
spec.Rebase = "volumes"
spec.Strip = ""
spec.Includes = map[string]struct{}{"": struct{}{}}

```


NOTE: use TarAppender in docker/pkg/archive to possibly merge the different tar streams during an ExportArchive.

### Stat Operation



## Portlayer Design



There will be two situations to be concerned with in the portlayer: when the container is *on* and when the container is *off*.

In the event of the *off* scenario we will convey the requested operations to the `Read`/`Write` calls in the storage portion of the portlayer. The results of those calls should be returned to the user.

This must work regardless of whether the container is on or off. Some investigation has shown that docker's copy works in the scenarios of both containers being on, both being off, and one on and one off. We will need to architect a way for this to work for us. If the target is a volume that is attached to a container, we will need to understand what is required in the case of a vmdk volume target. This behavior should not be an issue for nfs volume store targets.

The portlayer should behave at a basic level as such:

```
1. look up appropriate storage object
2. check object usage
3. mount object or initiate copy based on (2)
3b. check for error in initialization of copy due to power state change, repeat (2) if found
4. mounted disk prevents incompatible operations, or, tether copy in progress squashes in guest shutdown
```

Some additional notes surrounding the portlayer and swagger endpoint design:

- use a query string to pass the FilterSpec so that it does not need to be packaged in a custom format in the body along with the tar
- this assumes that callers have some knowledge of mount topology and correctly populate the FilterSpec to avoid recursing into mounts that will be directly addressed by a later call
- because it's possible for the various storage elements of a container to be in different states (e.g. container is not running so r/w and volumeA are not in use, but volumeB from the container is now used by container X), we require that separate calls be made to Import/Export for each of the distinct storage elements. This allows those calls to be routed appropriately based on the current owners and usage of the element in question.


### Stat Operation

We will need a `stat` endpoint which will be used to target devices with a filesystem that can return fileinfo back to the caller. This will need to be tolerant of the power status of the target: if the target container/vm is on then we can use guest tools; otherwise, mounting the appropriate device for the stat is necessary. Like the Read/Write calls, we should expect a store ID and a device ID in addition to the target of the stat operation. If the compute is not active then we should mount the target device specified for the stat.

```
underlying filesystem stat :
// storename : this is the store that the target device resides on
// deviceID : this is the target device for the stat operation
// target : the filesystem target of the stat
//
// stat will mount the target and stat the filesystem target in order to obtain the requested filesystem info.
func Stat(storename string, deviceID string, spec archive.FilterSpec) (FileInfo, error)

// it is the responsibility of the caller to determine the status of this device. If it is already mounted or in use, the caller must determine the action to be taken.
```


### Portlayer Import/Export Behavior for writing and reading to storage devices.

When a container is online and a copy is attempted, guest tools will be utilized to move the files onto the container. Below are the defined interfaces that will be used for reading and writing to the devices. Note that there can be multiple data sources and data sinks for the same device backing; this is due to online and offline behavior.

A docker stop should be blocked until the copy completes. (currently this is not the case)

The following are the core interfaces that allow hiding of the current usage state of a given storage element such as a volume:

```
// DataSource defines the methods for exporting data from a specific storage element as a tar stream
type DataSource interface {
    // Close releases all resources associated with this source. Shared resources should be reference counted.
    io.Closer

    // Export performs an export of the specified files, returning the data as a tar stream. This is single use; once
    // the export has completed it should not be assumed that the source remains functional.
    //
    // spec: specifies which files will be included/excluded in the export and allows for path rebasing/stripping
    // data: if true the actual file data is included, if false only the file headers are present
    Export(op trace.Operation, spec *archive.FilterSpec, data bool) (io.ReadCloser, error)

    // Source returns the mechanism by which the data source is accessed
    // Examples:
    //     vmdk mounted locally: *os.File
    //     nfs volume: XDR-client
    //     via guesttools: toolbox client
    Source() interface{}
}

// DataSink defines the methods for importing data to a specific storage element from a tar stream
type DataSink interface {
    // Close releases all resources associated with this sink. Shared resources should be reference counted.
    io.Closer

    // Import performs an import of the tar stream to the sink held by this DataSink. This is single use; once
    // the import has completed it should not be assumed that the sink remains functional.
    //
    // spec: specifies which files will be included/excluded in the import and allows for path rebasing/stripping
    // tarStream: the tar stream from which to import data
    Import(op trace.Operation, spec *archive.FilterSpec, tarStream io.ReadCloser) error

    // Sink returns the mechanism by which the data sink is accessed
    // Examples:
    //     vmdk mounted locally: *os.File
    //     nfs volume: XDR-client
    //     via guesttools: toolbox client
    Sink() interface{}
}

// Importer defines the methods needed to write data into a storage element. This should be implemented by the various
// store types.
type Importer interface {
    // Import allows direct construction and invocation of a data sink for the specified ID.
    Import(op trace.Operation, id string, spec *archive.FilterSpec, tarStream io.ReadCloser) error

    // NewDataSink constructs a data sink for the specified ID within the context of the Importer. This is a single
    // use sink which may hold resources until Closed.
    NewDataSink(op trace.Operation, id string) (DataSink, error)
}

// Exporter defines the methods needed to read data from a storage element, optionally diffing with an ancestor. This
// should be implemented by the various store types.
type Exporter interface {
    // Export allows direct construction and invocation of a data source for the specified ID.
    Export(op trace.Operation, id, ancestor string, spec *archive.FilterSpec, data bool) (io.ReadCloser, error)

    // NewDataSource constructs a data source for the specified ID within the context of the Exporter. This is a single
    // use source which may hold resources until Closed.
    NewDataSource(op trace.Operation, id string) (DataSource, error)
}
```

This provides a pair of helper functions per store, supporting a generalized implementation of the above Import/Export logic:

```
// Resolver defines methods for mapping IDs to URLs, and URLs to owners of that device
type Resolver interface {
    // URL returns a url to the data source representing `id`
    // For historic reasons this is not the same URL that other parts of the storage component use, but an actual
    // URL suited for locating the storage element without having additional precursor knowledge.
    // Of the form `ds://[datastore]/path/on/datastore/element.vmdk`
    URL(op trace.Operation, id string) (*url.URL, error)

    // Owners returns a list of VMs that are using the resource specified by `url`
    Owners(op trace.Operation, url *url.URL, filter func(vm *mo.VirtualMachine) bool) ([]*vm.VirtualMachine, error)
}
```

Example usage:

```
func (h *handler) ImportArchive(store, id string, spec *archive.FilterSpec, tar io.ReadCloser) middleware.Responder {
    op := trace.NewOperation(context.Background(), "ImportArchive: %s:%s", store, id)

    s, ok := storage.GetImporter(store)
    if !ok {
        op.Errorf("Failed to locate import capable store %s", store)
        op.Debugf("Available importers are: %+q", storage.GetImporters())

        return storage.NewImportArchiveNotFound()
    }

    err := s.Import(op, id, spec, tar)
    if err != nil {
        // This should be usefully typed errors
        return storage.NewImportArchiveInternalServerError()
    }

    return storage.NewImportArchiveOK()
}
```