Development notes regarding VND. Original document by David van Moolenbroek.


DESIGN DECISIONS

As simple as the VND driver implementation looks, several important decisions
had to be made in the design process. These decisions are listed here.

Multiple instances instead of a single instance: The decision to spawn a
separate driver instance for each VND unit was not ideologically inspired, but
rather based on a practical issue. Namely, users may reasonably expect to be
able to set up a VND using a backing file that resides on a file system hosted
on another VND. If one single driver instance were to host both VND units, its
implementation would have to perform all its backcalls to VFS asynchronously,
so as to be able to process another incoming request that was initiated as part
of such an ongoing backcall. At the time of writing, MINIX3 does not support
any form of asynchronous I/O, and even such support would not be sufficient:
the asynchrony would have to extend even to the close(2) call that takes place
during device unconfiguration, as this call could spark I/O to another VND
device. Ultimately, using one driver instance per VND unit avoids these
complications altogether, thus making nesting possible up to a maximum depth
equal to the number of VFS threads. Of course, this comes at the cost of
having more VND driver processes; in order to avoid this cost in the common
case, driver instances are dynamically started and stopped by vndconfig(8).

copyfd(2) instead of openas(2): Compared to the NetBSD interface, the MINIX3
VND API requires that the user program configuring a device pass in a file
descriptor in the vnd_ioctl structure instead of a pointer to a path name.
While binary compatibility with NetBSD would be impossible anyway (MINIX3
cannot support pointers in IOCTL data structures), providing a path name buffer
would be closer to what NetBSD does. There are two reasons behind the choice to
pass in a file descriptor instead. First, performing an open(2)-like call as
a driver backcall is tricky in terms of avoiding deadlocks in VFS, since it
would by nature violate the VFS locking order. On top of that, special
provisions would have to be added to support opening a file in the context of
another process so that chrooted processes would be supported, for example.
In contrast, copying a file descriptor to a remote process is relatively easy
because there is only one potential deadlock case to cover - that of the given
file descriptor identifying the VFS filp object used to control the very same
device - and VFS need only implement a procedure that very much resembles
sending a file descriptor across a UNIX domain socket. Second, since passing a
file descriptor is effectively passing an object capability, it is easier to
improve the isolation of the VND drivers in the future, as described below.
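
The comparison with sending a file descriptor across a UNIX domain socket can
be illustrated with standard POSIX calls. The sketch below is not the copyfd(2)
implementation and does not involve VFS or the VND driver; it only shows the
SCM_RIGHTS mechanism that the text uses as an analogy. The helper names send_fd
and recv_fd are made up for this example.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for exactly one descriptor; the union forces the
 * alignment that struct cmsghdr requires. */
union fd_ctl {
	struct cmsghdr hdr;
	char buf[CMSG_SPACE(sizeof(int))];
};

/* Send descriptor 'fd' over the connected UNIX domain socket 'sock'.
 * Returns 0 on success, -1 on failure. */
static int send_fd(int sock, int fd)
{
	struct msghdr msg;
	struct iovec iov;
	union fd_ctl ctl;
	struct cmsghdr *cmsg;
	char dummy = 'F';	/* at least one byte of real data must be sent */

	assert(fd >= 0);
	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	iov.iov_base = &dummy;
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return (sendmsg(sock, &msg, 0) == 1) ? 0 : -1;
}

/* Receive one descriptor from 'sock'. Returns the new descriptor, which
 * refers to the same open file as the sender's, or -1 on failure. */
static int recv_fd(int sock)
{
	struct msghdr msg;
	struct iovec iov;
	union fd_ctl ctl;
	struct cmsghdr *cmsg;
	char dummy;
	int fd;

	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	iov.iov_base = &dummy;
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The receiver ends up with its own descriptor number for the same open file,
without ever seeing a path name. This is the property that makes a descriptor
behave like an object capability.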

No separate control device: The driver uses the same minor (block) device for
configuration and for actual (whole-disk) I/O, instead of exposing a separate
device that exists only for the purpose of configuring the device. The reason
for this is that such a control device simply does not fit the NetBSD
opendisk(3) API. While MINIX3 may at some point implement support for NetBSD's
notion of raw devices, such raw devices are still expected to support I/O, and
that means they cannot be control-only. In this regard, it should be mentioned
that the entire VND infrastructure relies on block caches being invalidated
properly upon (un)configuration of VND units, and that such invalidation
(through the REQ_FLUSH file system request) is currently initiated only by
closing block devices. Support for configuration or I/O through character
devices would thus require more work on that side first. In any case, the
primary downside of not having a separate control device is that handling
access permissions on device open is a bit of a hack in order to keep the
MINIX3 userland happy.


FUTURE IMPROVEMENTS

Currently, the VND driver instances are run as root solely because the
copyfd(2) call requires root. Obviously, non-root user processes should never
be able to copy file descriptors from arbitrary processes, and thus, some
security check is required there. However, an access control list for VFS calls
would be a much better solution: in that case, VND driver processes can be
given exclusive rights to the use of the copyfd(2) call, while they can be
given a normal driver UID at the same time.

In MINIX3's dependability model, drivers are generally not considered to be
malicious. However, the VND case is interesting because it is possible to
isolate individual driver instances to the point of actual "least authority".
The copyfd(2) call currently allows any file descriptor to be copied, but it
would be possible to extend the scheme to let user processes (and vndconfig(8)
in particular) mark the file descriptors that may be the target of a copyfd(2)
call. One of several schemes may be implemented in VFS for this purpose. For
example, each process could be allowed to mark one of its file descriptors as
"copyable" using a new VFS call, and VFS would then allow copyfd(2) only on a
"copyable" file descriptor from a process blocked on a call to the driver that
invoked copyfd(2). This approach precludes hiding a VND driver behind a RAID
or FBD (etc.) driver, but more sophisticated approaches can solve that as well.
Regardless of the scheme, the end result would be a situation where the VND
drivers are strictly limited to operating on the resources given to them.
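
The "copyable" scheme given as an example above can be modelled in a few lines
of C. This is a toy model of the proposed check only, not existing VFS code:
struct proc_state, mark_copyable(), copyfd_allowed() and the integer endpoint
numbers are all hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_FD	(-1)	/* process has not marked any descriptor */
#define NO_EP	(-1)	/* process is not blocked on any driver */

/* Hypothetical per-process state that VFS would have to track. */
struct proc_state {
	int copyable_fd;	/* the one descriptor marked "copyable" */
	int blocked_on;		/* endpoint of the driver being called */
};

/* Model of the new VFS call: mark one descriptor as "copyable".
 * Marking a new descriptor replaces any earlier mark. */
static void mark_copyable(struct proc_state *p, int fd)
{
	assert(fd >= 0);
	p->copyable_fd = fd;
}

/* Model of the check for copyfd(2): the driver with endpoint 'caller_ep'
 * asks to copy descriptor 'fd' from the process described by 'owner'. */
static bool copyfd_allowed(const struct proc_state *owner, int fd,
	int caller_ep)
{
	if (caller_ep == NO_EP)
		return false;
	/* Only the descriptor that was explicitly marked may be copied. */
	if (owner->copyable_fd == NO_FD || owner->copyable_fd != fd)
		return false;
	/* The owner must be blocked on a call to this very driver. */
	return owner->blocked_on == caller_ep;
}
```

In this model, a process that is blocked on a RAID or FBD driver has that
driver's endpoint in blocked_on, so a copyfd(2) request from the VND driver
underneath is refused. This is the layering limitation mentioned above.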

Note that copyfd(2) was originally called dupfrom(2), and then extended to copy
file descriptors *to* remote processes as well. The latter is not as
security-sensitive, but may have to be restricted in a similar way. If this is
not possible, copyfd(2) can always be split into multiple calls.