## Description of VFS                            Thomas Veerman 21-3-2013
## This file is organized such that it can be read both in a Wiki and on
## the MINIX terminal using e.g. vi or less. Please, keep the file in the
## source tree as the canonical version and copy changes into the Wiki.
#pragma section-numbers 2

= VFS internals =

<<TableOfContents(2)>>

## Table of contents
## 1 ..... General description of responsibilities
## 2 ..... General architecture
## 3 ..... Worker threads
## 4 ..... Locking
## 4.1 .... Locking requirements
## 4.2 .... Three-level Lock
## 4.3 .... Data structures subject to locking
## 4.4 .... Locking order
## 4.5 .... Vmnt (file system) locking
## 4.6 .... Vnode (open file) locking
## 4.7 .... Filp (file position) locking
## 4.8 .... Lock characteristics per request type
## 5 ..... Recovery from driver crashes
## 5.1 .... Recovery from block drivers crashes
## 5.2 .... Recovery from character driver crashes
## 5.3 .... Recovery from File Server crashes

== General description of responsibilities ==
## 1 General description of responsibilities
VFS implements the file system in cooperation with one or more File Servers
(FS). The File Servers take care of the actual file system on a partition. That
is, they interpret the data structure on disk, write and read data to/from
disk, etc. VFS sits on top of those File Servers and communicates with
them. Looking inside VFS, we can identify several roles. First, a role of VFS
is to handle most POSIX system calls that are supported by Minix. Additionally,
it supports a few calls necessary for libc. The following system calls are
handled by VFS:

access, chdir, chmod, chown, chroot, close, creat, fchdir, fcntl, fstat,
fstatvfs, fsync, ftruncate, getdents, getvfsstat, ioctl, link, lseek,
lstat, mkdir, mknod, mount, open, pipe2, read, readlink, rename, rmdir, select,
stat, statvfs, symlink, sync, truncate, umask, umount, unlink, utimes, write.

Second, it maintains part of the state belonging to a process (process state is
spread out over the kernel, VM, PM, and VFS). For example, it maintains state
for select(2) calls, file descriptors and file positions. Also, it cooperates
with the Process Manager to handle the fork, exec, and exit system calls.
Third, VFS keeps track of endpoints that are supposed to be drivers for
character or block special files, as well as for socket protocol families.
File Servers can be regarded as drivers for block special files, although they
are handled entirely differently from other drivers.

The following diagram depicts how a read() on a file in /home is handled:
{{{
      ----------------
      | user process |
      ----------------
             ^      ^
             |      |
           read(2)   \
             |        \
             V         \
      ----------------  |
      |      VFS     |  |
      ----------------  |
                    ^   |
                    |   |
                    V   |
  ------- -------- ---------
  | MFS | |  MFS | |  MFS  |
  |  /  | | /usr | | /home |
  ------- -------- ---------
}}}
Diagram 1: handling of read(2) system call

The user process executes the read system call, which is delivered to VFS. VFS
verifies that the read is done on a valid (open) file and forwards the request
to the FS responsible for the file system on which the file resides. The FS
reads the data, copies it directly to the user process, and replies to VFS
that it has executed the request. Subsequently, VFS replies to the user process
that the operation is done and the user process continues to run.

== General architecture ==
## 2 General architecture
VFS works roughly identically to every other server and driver in Minix; it
fetches a message (internally referred to as a job in some cases), executes
the request embedded in the message, returns a reply, and fetches the next
job. There are several sources for new jobs: from user processes, from PM, from
the kernel, and from suspended jobs inside VFS itself (suspended operations
on pipes, locks, character special files, or sockets). File Servers are
regarded as normal user processes in this case, but their abilities are
limited. This is to prevent deadlocks. Once a job is received, a worker thread
starts executing it. During the lifetime of a job, the worker thread might need
to talk to several File Servers. The protocol VFS speaks with File Servers
is fully documented on the Wiki at [0]. The protocol fields are defined in
<minix/vfsif.h>. If the job is an operation on a character or block special
file and the need to talk to a driver arises, VFS uses the Character and
Block Device Protocol. See [1]. This is sadly not official documentation,
but it is an accurate description of how it works. Luckily, driver writers
can use the libchardriver and libblockdriver libraries and don't have to
know the details of the protocol.

== Worker threads ==
## 3 Worker threads
Upon start up, VFS spawns a configurable number of worker threads. The
main thread fetches requests and replies, and hands them off to idle or
reply-pending workers, respectively. If no worker threads are available,
the request is queued. All standard system calls are handled by such worker
threads. One of the threads is reserved to handle new requests from system
processes (i.e., File Servers and drivers) when there are no normal worker
threads available; all normal threads might be blocked on a single worker
thread that caused a system process to send a request on its own. To unblock
all normal threads, we need to reserve one spare thread to handle that
situation. VFS drives all File Servers and drivers asynchronously. While
waiting for a reply, a worker thread is blocked and other workers can keep
processing requests. Upon reply the worker thread is unblocked.

As mentioned above, the main thread is responsible for retrieving new jobs and
replies to current jobs, and for starting or unblocking the proper worker
thread. Driver replies are processed directly from the main thread. As a
consequence, these processing routines may not block their calling thread. In
some cases, these routines may resume a thread that is blocked waiting for the
reply. This is always the case for block driver replies, and may or may not be
the case for character and socket driver replies. The character and socket
driver reply processing routines may also unblock suspended processes which in
turn generate new jobs to be handled by the main loop (e.g., suspended reads
and writes on pipes). So depending on the reply a new thread may have to be
started.

Worker threads are strictly tied to a process, and each process can have at
most one worker thread running for it. Generally speaking, there are two types
of work supported by worker threads: normal work, and work from PM. The main
subtype of normal work is the handling of a system call made by the process
itself. The process is blocked while VFS is handling the system call, so no new
system call can arrive from a process while VFS has not completed a previous
system call from that process. For that reason, if there are no worker threads
available to handle the work, the work is queued in the corresponding process
entry of the fproc table.

The other main type of work consists of requests from PM. The protocol PM
speaks with VFS is asynchronous. PM is allowed to send up to one request per
process to VFS, in addition to a request to initiate a reboot. Most jobs from
PM are taken care of immediately by the main thread, but some jobs require a
worker thread context (to be able to sleep) and/or serialization with normal
work. Therefore, each process may have a PM request queued for execution, also
in the fproc table. Managing proper queuing, addition, and execution of both
normal and PM work is the responsibility of the worker thread infrastructure.

There are several special tasks that require a worker thread, and these are
implemented as normal work associated with a certain special process that does
not make regular VFS calls anyway. For example, the initial ramdisk mount
procedure uses a thread associated with the VFS process. Some of these special
tasks require protection against being started multiple times at once, as this
is not only undesirable but also disallowed. The full list of worker thread
task types and subtypes is shown in Table 1.

{{{
-------------------------------------------------------------------------
| Worker thread task        | Type   | Association     | May use spare? |
+---------------------------+--------+-----------------+----------------+
| system call from process  | normal | calling process | if system proc |
+---------------------------+--------+-----------------+----------------+
| resumed pipe operation    | normal | calling process | no             |
+---------------------------+--------+-----------------+----------------+
| postponed PM request      | PM     | target process  | no             |
+---------------------------+--------+-----------------+----------------+
| DS event notification     | normal | DS              | yes            |
+---------------------------+--------+-----------------+----------------+
| initial ramdisk mounting  | normal | VFS             | no             |
+---------------------------+--------+-----------------+----------------+
| reboot sequence           | normal | PM              | no             |
-------------------------------------------------------------------------
}}}
Table 1: worker thread work types and subtypes

Communication with block drivers is asynchronous, but at this time, access to
these drivers is serialized on a per-driver basis. File Servers are treated
differently. VFS was designed to be able to send requests concurrently to File
Servers, although at the time of writing there are no File Servers that can
actually make use of that functionality. To identify which reply from an FS
belongs to which worker thread, all requests have an embedded transaction
identification number (a magic number + thread id encoded in the mtype field of
a message) which the FS has to echo upon reply. Because the range of valid
transaction IDs is isolated from valid system call numbers, VFS can use that ID
to differentiate between replies from File Servers and actual new system calls
from FSes. Using this mechanism VFS is able to support FUSE and ProcFS.

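The transaction ID scheme described above can be sketched in a few lines of C.
This is an illustration only: the names TRANS_BASE, TRANS_MAX_TID,
trans_encode, trans_is_reply and trans_thread are hypothetical, and the
numeric values are assumptions chosen for the example, not the ones VFS uses.

```c
#include <assert.h>

/* Assumed values for illustration; the real magic number and the real
 * range of thread ids are defined by VFS and are not reproduced here. */
#define TRANS_BASE     0x1000  /* assumed to lie above all system call numbers */
#define TRANS_MAX_TID  0xff    /* assumed upper bound on worker thread ids */

/* Build the mtype value for a request sent on behalf of worker 'tid'. */
static int trans_encode(int tid)
{
	assert(tid >= 0 && tid <= TRANS_MAX_TID);
	return TRANS_BASE + tid;
}

/* Is this mtype inside the transaction ID range, i.e. an echoed reply
 * rather than a new system call from the File Server? */
static int trans_is_reply(int mtype)
{
	return mtype >= TRANS_BASE && mtype <= TRANS_BASE + TRANS_MAX_TID;
}

/* Recover the worker thread id from an echoed transaction ID. */
static int trans_thread(int mtype)
{
	assert(trans_is_reply(mtype));
	return mtype - TRANS_BASE;
}
```

Because the two ranges do not overlap, a single comparison on mtype is enough
for the main thread to decide whether to wake a blocked worker or to start a
new job.
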
== Locking ==
## 4 Locking
To ensure correct execution of system calls, worker threads sometimes need
certain objects within VFS to remain unchanged during thread suspension
and resumption (i.e., when they need to communicate with a driver or File
Server). Threads keep most state on the stack, but there are a few global
variables that require protection: the fproc table, vmnt table, vnode table,
and filp table. Other tables such as lock table, select table, and dmap table
don't require protection by means of exclusive access. For those tables it is
both necessary and sufficient to simply mark an entry as in use.

=== Locking requirements ===
## 4.1 Locking requirements
VFS implements the locking model described in [2]. For completeness of this
document we'll describe it here, too. The requirements are based on a threading
package that is non-preemptive. VFS must guarantee correct functioning with
several, semi-concurrently executing threads in any arbitrary order. The
latter requirement follows from the fact that threads need service from
other components like File Servers and drivers, and they may take any time
to complete requests.
 1. Consistency of replicated values. Several system calls rely on VFS keeping a replicated representation of data in File Servers (e.g., file sizes, file modes, etc.).
 1. Isolation of system calls. Many system calls involve multiple requests to FSes. Concurrent requests from other processes must not lead to otherwise impossible results (e.g., a chmod operation on a file cannot fail halfway through because it's suddenly unlinked or moved).
 1. Integrity of objects. From the point of view of threads, obtaining mutual exclusion is a potentially blocking operation. The integrity of any objects used across blocking calls must be guaranteed (e.g., the file mode in a vnode must remain intact not only when talking to other components, but also when obtaining a lock on a filp).
 1. No deadlock. Not one call may cause another call to never complete. Deadlock situations are typically the result of two or more threads that each hold exclusive access to one resource and want exclusive access to the resource held by the other thread. These resources are a) data (global variables) and b) worker threads.
   a. Conflicts between locking of different types of objects can be avoided by keeping a locking order: objects of different type must always be locked in the same order. If multiple objects of the same type are to be locked, then first a "common denominator" higher up in the locking order must be locked.
   a. Some threads can only run to completion when another thread does work on their behalf. Examples of this are drivers and file servers that do system calls on their own (e.g., ProcFS, PFS/UNIX Domain Sockets, FUSE) or crashing components (e.g., a driver for a character special file that crashes during a request; a second thread is required to handle resource clean up or driver restart before the first thread can abort or retry the request).
 1. No starvation. VFS must guarantee that every system call completes in finite time (e.g., an infinite stream of reads must never completely block writes). Furthermore, we want to maximize parallelism to improve performance. This leads to:
 1. A request to one File Server must not block access to other FS processes. This means that most forms of locking cannot take place at a global level, and must at most take place on the file system level.
 1. No read-only operation on a regular file must block an independent read call to that file. In particular, (read-only) open and close operations may not block such reads, and multiple independent reads on the same file must be able to take place concurrently (i.e., reads that do not share a file position between their file descriptors).

=== Three-level Lock ===
## 4.2 Three-level Lock
From the requirements it follows that we need at least two locking types: read
and write locks. Concurrent reads are allowed, but writes are exclusive both
from reads and from each other. However, in a lot of cases it is possible to
use a third locking type that is in between read and write lock: the serialize
lock. This is implemented in the three-level lock [2]. The three-level
lock provides:
TLL_READ: allows an unlimited number of threads to hold the lock with the
same type (both the thread itself and other threads); N * concurrent.
TLL_READSER: also allows an unlimited number of threads with type TLL_READ,
but only one thread can obtain serial access to the lock; N * concurrent +
1 * serial.
TLL_WRITE: provides full mutual exclusion; 1 * exclusive + 0 * concurrent +
0 * serial.
In absence of TLL_READ locks, a TLL_READSER is identical to TLL_WRITE. However,
TLL_READSER never blocks concurrent TLL_READ access. TLL_READSER can be
upgraded to TLL_WRITE; the thread will block until the last TLL_READ lock
leaves and new TLL_READ locks are blocked. Locks can be downgraded to a
lower type. The three-level lock is implemented using two FIFO queues with
write-bias. This guarantees no starvation.

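The compatibility rules between the three lock types can be written down as a
small predicate. The sketch below is a simplified model, not the VFS
implementation: struct tll_state and tll_can_grant are hypothetical names, the
enum values are illustrative, and the two FIFO queues with write-bias are left
out entirely. It only answers whether a request could be granted immediately
given who holds the lock.

```c
#include <assert.h>

/* Lock types as named in the text; the numeric values are illustrative. */
enum tll_type { TLL_READ, TLL_READSER, TLL_WRITE };

/* Hypothetical snapshot of the current holders of one lock. */
struct tll_state {
	int readers;          /* number of TLL_READ holders */
	int serialized;       /* 1 if some thread holds TLL_READSER */
	int writer;           /* 1 if some thread holds TLL_WRITE */
	int upgrade_pending;  /* 1 if the TLL_READSER holder waits for TLL_WRITE */
};

/* Return 1 if 'req' is compatible with the current holders, 0 if the
 * requesting thread would have to block. */
static int tll_can_grant(const struct tll_state *s, enum tll_type req)
{
	assert(s->readers >= 0);
	switch (req) {
	case TLL_READ:
		/* Readers share with readers and with one serializer, but
		 * new readers are held back once an upgrade is pending. */
		return !s->writer && !s->upgrade_pending;
	case TLL_READSER:
		/* At most one serializer, alongside any number of readers. */
		return !s->writer && !s->serialized;
	case TLL_WRITE:
		/* Full mutual exclusion. */
		return !s->writer && !s->serialized && s->readers == 0;
	}
	return 0;
}
```

With two readers holding the lock, a TLL_READSER request is compatible while
a TLL_WRITE request is not, which is the "N * concurrent + 1 * serial" case
from the list above.
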
=== Data structures subject to locking ===
## 4.3 Data structures subject to locking
VFS has a number of global data structures. See Table 2.
{{{
--------------------------------------------------------------------
| Structure  | Object description                                  |
+------------+-----------------------------------------------------|
| fproc      | Process (includes process's file descriptors)       |
+------------+-----------------------------------------------------|
| vmnt       | Virtual mount; a mounted file system                |
+------------+-----------------------------------------------------|
| vnode      | Virtual node; an open file                          |
+------------+-----------------------------------------------------|
| filp       | File position into an open file                     |
+------------+-----------------------------------------------------|
| lock       | File region locking state for an open file          |
+------------+-----------------------------------------------------|
| select     | State for an in-progress select(2) call             |
+------------+-----------------------------------------------------|
| dmap       | Mapping from major device number to a device driver |
--------------------------------------------------------------------
}}}
Table 2: VFS object types.

An fproc object is a process. An fproc object is created by fork(2)
and destroyed by exit(2) (which may, or may not, be initiated by the
process itself). It is identified by its endpoint number ('fp_endpoint')
and process id ('fp_pid'). Both are unique although in general the endpoint
number is used throughout the system.
A vmnt object is a mounted file system. It is created by mount(2) and destroyed
by umount(2). It is identified by a device number ('m_dev') and FS endpoint
number ('m_fs_e'); both are unique to each vmnt object. There is always a
single process that handles a file system on a device and a device cannot
be mounted twice.
A vnode object is the VFS representation of an open inode on the file
system. A vnode object is created when a first process opens or creates the
corresponding file and is destroyed when the last process that has the
file open closes it. It is identified by a combination of FS endpoint number
('v_fs_e') and inode number of that file system ('v_inode_nr'). A vnode
might be mapped to another file system; the actual reading and writing is
handled by a different endpoint. This has no effect on locking.
A filp object contains a file position within a file. It is created when a file
is opened or an anonymous pipe is created, and destroyed when the last user
(i.e., process) closes it. A file descriptor always points to a single filp. A
filp always points to a single vnode, although not all vnodes are pointed to by
a filp. A filp has a reference count ('filp_count') which is identical to the
number of file descriptors pointing to it. It can be increased by a dup(2)
or fork(2). A filp can therefore be shared by multiple processes.
A lock object keeps information about locking of file regions. This has
nothing to do with the threading type of locking. The lock objects require
no locking protection and won't be discussed further.
A select object keeps information on a select(2) operation that cannot
be fulfilled immediately (waiting for timeout or file descriptors not
ready). They are identified by their owner ('requestor'); a pointer to the
fproc table. A null pointer means not in use. A select object can be used by
only one process and a process can do only one select(2) at a time. Select(2)
operates on filps and is organized in such a way that it is sufficient to
apply locking on individual filps and not on select objects themselves. They
won't be discussed further.
A dmap object is a mapping from a device number to a device driver. A device
driver can have multiple device numbers associated (e.g., TTY). Access to
a driver is exclusive when it uses the synchronous driver protocol.

=== Locking order ===
## 4.4 Locking order
Based on the description in the previous section, we need protection for
fproc, vmnt, vnode, and filp objects. To prevent deadlocks as a result of
object locking, we need to define a strict locking order. In VFS we use the
following order:

{{{
fproc > [exec] > vmnt > vnode > filp > [block special file] > [dmap]
}}}

That is, no thread may lock an fproc object while holding a vmnt lock,
and no thread may lock a vmnt object while holding an (associated) vnode
lock, etc.

Fproc needs protection because processes themselves can initiate system
calls, but also PM can cause system calls that have to be executed in their
name. For example, a process might be busy reading from a character device
and another process sends a termination signal. The exit(2) that follows is
sent by PM and is to be executed by the to-be-killed process itself. At this
point there is contention for the fproc object that belongs to the process,
hence the need for protection. This problem is solved in a simple way. Recall
that all worker threads are bound to a process. This also forms the basis of
fproc locking: each worker thread acquires and holds the fproc lock for its
associated process for as long as it is processing work for that process.

There are two cases where a worker thread may hold the lock to more than one
process. First, as mentioned, the reboot procedure is executed from a worker
thread set in the context of the PM process, thus with the PM process entry
lock held. The procedure itself then acquires a temporary lock on every other
process in turn, in order to clean it up without interference. Thus, the PM
process entry is higher up in the locking order than all other process entries.

Second, the exec(2) call is protected by a lock, and this exec lock is
currently implemented as a lock on the VM process entry. The exec lock is
acquired by a worker thread for the process performing the exec(2) call, and
thus, the VM process entry is below all other process entries in the locking
order. The exec(2) call is protected by a lock for the following reason. VFS
uses a number of variables on the heap to read ELF headers. They are on the
heap due to their size; putting them on the stack would increase stack size
demands for worker threads. The exec call does blocking read calls and thus
needs exclusive access to these variables. However, only the exec(2) syscall
needs this lock.

Access to block special files needs to be exclusive. File Servers are
responsible for handling reads from and writes to block special files; if
a block special file is on a device that is mounted, the FS responsible for
that mount point takes care of it, otherwise the FS that handles the root of
the file system is responsible. Due to mounting and unmounting file systems,
the FS handling a block special file may change. Locking the vnode is not
enough since the inode can be on an entirely different File Server. Therefore,
access to block special files must be mutually exclusive from concurrent
mount(2)/umount(2) operations. However, when we're not accessing a block
special file, we don't need this lock.

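The locking order given at the start of this section lends itself to a
mechanical check. The following sketch assigns each object type a rank and
rejects any attempt to lock an object that comes earlier in the order than
one already locked. The names lock_rank, lock_trace and lock_order_ok are
hypothetical and do not appear in VFS; the bracketed entries of the order are
treated as ordinary ranks here, and the "common denominator" rule for several
objects of the same type is not modeled.

```c
#include <assert.h>

/* Ranks follow the order in the text: an object with a lower rank must be
 * locked before an object with a higher rank. */
enum lock_rank {
	RANK_NONE = 0,
	RANK_FPROC,
	RANK_EXEC,
	RANK_VMNT,
	RANK_VNODE,
	RANK_FILP,
	RANK_BLOCK_SPECIAL,
	RANK_DMAP
};

/* Hypothetical per-thread record of the highest rank locked so far. */
struct lock_trace {
	enum lock_rank highest;
};

/* Return 1 and record the lock if taking 'next' respects the order.
 * Return 0, leaving the trace unchanged, if 'next' belongs earlier in
 * the order than something this thread already holds. */
static int lock_order_ok(struct lock_trace *t, enum lock_rank next)
{
	assert(next != RANK_NONE);
	if (next < t->highest)
		return 0;
	t->highest = next;
	return 1;
}
```

A thread that has locked a vmnt and then a vnode passes the check; if it then
tries to lock an fproc object, the check fails, which matches the rule that no
thread may lock an fproc object while holding a vmnt lock.
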
=== Vmnt (file system) locking ===
## 4.5 Vmnt (file system) locking
Vmnt locking cannot be seen completely separately from vnode locking. For
example, umount(2) fails if there are still in-use vnodes, which means that
FS requests [0] only involving in-use inodes do not have to acquire a vmnt
lock. On the other hand, all other requests do need a vmnt lock. Extrapolating
this to system calls this means that all system calls involving a file
descriptor don't need a vmnt lock and all other system calls (that make FS
requests) do need a vmnt lock.
366433d6423SLionel Sambuc{{{
367433d6423SLionel Sambuc-------------------------------------------------------------------------------
368433d6423SLionel Sambuc| Category          | System calls                                            |
369433d6423SLionel Sambuc+-------------------+---------------------------------------------------------+
370433d6423SLionel Sambuc| System calls with | access, chdir, chmod, chown, chroot, creat, dumpcore+,  |
371433d6423SLionel Sambuc| a path name       | exec, link, lstat, mkdir, mknod, mount, open, readlink, |
372433d6423SLionel Sambuc| argument          | rename, rmdir, stat, statvfs, symlink, truncate, umount,|
373433d6423SLionel Sambuc|                   | unlink, utime                                           |
374433d6423SLionel Sambuc+-------------------+---------------------------------------------------------+
375433d6423SLionel Sambuc| System calls with | close, fchdir, fcntl, fstat, fstatvfs, ftruncate,       |
376433d6423SLionel Sambuc| a file descriptor | getdents, ioctl, lseek, pipe, read, select, write       |
377433d6423SLionel Sambuc| argument          |                                                         |
378433d6423SLionel Sambuc+-------------------+---------------------------------------------------------+
379433d6423SLionel Sambuc| System calls with | fsync++, getvfsstat, sync, umask                        |
380433d6423SLionel Sambuc| other or no       |                                                         |
381433d6423SLionel Sambuc| arguments         |                                                         |
382433d6423SLionel Sambuc-------------------------------------------------------------------------------
383433d6423SLionel Sambuc}}}
384433d6423SLionel SambucTable 3: System call categories.
+ path name argument is implicit, the path name is "core.<pid>"
++ although fsync actually provides a file descriptor argument, it's only
used to find the vmnt and not to do any actual operations on the file

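The rule of thumb behind Table 3 fits in a few lines of C. This is only a
sketch of the decision described above; the enum and the function
needs_vmnt_lock() are hypothetical and do not exist in VFS.

```c
#include <assert.h>
#include <stdbool.h>

/* The three categories of Table 3 (names are made up for this sketch). */
enum syscall_category { SC_PATH, SC_FD, SC_OTHER };

/* A system call needs a vmnt lock unless it works on a file descriptor (the
 * in-use vnode then keeps the vmnt from being unmounted) or makes no FS
 * request at all (e.g., umask). */
static bool needs_vmnt_lock(enum syscall_category cat, bool makes_fs_request)
{
    if (cat == SC_FD)
        return false;
    return makes_fs_request;
}
```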
Before we describe what kind of vmnt locks VFS applies to system calls with a
path name or other arguments, we need to make some notes on path lookup. Path
lookups take arbitrary paths as input (relative and absolute). They can start
at any vmnt (based on root directory and working directory of the process doing
the lookup) and visit any file system in arbitrary order, possibly visiting
the same file system more than once. As such, VFS can never tell in advance
at which File Server a lookup will end. This has the following consequences:
 * In the lookup procedure, only one vmnt must be locked at a time. When
 moving from one vmnt to another, the first vmnt has to be unlocked before
 acquiring the next lock to prevent deadlocks.
 * The lookup procedure must lock each visited file system with TLL_READSER
 and downgrade or upgrade to the lock type desired by the caller for the
 destination file system (as VFS cannot know which file system is final). This
 is to prevent deadlocks when a thread acquires a TLL_READSER on a vmnt and
 another thread acquires TLL_READ on the same vmnt. If the second thread is
 blocked on the first thread due to it acquiring a lock on a vnode, the first
 thread will be unable to upgrade a TLL_READSER lock to TLL_WRITE.

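The two consequences above can be illustrated with a small single-threaded
model. It only tracks which lock type the looking-up thread holds on each
vmnt; it does not implement real blocking. The names held[], count_held(),
lookup_walk() and TLL_NONE are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

enum tll_type { TLL_NONE, TLL_READ, TLL_READSER, TLL_WRITE };

#define NR_VMNTS 4
static enum tll_type held[NR_VMNTS];  /* lock held by this thread, per vmnt */

/* Number of vmnts this thread currently has locked. */
static int count_held(void)
{
    int n = 0;
    for (int v = 0; v < NR_VMNTS; v++)
        if (held[v] != TLL_NONE)
            n++;
    return n;
}

/* Walk the vmnts visited by a lookup, in order. Every vmnt is first locked
 * TLL_READSER; it is unlocked again before the next one is locked. Only the
 * final vmnt is converted to the lock type the caller asked for. Returns the
 * index of the final vmnt. */
static int lookup_walk(const int *visit, size_t n, enum tll_type want)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        int v = visit[i];
        assert(v >= 0 && v < NR_VMNTS);
        held[v] = TLL_READSER;
        assert(count_held() == 1);    /* never two vmnts at the same time */
        if (i + 1 < n)
            held[v] = TLL_NONE;       /* unlock before crossing to the next */
    }
    held[visit[n - 1]] = want;        /* downgrade or upgrade at the end */
    return visit[n - 1];
}
```

A lookup that visits vmnts 0, 2, 0 and 3 ends with only vmnt 3 locked, in the
requested lock type, even though vmnt 0 was visited twice.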
We use the following mapping for vmnt locks onto three-level lock types:
{{{
-------------------------------------------------------------------------------
| Lock type  |  Mapped to  | Used for                                         |
+------------+-------------+--------------------------------------------------+
| VMNT_READ  | TLL_READ    | Read-only operations and fully independent write |
|            |             | operations                                       |
+------------+-------------+--------------------------------------------------+
| VMNT_WRITE | TLL_READSER | Independent create and modify operations         |
+------------+-------------+--------------------------------------------------+
| VMNT_EXCL  | TLL_WRITE   | Delete and dependent write operations            |
-------------------------------------------------------------------------------
}}}
Table 4: vmnt to tll lock mapping

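Written out in C, the mapping of Table 4 could look like the sketch below.
The enum declarations and both functions are hypothetical. The compatibility
check assumes the three-level lock semantics of section 4.2: any number of
TLL_READ holders, at most one TLL_READSER holder alongside them, and
TLL_WRITE excluding everything else.

```c
#include <assert.h>
#include <stdbool.h>

enum tll_type { TLL_READ, TLL_READSER, TLL_WRITE };
enum vmnt_lock { VMNT_READ, VMNT_WRITE, VMNT_EXCL };

/* Table 4 as a function. */
static enum tll_type vmnt_to_tll(enum vmnt_lock l)
{
    switch (l) {
    case VMNT_READ:  return TLL_READ;
    case VMNT_WRITE: return TLL_READSER;
    case VMNT_EXCL:  return TLL_WRITE;
    }
    return TLL_WRITE;  /* not reached for valid input; strictest fallback */
}

/* May two different threads hold locks 'a' and 'b' on the same vmnt at the
 * same time? (Assumed semantics, see the paragraph above.) */
static bool tll_compatible(enum tll_type a, enum tll_type b)
{
    if (a == TLL_WRITE || b == TLL_WRITE)
        return false;
    if (a == TLL_READSER && b == TLL_READSER)
        return false;
    return true;
}
```

Under these assumptions a VMNT_READ operation can run next to a VMNT_WRITE
operation, two VMNT_WRITE operations are serialized, and VMNT_EXCL runs
alone.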
The following table shows a sub-categorization of system calls without a
file descriptor argument, together with their locking types and motivation
as used by VFS.
{{{
-------------------------------------------------------------------------------
| Group       | System calls | Lock type  | Motivation                        |
+-------------+--------------+------------+-----------------------------------+
| File open   | chdir,       | VMNT_READ  | These operations do not interfere |
| ops.        | chroot, exec,|            | with each other, as vnodes can be |
| (non-create)| open         |            | opened concurrently, and open     |
|             |              |            | operations do not affect          |
|             |              |            | replicated state.                 |
+-------------+--------------+------------+-----------------------------------+
| File create-| creat,       | VMNT_EXCL  | File create ops. require mutual   |
| and-open    | open(O_CREAT)| for create | exclusion from concurrent file    |
| ops         |              | VMNT_WRITE | open ops. If the file already     |
|             |              | for open   | existed, the VMNT_WRITE lock that |
|             |              |            | is necessary for the lookup is    |
|             |              |            | not upgraded                      |
+-------------+--------------+------------+-----------------------------------+
| File create-| pipe         | VMNT_READ  | These create nameless inodes      |
| unique-and- |              |            | which cannot be opened by means   |
| open ops.   |              |            | of a path. Their creation         |
|             |              |            | therefore does not interfere with |
|             |              |            | anything else                     |
+-------------+--------------+------------+-----------------------------------+
| File create-| mkdir, mknod,| VMNT_WRITE | These operations do not affect    |
| only ops.   | symlink      |            | any VFS state, and can therefore  |
|             |              |            | take place concurrently with open |
|             |              |            | operations                        |
+-------------+--------------+------------+-----------------------------------+
| File info   | access, lstat| VMNT_READ  | These operations do not interfere |
| retrieval or| readlink,stat|            | with each other and do not modify |
| modification| utime        |            | replicated state                  |
+-------------+--------------+------------+-----------------------------------+
| File        | chmod, chown,| VMNT_READ  | These operations do not interfere |
| modification| truncate     |            | with each other. They do need     |
|             |              |            | exclusive access on the vnode     |
|             |              |            | level                             |
+-------------+--------------+------------+-----------------------------------+
| File link   | link         | VMNT_WRITE | Identical to file create-only     |
| ops.        |              |            | operations                        |
+-------------+--------------+------------+-----------------------------------+
| File unlink | rmdir, unlink| VMNT_EXCL  | These must not interfere with     |
| ops.        |              |            | file create operations, to avoid  |
|             |              |            | the scenario where inodes are     |
|             |              |            | reused immediately. However, due  |
|             |              |            | to necessary path checks, the     |
|             |              |            | vmnt is first locked VMNT_WRITE   |
|             |              |            | and then upgraded                 |
+-------------+--------------+------------+-----------------------------------+
| File rename | rename       | VMNT_EXCL  | Identical to file unlink          |
| ops.        |              |            | operations                        |
+-------------+--------------+------------+-----------------------------------+
| Non-file    | sync, umask, | VMNT_READ  | umask does not involve the file   |
| ops.        | getvfsstat   | or none    | system, so it does not need       |
|             |              |            | locks. sync does not alter state  |
|             |              |            | in VFS and is atomic at the FS    |
|             |              |            | level. getvfsstat caches stats    |
|             |              |            | only and requires no exclusion.   |
-------------------------------------------------------------------------------
}}}
Table 5: Sub-categorization of system calls without a file descriptor argument

=== Vnode (open file) locking ===
## 4.6 Vnode (open file) locking
Compared to vmnt locking, vnode locking is relatively straightforward. All
read-only accesses to vnodes that merely read the vnode object's fields are
allowed to be concurrent. Consequently, all accesses that change fields
of a vnode object must be exclusive. This leaves us with creation and
destruction of vnode objects (and related to that, their reference counts);
it's sufficient to serialize these accesses. This follows from the fact
that a vnode is only created when the first user opens it, and destroyed
when the last user closes it. An open file in process A cannot be closed
by process B. Note that this also relies on the fact that a process can do
only one system call at a time. Kernel threads would violate this assumption.

We use the following mapping for vnode locks onto three-level lock types:
{{{
-------------------------------------------------------------------------------
| Lock type  |  Mapped to  | Used for                                         |
+------------+-------------+--------------------------------------------------+
| VNODE_READ | TLL_READ    | Read access to previously opened vnodes          |
+------------+-------------+--------------------------------------------------+
| VNODE_OPCL | TLL_READSER | Creation, opening, closing, and destruction of   |
|            |             | vnodes                                           |
+------------+-------------+--------------------------------------------------+
| VNODE_WRITE| TLL_WRITE   | Write access to previously opened vnodes         |
-------------------------------------------------------------------------------
}}}
Table 6: vnode to tll lock mapping

When vnodes are destroyed, they are initially locked with VNODE_OPCL. After
all, we're going to alter the reference count, so this must be serialized. If
the reference count then reaches zero we obtain exclusive access. This should
always be immediately possible unless there is a consistency problem. See
section 4.8 for an exhaustive listing of locking methods for all operations on
vnodes.

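The sequence for dropping a vnode reference can be sketched as follows. The
model is single-threaded and only records which lock type would be held at
each step; struct vnode_model, put_vnode_model() and VNODE_NONE are
hypothetical names, not the VFS implementation.

```c
#include <assert.h>

enum vnode_lock { VNODE_NONE, VNODE_READ, VNODE_OPCL, VNODE_WRITE };

/* Hypothetical, simplified vnode. */
struct vnode_model {
    int ref_count;         /* number of users */
    enum vnode_lock lock;  /* lock type held by the calling thread */
    int in_use;            /* nonzero while the object is alive */
};

/* Drop one reference. The count is changed under VNODE_OPCL; only when it
 * reaches zero is the lock upgraded to VNODE_WRITE to destroy the object. */
static void put_vnode_model(struct vnode_model *vp)
{
    assert(vp->ref_count > 0);
    vp->lock = VNODE_OPCL;       /* serialize the reference count change */
    vp->ref_count--;
    if (vp->ref_count == 0) {
        vp->lock = VNODE_WRITE;  /* last user: exclusive access */
        vp->in_use = 0;
    }
    vp->lock = VNODE_NONE;       /* release */
}
```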
=== Filp (file position) locking ===
## 4.7 Filp (file position) locking
The main fields of a filp object that are shared between various processes
(and by extension threads), and that can change after object creation,
are filp_count and filp_pos. Writes to and reads from filp objects must be
mutually exclusive, as all system calls have to use the latest version. For
example, a read(2) call changes the file position (i.e., filp_pos), so two
concurrent reads must obtain exclusive access. Consequently, as even read
operations require exclusive access, filp objects don't use three-level locks,
but only mutexes.

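To see why even a read needs exclusive access, consider the sketch below. It
uses a POSIX mutex as a stand-in for the filp mutex (an assumption made here
to keep the example self-contained); struct filp_model and filp_read() are
hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified filp. */
struct filp_model {
    pthread_mutex_t lock;  /* stand-in for the filp mutex */
    int count;             /* cf. filp_count */
    long pos;              /* cf. filp_pos */
};

/* "Read" nbytes: return the position the read starts at and advance the
 * shared file position. Both steps happen under the mutex, so two concurrent
 * readers can never observe the same starting position. */
static long filp_read(struct filp_model *f, size_t nbytes)
{
    pthread_mutex_lock(&f->lock);
    long start = f->pos;
    f->pos += (long)nbytes;
    pthread_mutex_unlock(&f->lock);
    return start;
}
```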
System calls that involve a file descriptor often access both the filp and
the corresponding vnode. The locking order requires us to first lock the
vnode and then the filp. This is taken care of at the filp level. Whenever
a filp is locked, a lock on the vnode is acquired first. Conversely, when
a filp is unlocked, the corresponding vnode is also unlocked. A convenient
consequence is that whenever a vnode is locked exclusively (VNODE_WRITE),
all corresponding filps are implicitly locked. This is of particular use
when multiple filps must be locked at the same time:
 * When opening a named pipe, VFS must make sure that there is at most one
 filp for the reader end and one filp for the writer end.
 * Pipe readers and writers must be suspended in the absence of (respectively)
 writers and readers.
Because both filps are linked to the same vnode object (they are for the same
pipe), it suffices to exclusively lock that vnode instead of both filp objects.

In some cases a function that operates on a locked filp calls another
function that triggers another lock on a different filp for the same vnode.
One example is close_filp(). At some point, close_filp() calls release(),
which in turn loops through the filp table looking for pipes being
select(2)ed on. If there are any, the select code will lock the filp and do
operations on it. This works fine when doing a select(2) call, but conflicts
with close(2) or exit(2). lock_filp() makes an exception for this situation:
if you've already locked a vnode with VNODE_OPCL or VNODE_WRITE when locking
a filp, you obtain a "soft lock" on the vnode for this filp. This means
that lock_filp() won't actually try to lock the vnode (which wouldn't work),
but flags the vnode as "skip unlock_vnode upon unlock_filp." Upon unlocking
the filp, the vnode remains locked, the soft lock is removed, and the filp
mutex is released. Note that this scheme does not violate the locking order;
the vnode is (already) locked before the filp.

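The soft lock can be modeled in a few lines. This is a single-threaded
sketch under two assumptions: a vnode that is already locked VNODE_OPCL or
VNODE_WRITE is locked by the calling thread itself, and the soft lock flag is
kept per filp. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum vnode_lock { VNODE_NONE, VNODE_READ, VNODE_OPCL, VNODE_WRITE };

struct vnode_model { enum vnode_lock lock; };

struct filp_model {
    struct vnode_model *vno;
    bool locked;    /* stands in for the filp mutex */
    bool softlock;  /* "skip unlock_vnode upon unlock_filp" */
};

/* Lock the vnode first, then the filp. If the vnode is already held with
 * VNODE_OPCL or VNODE_WRITE, take a soft lock instead of locking it again. */
static void lock_filp_model(struct filp_model *f, enum vnode_lock want)
{
    struct vnode_model *vp = f->vno;
    if (vp->lock == VNODE_OPCL || vp->lock == VNODE_WRITE)
        f->softlock = true;
    else
        vp->lock = want;
    f->locked = true;
}

/* Unlock the filp. With a soft lock the vnode stays locked and only the
 * flag is cleared; otherwise the vnode is unlocked as well. */
static void unlock_filp_model(struct filp_model *f)
{
    if (f->softlock)
        f->softlock = false;
    else
        f->vno->lock = VNODE_NONE;
    f->locked = false;
}
```

With two filps a and b on the same vnode, locking a with VNODE_OPCL and then
b yields a soft lock for b. Unlocking b leaves the vnode locked for a, and
only unlocking a releases it.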
A similar problem arises with create_pipe. In this case we obtain a new vnode
object, lock it, and obtain two new, locked, filp objects. If everything works
out and the filp objects are linked to the same vnode, we run into trouble
when unlocking both filps. The first filp being unlocked would work; the
second filp doesn't have an associated vnode that's locked anymore. Therefore
we introduced a plural unlock_filps(filp1, filp2) that can unlock two filps
that both point to the same vnode.

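A plural unlock can be sketched like this. The types and the function name
unlock_filps_model() are hypothetical, and releasing the filps before the
vnode (the reverse of the locking order) is a choice made for this sketch,
not a statement about the real unlock_filps().

```c
#include <assert.h>
#include <stdbool.h>

struct vnode_model { bool locked; };

struct filp_model {
    struct vnode_model *vno;
    bool locked;  /* stands in for the filp mutex */
};

/* Unlock two filps that share one vnode. The vnode is unlocked exactly once,
 * which avoids unlocking an already unlocked vnode for the second filp. */
static void unlock_filps_model(struct filp_model *f1, struct filp_model *f2)
{
    assert(f1->vno == f2->vno);
    assert(f1->locked && f2->locked && f1->vno->locked);
    f2->locked = false;
    f1->locked = false;
    f1->vno->locked = false;
}
```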
=== Lock characteristics per request type ===
## 4.8 Lock characteristics per request type
For File Servers that support concurrent requests, it's useful to know which
locking guarantees VFS provides for vmnts and vnodes, so they can take that
into account when protecting internal data structures. READ = TLL_READ,
READSER = TLL_READSER, WRITE = TLL_WRITE. The vnode lock applies to the
'''inode''' field in requests, unless the notes say otherwise.
{{{
------------------------------------------------------------------------------
| request      | vmnt    | vnode   | notes                                   |
+--------------+---------+---------+-----------------------------------------+
| REQ_BREAD    |         | READ    | VFS serializes reads from and writes to |
|              |         |         | block special files                     |
+--------------+---------+---------+-----------------------------------------+
| REQ_BWRITE   |         | WRITE   | VFS serializes reads from and writes to |
|              |         |         | block special files                     |
+--------------+---------+---------+-----------------------------------------+
| REQ_CHMOD    | READ    | WRITE   | vmnt is only locked if file is not      |
|              |         |         | already opened                          |
+--------------+---------+---------+-----------------------------------------+
| REQ_CHOWN    | READ    | WRITE   | vmnt is only locked if file is not      |
|              |         |         | already opened                          |
+--------------+---------+---------+-----------------------------------------+
| REQ_CREATE   | WRITE   | WRITE   | The directory in which the file is      |
|              |         |         | created is write locked                 |
+--------------+---------+---------+-----------------------------------------+
| REQ_FLUSH    |         |         | Mutually exclusive to REQ_BREAD and     |
|              |         |         | REQ_BWRITE                              |
+--------------+---------+---------+-----------------------------------------+
| REQ_FTRUNC   | READ    | WRITE   | vmnt is only locked if file is not      |
|              |         |         | already opened                          |
+--------------+---------+---------+-----------------------------------------+
| REQ_GETDENTS | READ    | READ    | vmnt is only locked if file is not      |
|              |         |         | already opened                          |
+--------------+---------+---------+-----------------------------------------+
| REQ_INHIBREAD|         | READ    |                                         |
+--------------+---------+---------+-----------------------------------------+
| REQ_LINK     | READSER | WRITE   | vfs_fs_link.inode is locked READ        |
|              |         |         | vfs_fs_link.dir_ino is locked WRITE     |
+--------------+---------+---------+-----------------------------------------+
| REQ_LOOKUP   | READSER |         |                                         |
+--------------+---------+---------+-----------------------------------------+
| REQ_MKDIR    | READSER | WRITE   |                                         |
+--------------+---------+---------+-----------------------------------------+
| REQ_MKNOD    | READSER | WRITE   |                                         |
+--------------+---------+---------+-----------------------------------------+
|REQ_MOUNTPOINT| WRITE   | WRITE   |                                         |
+--------------+---------+---------+-----------------------------------------+
|REQ_NEW_DRIVER|         |         |                                         |
+--------------+---------+---------+-----------------------------------------+
| REQ_NEWNODE  |         |         | Only sent to PFS                        |
+--------------+---------+---------+-----------------------------------------+
| REQ_PUTNODE  |         | READSER | READSER when dropping all but one       |
|              |         | or WRITE| references. WRITE when final reference  |
|              |         |         | is dropped (i.e., no longer in use)     |
+--------------+---------+---------+-----------------------------------------+
| REQ_RDLINK   | READ    | READ    | In some circumstances stricter locking  |
|              |         |         | might be applied, but not guaranteed    |
+--------------+---------+---------+-----------------------------------------+
| REQ_READ     |         | READ    |                                         |
+--------------+---------+---------+-----------------------------------------+
|REQ_READSUPER | WRITE   |         |                                         |
+--------------+---------+---------+-----------------------------------------+
| REQ_RENAME   | WRITE   | WRITE   |                                         |
+--------------+---------+---------+-----------------------------------------+
| REQ_RMDIR    | WRITE   | WRITE   |                                         |
+--------------+---------+---------+-----------------------------------------+
| REQ_SLINK    | READSER | READ    |                                         |
+--------------+---------+---------+-----------------------------------------+
| REQ_STAT     | READ    | READ    | vmnt is only locked if file is not      |
|              |         |         | already opened                          |
+--------------+---------+---------+-----------------------------------------+
| REQ_STATVFS  | READ    | READ    | vmnt is only locked if file is not      |
|              |         |         | already opened                          |
+--------------+---------+---------+-----------------------------------------+
| REQ_SYNC     | READ    |         |                                         |
+--------------+---------+---------+-----------------------------------------+
| REQ_UNLINK   | WRITE   | WRITE   |                                         |
+--------------+---------+---------+-----------------------------------------+
| REQ_UNMOUNT  | WRITE   |         |                                         |
+--------------+---------+---------+-----------------------------------------+
| REQ_UTIME    | READ    | READ    |                                         |
+--------------+---------+---------+-----------------------------------------+
| REQ_WRITE    |         | WRITE   |                                         |
------------------------------------------------------------------------------
}}}
Table 7: VFS-FS requests locking guarantees

== Recovery from driver crashes ==
## 5 Recovery from driver crashes
VFS can recover from block, character, and socket driver crashes. It can
recover to some degree from a crashed File Server (which we can regard as a
driver).

=== Recovery from block drivers crashes ===
## 5.1 Recovery from block drivers crashes
When reading or writing, VFS doesn't communicate with block drivers directly,
but always through a File Server (the root File Server being default). If the
block driver crashes, the File Server does most of the work of the recovery
procedure. VFS loops through all open files for block special files that
were handled by this driver and reopens them. After that it sends the new
endpoint to the File Server so it can finish the recovery procedure. Finally,
the File Server will retry pending requests if possible. However, reopening
files can cause the block driver to crash again. When that happens, VFS will
stop the recovery. A driver can return ERESTART to VFS to tell it to retry
a request. VFS does this with an arbitrary maximum of 5 attempts.

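The bounded retry on ERESTART can be sketched as below. REQ_ERESTART is a
hypothetical stand-in for the real ERESTART value, send_with_retry() and
flaky_request() are made-up names, and reading "a maximum of 5 attempts" as
five attempts in total (rather than five retries) is an assumption.

```c
#include <assert.h>

#define MAX_ATTEMPTS 5        /* the "arbitrary maximum" from the text */
#define REQ_ERESTART (-1000)  /* hypothetical stand-in for ERESTART */

typedef int (*request_fn)(void *ctx);

/* Issue a request and repeat it while the driver answers REQ_ERESTART, but
 * never more than MAX_ATTEMPTS times. Returns the last result. */
static int send_with_retry(request_fn fn, void *ctx)
{
    int r = REQ_ERESTART;
    for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
        r = fn(ctx);
        if (r != REQ_ERESTART)
            break;
    }
    return r;
}

/* Example request: answers REQ_ERESTART until the counter in ctx is zero. */
static int flaky_request(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return REQ_ERESTART;
    }
    return 0;
}
```

A driver that asks for a restart four times succeeds on the fifth attempt; a
driver that keeps asking makes the caller give up after the fifth.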
=== Recovery from character and socket driver crashes ===
## 5.2 Recovery from character and socket driver crashes
While VFS used to support minimal recovery from character driver crashes, the
added complexity has so far proven to outweigh the benefits, especially since
such crash recovery can never be fully transparent: it depends entirely on the
character device as to whether repeating an I/O request makes sense at all.
Currently, all operations except close(2) on a file descriptor that identifies
a device on a crashed character or socket driver will result in an EIO error.
It is up to the application to reopen the character device or socket and retry
whatever it was doing in the appropriate manner. In the future, automatic
reopen and I/O restart may be reintroduced for a limited subset of character
drivers.

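The resulting behavior for a file descriptor on a crashed character or socket
driver can be summarized in code. This is a sketch of the rule only; enum
fd_op, struct dev_fd_model and check_dev_fd() are hypothetical, and returning
the positive EIO constant from errno.h is a simplification.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum fd_op { OP_READ, OP_WRITE, OP_IOCTL, OP_SELECT, OP_CLOSE };

/* Hypothetical per-descriptor state. */
struct dev_fd_model {
    bool driver_crashed;  /* set once the driver behind this fd has crashed */
};

/* Decide whether an operation may proceed: after a driver crash everything
 * except close(2) fails with EIO. Returns 0 if the operation may go ahead. */
static int check_dev_fd(const struct dev_fd_model *fd, enum fd_op op)
{
    if (fd->driver_crashed && op != OP_CLOSE)
        return EIO;
    return 0;
}
```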
=== Recovery from File Server crashes ===
## 5.3 Recovery from File Server crashes
At the time of writing we cannot recover from crashed File Servers. When
VFS detects it has to clean up the remnants of a File Server process (i.e.,
through an exit(2)), it marks all associated file descriptors as invalid
and cancels ongoing and pending requests to that File Server. Resources that
were in use by the File Server are cleaned up.

[0] http://wiki.minix3.org/en/DevelopersGuide/VfsFsProtocol

[1] http://www.cs.vu.nl/~dcvmoole/minix/blockchar.txt

[2] http://www.minix3.org/theses/moolenbroek-multimedia-support.pdf