## Description of VFS                             Thomas Veerman 21-3-2013
## This file is organized such that it can be read both in a Wiki and on
## the MINIX terminal using e.g. vi or less. Please, keep the file in the
## source tree as the canonical version and copy changes into the Wiki.
#pragma section-numbers 2

= VFS internals =

<<TableOfContents(2)>>

## Table of contents
## 1 ..... General description of responsibilities
## 2 ..... General architecture
## 3 ..... Worker threads
## 4 ..... Locking
## 4.1 .... Locking requirements
## 4.2 .... Three-level Lock
## 4.3 .... Data structures subject to locking
## 4.4 .... Locking order
## 4.5 .... Vmnt (file system) locking
## 4.6 .... Vnode (open file) locking
## 4.7 .... Filp (file position) locking
## 4.8 .... Lock characteristics per request type
## 5 ..... Recovery from driver crashes
## 5.1 .... Recovery from block drivers crashes
## 5.2 .... Recovery from character driver crashes
## 5.3 .... Recovery from File Server crashes

== General description of responsibilities ==
## 1 General description of responsibilities
VFS implements the file system in cooperation with one or more File Servers
(FS). The File Servers take care of the actual file system on a partition.
That is, they interpret the data structure on disk, write and read data to/from
disk, etc. VFS sits on top of those File Servers and communicates with
them. Looking inside VFS, we can identify several roles. First, a role of VFS
is to handle most POSIX system calls that are supported by Minix. Additionally,
it supports a few calls necessary for libc. The following system calls are
handled by VFS:

access, chdir, chmod, chown, chroot, close, creat, fchdir, fcntl, fstat,
fstatvfs, fsync, ftruncate, getdents, getvfsstat, ioctl, link, lseek,
lstat, mkdir, mknod, mount, open, pipe2, read, readlink, rename, rmdir, select,
stat, statvfs, symlink, sync, truncate, umask, umount, unlink, utimes, write.

Second, it maintains part of the state belonging to a process (process state is
spread out over the kernel, VM, PM, and VFS). For example, it maintains state
for select(2) calls, file descriptors and file positions. Also, it cooperates
with the Process Manager to handle the fork, exec, and exit system calls.
Third, VFS keeps track of endpoints that are supposed to be drivers for
character or block special files, as well as for socket protocol families.
File Servers can be regarded as drivers for block special files, although they
are handled entirely differently from other drivers.

The following diagram depicts how a read() on a file in /home is being handled:
{{{
                  ----------------
                  | user process |
                  ----------------
                      ^       ^
                      |       |
                   read(2)     \
                      |         \
                      V          \
              ----------------    |
              |     VFS      |    |
              ----------------    |
                            ^     |
                            |     |
                            V     |
 -------     --------     ---------
 | MFS |     | MFS  |     |  MFS  |
 |  /  |     | /usr |     | /home |
 -------     --------     ---------
}}}
Diagram 1: handling of read(2) system call

The user process executes the read system call which is delivered to VFS. VFS
verifies that the read is done on a valid (open) file and forwards the request
to the FS responsible for the file system on which the file resides. The FS
reads the data, copies it directly to the user process, and replies to VFS
that it has executed the request. Subsequently, VFS replies to the user process
that the operation is done and the user process continues to run.
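
The validation and forwarding step described above can be sketched in C. This
is a minimal illustration under stated assumptions, not the VFS implementation:
the structure names, the fixed-size descriptor table, the 'vno', 'pos', and
'fds' members, and route_read() are all hypothetical. Only 'v_fs_e',
'v_inode_nr', and 'filp_count' are field names that appear in this document.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_FDS 4   /* hypothetical per-process descriptor limit */

struct vnode_sketch {
    int v_fs_e;                 /* endpoint of the FS that owns the file */
    unsigned long v_inode_nr;   /* inode number on that file system */
};

struct filp_sketch {
    struct vnode_sketch *vno;   /* open file this position refers to */
    long pos;                   /* current file position */
    int filp_count;             /* descriptors pointing at this filp */
};

struct fproc_sketch {
    struct filp_sketch *fds[SKETCH_NR_FDS];   /* NULL means "not open" */
};

/*
 * Decide where a read on 'fd' must be forwarded. Returns the endpoint of
 * the responsible File Server, or -1 if the descriptor does not refer to
 * a valid open file (a real server would report an error such as EBADF).
 */
static int route_read(const struct fproc_sketch *fp, int fd)
{
    if (fd < 0 || fd >= SKETCH_NR_FDS)
        return -1;
    if (fp->fds[fd] == NULL || fp->fds[fd]->vno == NULL)
        return -1;
    assert(fp->fds[fd]->filp_count > 0);   /* an open filp is referenced */
    return fp->fds[fd]->vno->v_fs_e;
}
```

The sketch only selects the File Server; the data copy itself happens between
the FS and the user process, as the diagram shows.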

== General architecture ==
## 2 General architecture
VFS works in roughly the same way as every other server and driver in Minix; it
fetches a message (internally referred to as a job in some cases), executes
the request embedded in the message, returns a reply, and fetches the next
job. There are several sources for new jobs: from user processes, from PM, from
the kernel, and from suspended jobs inside VFS itself (suspended operations
on pipes, locks, character special files, or sockets). File Servers are
regarded as normal user processes in this case, but their abilities are
limited. This is to prevent deadlocks. Once a job is received, a worker thread
starts executing it. During the lifetime of a job, the worker thread might need
to talk to several File Servers. The protocol VFS speaks with File Servers
is fully documented on the Wiki at [0]. The protocol fields are defined in
<minix/vfsif.h>. If the job is an operation on a character or block special
file and the need to talk to a driver arises, VFS uses the Character and
Block Device Protocol. See [1]. This is sadly not official documentation,
but it is an accurate description of how it works. Luckily, driver writers
can use the libchardriver and libblockdriver libraries and don't have to
know the details of the protocol.
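
The fetch, execute, reply cycle can be sketched as follows. This is a
deliberately simplified, single-threaded illustration: every name in it is
hypothetical, the jobs come from an array rather than from message passing,
and the real VFS hands each job to a worker thread instead of executing it
inline.

```c
#include <assert.h>
#include <stddef.h>

/* Job sources named in the text; the enum itself is hypothetical. */
enum job_source { SRC_USER, SRC_PM, SRC_KERNEL, SRC_RESUMED };

struct job_sketch {
    enum job_source source;
    int call_nr;    /* request embedded in the message */
    int reply;      /* result sent back to the requester */
};

/* Stand-in for request execution: reject negative call numbers. */
static int execute_sketch(const struct job_sketch *job)
{
    return job->call_nr >= 0 ? 0 : -1;
}

/*
 * Fetch each pending job in order, execute it, and record the reply.
 * Returns the number of replies produced.
 */
static size_t serve_sketch(struct job_sketch *jobs, size_t njobs)
{
    size_t replies = 0;

    assert(jobs != NULL || njobs == 0);
    for (size_t i = 0; i < njobs; i++) {
        jobs[i].reply = execute_sketch(&jobs[i]);   /* execute request */
        replies++;                                  /* "send" the reply */
    }
    return replies;
}
```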

== Worker threads ==
## 3 Worker threads
Upon start up, VFS spawns a configurable number of worker threads. The
main thread fetches requests and replies, and hands them off to idle or
reply-pending workers, respectively. If no worker threads are available,
the request is queued. All standard system calls are handled by such worker
threads. One of the threads is reserved to handle new requests from system
processes (i.e., File Servers and drivers) when there are no normal worker
threads available; all normal threads might be blocked on a single worker
thread that caused a system process to send a request on its own. To unblock
all normal threads, we need to reserve one spare thread to handle that
situation. VFS drives all File Servers and drivers asynchronously. While
waiting for a reply, a worker thread is blocked and other workers can keep
processing requests. Upon reply the worker thread is unblocked.

As mentioned above, the main thread is responsible for retrieving new jobs and
replies to current jobs, and for starting or unblocking the proper worker thread.
Driver replies are processed directly from the main thread. As a consequence,
these processing routines may not block their calling thread. In some cases,
these routines may resume a thread that is blocked waiting for the reply.
This is always the case for block driver replies, and may or may not be the case for
character and socket driver replies. The character and socket driver reply
processing routines may also unblock suspended processes which in turn generate
new jobs to be handled by the main loop (e.g., suspended reads and writes on
pipes). So depending on the reply a new thread may have to be started.

Worker threads are strictly tied to a process, and each process can have at
most one worker thread running for it. Generally speaking, there are two types
of work supported by worker threads: normal work, and work from PM. The main
subtype of normal work is the handling of a system call made by the process
itself. The process is blocked while VFS is handling the system call, so no new
system call can arrive from a process while VFS has not completed a previous
system call from that process. For that reason, if there are no worker threads
available to handle the work, the work is queued in the corresponding process
entry of the fproc table.

The other main type of work consists of requests from PM. The protocol PM
speaks with VFS is asynchronous. PM is allowed to send up to one request per
process to VFS, in addition to a request to initiate a reboot.
Most jobs from PM are taken care of immediately by the main thread, but some jobs require a
worker thread context (to be able to sleep) and/or serialization with normal
work. Therefore, each process may have a PM request queued for execution, also
in the fproc table. Managing proper queuing, addition, and execution of both
normal and PM work is the responsibility of the worker thread infrastructure.

There are several special tasks that require a worker thread, and these are
implemented as normal work associated with a certain special process that does
not make regular VFS calls anyway. For example, the initial ramdisk mount
procedure uses a thread associated with the VFS process. Some of these special
tasks require protection against being started multiple times at once, as this
is not only undesirable but also disallowed. The full list of worker thread
task types and subtypes is shown in Table 1.

{{{
-------------------------------------------------------------------------
| Worker thread task        | Type   | Association     | May use spare? |
+---------------------------+--------+-----------------+----------------+
| system call from process  | normal | calling process | if system proc |
+---------------------------+--------+-----------------+----------------+
| resumed pipe operation    | normal | calling process | no             |
+---------------------------+--------+-----------------+----------------+
| postponed PM request      | PM     | target process  | no             |
+---------------------------+--------+-----------------+----------------+
| DS event notification     | normal | DS              | yes            |
+---------------------------+--------+-----------------+----------------+
| initial ramdisk mounting  | normal | VFS             | no             |
+---------------------------+--------+-----------------+----------------+
| reboot sequence           | normal | PM              | no             |
-------------------------------------------------------------------------
}}}
Table 1: worker thread work types and subtypes

Communication with block drivers is asynchronous, but at this time, access to
these drivers is serialized on a per-driver basis. File Servers are treated
differently. VFS was designed to be able to send requests concurrently to File
Servers, although at the time of writing there are no File Servers that can
actually make use of that functionality.
To identify which reply from an FS
belongs to which worker thread, all requests have an embedded transaction
identification number (a magic number + thread id encoded in the mtype field of
a message) which the FS has to echo upon reply. Because the range of valid
transaction IDs is isolated from valid system call numbers, VFS can use that ID
to differentiate between replies from File Servers and actual new system calls
from FSes. Using this mechanism VFS is able to support FUSE and ProcFS.

== Locking ==
## 4 Locking
To ensure correct execution of system calls, worker threads sometimes need
certain objects within VFS to remain unchanged during thread suspension
and resumption (i.e., when they need to communicate with a driver or File
Server). Threads keep most state on the stack, but there are a few global
variables that require protection: the fproc table, vmnt table, vnode table,
and filp table. Other tables such as the lock table, select table, and dmap
table don't require protection by means of exclusive access. For those tables
it is both necessary and sufficient to simply mark an entry as in use.

=== Locking requirements ===
## 4.1 Locking requirements
VFS implements the locking model described in [2]. For completeness of this
document we'll describe it here, too. The requirements are based on a threading
package that is non-preemptive.
VFS must guarantee correct functioning with
several, semi-concurrently executing threads in any arbitrary order. The
latter requirement follows from the fact that threads need service from
other components like File Servers and drivers, and they may take any time
to complete requests.
 1. Consistency of replicated values. Several system calls rely on VFS keeping a replicated representation of data in File Servers (e.g., file sizes, file modes, etc.).
 1. Isolation of system calls. Many system calls involve multiple requests to FSes. Concurrent requests from other processes must not lead to otherwise impossible results (e.g., a chmod operation on a file cannot fail halfway through because it's suddenly unlinked or moved).
 1. Integrity of objects. From the point of view of threads, obtaining mutual exclusion is a potentially blocking operation. The integrity of any objects used across blocking calls must be guaranteed (e.g., the file mode in a vnode must remain intact not only when talking to other components, but also when obtaining a lock on a filp).
 1. No deadlock. Not one call may cause another call to never complete. Deadlock situations are typically the result of two or more threads that each hold exclusive access to one resource and want exclusive access to the resource held by the other thread. These resources are a) data (global variables) and b) worker threads.
   a. Conflicts between locking of different types of objects can be avoided by keeping a locking order: objects of different type must always be locked in the same order. If multiple objects of the same type are to be locked, then first a "common denominator" higher up in the locking order must be locked.
   a. Some threads can only run to completion when another thread does work on their behalf. Examples of this are drivers and file servers that do system calls on their own (e.g., ProcFS, PFS/UNIX Domain Sockets, FUSE) or crashing components (e.g., a driver for a character special file that crashes during a request; a second thread is required to handle resource clean up or driver restart before the first thread can abort or retry the request).
 1. No starvation. VFS must guarantee that every system call completes in finite time (e.g., an infinite stream of reads must never completely block writes). Furthermore, we want to maximize parallelism to improve performance. This leads to:
   1. A request to one File Server must not block access to other FS processes. This means that most forms of locking cannot take place at a global level, and must at most take place on the file system level.
   1. No read-only operation on a regular file must block an independent read call to that file. In particular, (read-only) open and close operations may not block such reads, and multiple independent reads on the same file must be able to take place concurrently (i.e., reads that do not share a file position between their file descriptors).

=== Three-level Lock ===
## 4.2 Three-level Lock
From the requirements it follows that we need at least two locking types: read
and write locks. Concurrent reads are allowed, but writes are exclusive both
from reads and from each other. However, in a lot of cases it is possible to use
a third locking type that is in between read and write lock: the serialize
lock. This is implemented in the three-level lock [2].
The three-level lock provides:
 * TLL_READ: allows an unlimited number of threads to hold the lock with the
   same type (both the thread itself and other threads); N * concurrent.
 * TLL_READSER: also allows an unlimited number of threads with type TLL_READ,
   but only one thread can obtain serial access to the lock; N * concurrent +
   1 * serial.
 * TLL_WRITE: provides full mutual exclusion; 1 * exclusive + 0 * concurrent +
   0 * serial.
In the absence of TLL_READ locks, a TLL_READSER is identical to TLL_WRITE.
However, TLL_READSER never blocks concurrent TLL_READ access. TLL_READSER can
be upgraded to TLL_WRITE; the upgrading thread blocks until the last TLL_READ
holder has left, and new TLL_READ requests are blocked in the meantime. Locks
can be downgraded to a lower type. The three-level lock is implemented using
two FIFO queues with write-bias. This guarantees no starvation.

=== Data structures subject to locking ===
## 4.3 Data structures subject to locking
VFS has a number of global data structures. See Table 2.
{{{
--------------------------------------------------------------------
| Structure  | Object description                                  |
+------------+-----------------------------------------------------|
| fproc      | Process (includes process's file descriptors)       |
+------------+-----------------------------------------------------|
| vmnt       | Virtual mount; a mounted file system                |
+------------+-----------------------------------------------------|
| vnode      | Virtual node; an open file                          |
+------------+-----------------------------------------------------|
| filp       | File position into an open file                     |
+------------+-----------------------------------------------------|
| lock       | File region locking state for an open file          |
+------------+-----------------------------------------------------|
| select     | State for an in-progress select(2) call             |
+------------+-----------------------------------------------------|
| dmap       | Mapping from major device number to a device driver |
--------------------------------------------------------------------
}}}
Table 2: VFS object types.

An fproc object is a process. An fproc object is created by fork(2)
and destroyed by exit(2) (which may, or may not, be initiated by the
process itself). It is identified by its endpoint number ('fp_endpoint')
and process id ('fp_pid').
Both are unique, although in general the endpoint
number is used throughout the system.
A vmnt object is a mounted file system. It is created by mount(2) and destroyed
by umount(2). It is identified by a device number ('m_dev') and FS endpoint
number ('m_fs_e'); both are unique to each vmnt object. There is always a
single process that handles a file system on a device and a device cannot
be mounted twice.
A vnode object is the VFS representation of an open inode on the file
system. A vnode object is created when a first process opens or creates the
corresponding file and is destroyed when the last process that has the
file open closes it. It is identified by a combination of FS endpoint number
('v_fs_e') and inode number of that file system ('v_inode_nr'). A vnode
might be mapped to another file system; the actual reading and writing is
handled by a different endpoint. This has no effect on locking.
A filp object contains a file position within a file. It is created when a file
is opened or an anonymous pipe is created, and destroyed when the last user
(i.e., process) closes it. A file descriptor always points to a single filp. A
filp always points to a single vnode, although not all vnodes are pointed to by
a filp. A filp has a reference count ('filp_count') which is identical to the
number of file descriptors pointing to it. It can be increased by a dup(2)
or fork(2). A filp can therefore be shared by multiple processes.
A lock object keeps information about locking of file regions. This has
nothing to do with the threading type of locking. The lock objects require
no locking protection and won't be discussed further.
A select object keeps information on a select(2) operation that cannot
be fulfilled immediately (waiting for timeout or file descriptors not
ready). A select object is identified by its owner ('requestor'), a pointer
into the fproc table. A null pointer means not in use. A select object can be
used by only one process and a process can do only one select(2) at a time.
Select(2) operates on filps and is organized in such a way that it is
sufficient to apply locking on individual filps and not on select objects
themselves. Select objects won't be discussed further.
A dmap object is a mapping from a device number to a device driver. A device
driver can have multiple device numbers associated (e.g., TTY). Access to
a driver is exclusive when it uses the synchronous driver protocol.

=== Locking order ===
## 4.4 Locking order
Based on the description in the previous section, we need protection for
fproc, vmnt, vnode, and filp objects. To prevent deadlocks as a result of
object locking, we need to define a strict locking order.
In VFS we use the
following order:

{{{
fproc > [exec] > vmnt > vnode > filp > [block special file] > [dmap]
}}}

That is, no thread may lock an fproc object while holding a vmnt lock,
and no thread may lock a vmnt object while holding an (associated) vnode, etc.

Fproc needs protection because processes themselves can initiate system
calls, but also PM can cause system calls that have to be executed in their
name. For example, a process might be busy reading from a character device
and another process sends a termination signal. The exit(2) that follows is
sent by PM and is to be executed by the to-be-killed process itself. At this
point there is contention for the fproc object that belongs to the process,
hence the need for protection. This problem is solved in a simple way. Recall
that all worker threads are bound to a process. This also forms the basis of
fproc locking: each worker thread acquires and holds the fproc lock for its
associated process for as long as it is processing work for that process.

There are two cases where a worker thread may hold the lock to more than one
process. First, as mentioned, the reboot procedure is executed from a worker
thread set in the context of the PM process, thus with the PM process entry
lock held.
The procedure itself then acquires a temporary lock on every other
process in turn, in order to clean it up without interference. Thus, the PM
process entry is higher up in the locking order than all other process entries.

Second, the exec(2) call is protected by a lock, and this exec lock is
currently implemented as a lock on the VM process entry. The exec lock is
acquired by a worker thread for the process performing the exec(2) call, and
thus, the VM process entry is below all other process entries in the locking
order. The exec(2) call is protected by a lock for the following reason. VFS
uses a number of variables on the heap to read ELF headers. They are on the
heap due to their size; putting them on the stack would increase stack size
demands for worker threads. The exec call does blocking read calls and thus
needs exclusive access to these variables. However, only the exec(2) syscall
needs this lock.

Access to block special files needs to be exclusive. File Servers are
responsible for handling reads from and writes to block special files; if
a block special file is on a device that is mounted, the FS responsible for
that mount point takes care of it, otherwise the FS that handles the root of
the file system is responsible. Due to mounting and unmounting file systems,
the FS handling a block special file may change. Locking the vnode is not
enough since the inode can be on an entirely different File Server.
Therefore,
access to block special files must be mutually exclusive from concurrent
mount(2)/umount(2) operations. However, when we're not accessing a block
special file, we don't need this lock.

=== Vmnt (file system) locking ===
## 4.5 Vmnt (file system) locking
Vmnt locking cannot be seen completely separately from vnode locking. For
example, umount(2) fails if there are still in-use vnodes, which means that
FS requests [0] only involving in-use inodes do not have to acquire a vmnt
lock. On the other hand, all other requests do need a vmnt lock. Extrapolating
this to system calls this means that all system calls involving a file
descriptor don't need a vmnt lock and all other system calls (that make FS
requests) do need a vmnt lock.
{{{
-------------------------------------------------------------------------------
| Category          | System calls                                            |
+-------------------+---------------------------------------------------------+
| System calls with | access, chdir, chmod, chown, chroot, creat, dumpcore+,  |
| a path name       | exec, link, lstat, mkdir, mknod, mount, open, readlink, |
| argument          | rename, rmdir, stat, statvfs, symlink, truncate, umount,|
|                   | unlink, utime                                           |
+-------------------+---------------------------------------------------------+
| System calls with | close, fchdir, fcntl, fstat, fstatvfs, ftruncate,       |
| a file descriptor | getdents, ioctl, lseek, pipe, read, select, write       |
| argument          |                                                         |
+-------------------+---------------------------------------------------------+
| System calls with | fsync++, getvfsstat, sync, umask                        |
| other or no       |                                                         |
| arguments         |                                                         |
-------------------------------------------------------------------------------
}}}
Table 3: System call categories.
+ path name argument is implicit, the path name is "core.<pid>"
++ although fsync actually provides a file descriptor argument, it's only
used to find the vmnt and not to perform any operation on the file itself

Before we describe what kind of vmnt locks VFS applies to system calls with a
path name or other arguments, we need to make some notes on path lookup.
Path lookups take arbitrary paths as input (relative and absolute). They can
start at any vmnt (based on root directory and working directory of the process
doing the lookup) and visit any file system in arbitrary order, possibly
visiting the same file system more than once. As such, VFS can never tell in
advance at which File Server a lookup will end. This has the following
consequences:
 * In the lookup procedure, at most one vmnt may be locked at a time. When
   moving from one vmnt to another, the first vmnt has to be unlocked before
   acquiring the next lock to prevent deadlocks.
 * The lookup procedure must lock each visited file system with TLL_READSER
   and downgrade or upgrade to the lock type desired by the caller for the
   destination file system (as VFS cannot know which file system is final).
   This is to prevent deadlocks when a thread acquires a TLL_READSER on a vmnt
   and another thread acquires a TLL_READ on the same vmnt. If the second
   thread is blocked on the first thread due to it acquiring a lock on a
   vnode, the first thread will be unable to upgrade a TLL_READSER lock to
   TLL_WRITE.
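
The two consequences above can be sketched as follows. This is a
single-threaded toy model, not the VFS implementation: struct toy_vmnt,
toy_lock(), toy_unlock(), and toy_lookup() are hypothetical names, and the
"lock" merely records which type is held so that the invariants can be
checked with assertions.

```c
#include <assert.h>
#include <stddef.h>

/* Lock types named in the text; the numeric values are arbitrary. */
enum tll_type { TLL_NONE, TLL_READ, TLL_READSER, TLL_WRITE };

/* Hypothetical stand-in for a vmnt: records only the lock type held. */
struct toy_vmnt {
    enum tll_type held;
};

/* Number of vmnts locked by the lookup; must never exceed one. */
static int toy_locked_count;

static void toy_lock(struct toy_vmnt *vmp, enum tll_type type)
{
    assert(vmp->held == TLL_NONE);
    assert(toy_locked_count == 0);  /* only one vmnt locked at a time */
    vmp->held = type;
    toy_locked_count++;
}

static void toy_unlock(struct toy_vmnt *vmp)
{
    assert(vmp->held != TLL_NONE);
    vmp->held = TLL_NONE;
    toy_locked_count--;
}

/*
 * Walk the file systems a lookup visits, in order.  Each one is locked
 * with TLL_READSER, and the previous one is unlocked before the next is
 * locked.  Only the final vmnt is converted to the type the caller wants.
 * Returns the final vmnt, still locked, or NULL if nothing was visited.
 */
static struct toy_vmnt *toy_lookup(struct toy_vmnt **visited, size_t n,
                                   enum tll_type wanted)
{
    struct toy_vmnt *cur = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (cur != NULL)
            toy_unlock(cur);        /* release before taking the next */
        cur = visited[i];
        toy_lock(cur, TLL_READSER);
    }
    if (cur != NULL)
        cur->held = wanted;         /* downgrade or upgrade at the end */
    return cur;
}
```

Note that the same vmnt may appear more than once in the visited list, which
matches the remark that a lookup can visit a file system repeatedly. A real
upgrade to TLL_WRITE may block until readers have left; the toy model does not
capture that.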

We use the following mapping for vmnt locks onto three-level lock types:
{{{
-------------------------------------------------------------------------------
| Lock type  | Mapped to   | Used for                                         |
+------------+-------------+--------------------------------------------------+
| VMNT_READ  | TLL_READ    | Read-only operations and fully independent write |
|            |             | operations                                       |
+------------+-------------+--------------------------------------------------+
| VMNT_WRITE | TLL_READSER | Independent create and modify operations         |
+------------+-------------+--------------------------------------------------+
| VMNT_EXCL  | TLL_WRITE   | Delete and dependent write operations            |
-------------------------------------------------------------------------------
}}}
Table 4: vmnt to tll lock mapping

The following table shows a sub-categorization of system calls without a
file descriptor argument, together with their locking types and motivation
as used by VFS.
{{{
-------------------------------------------------------------------------------
| Group       | System calls | Lock type  | Motivation                        |
+-------------+--------------+------------+-----------------------------------+
| File open   | chdir,       | VMNT_READ  | These operations do not interfere |
| ops.        | chroot, exec,|            | with each other, as vnodes can be |
| (non-create)| open         |            | opened concurrently, and open     |
|             |              |            | operations do not affect          |
|             |              |            | replicated state.                 |
+-------------+--------------+------------+-----------------------------------+
| File create-| creat,       | VMNT_EXCL  | File create ops. require mutual   |
| and-open    | open(O_CREAT)| for create | exclusion from concurrent file    |
| ops         |              | VMNT_WRITE | open ops. If the file already     |
|             |              | for open   | existed, the VMNT_WRITE lock that |
|             |              |            | is necessary for the lookup is    |
|             |              |            | not upgraded                      |
+-------------+--------------+------------+-----------------------------------+
| File create-| pipe         | VMNT_READ  | These create nameless inodes      |
| unique-and- |              |            | which cannot be opened by means   |
| open ops.   |              |            | of a path. Their creation         |
|             |              |            | therefore does not interfere with |
|             |              |            | anything else                     |
+-------------+--------------+------------+-----------------------------------+
| File create-| mkdir, mknod,| VMNT_WRITE | These operations do not affect    |
| only ops.   | symlink      |            | any VFS state, and can therefore  |
|             |              |            | take place concurrently with open |
|             |              |            | operations                        |
+-------------+--------------+------------+-----------------------------------+
| File info   | access, lstat| VMNT_READ  | These operations do not interfere |
| retrieval or| readlink,stat|            | with each other and do not modify |
| modification| utime        |            | replicated state                  |
+-------------+--------------+------------+-----------------------------------+
| File        | chmod, chown,| VMNT_READ  | These operations do not interfere |
| modification| truncate     |            | with each other. They do need     |
|             |              |            | exclusive access on the vnode     |
|             |              |            | level                             |
+-------------+--------------+------------+-----------------------------------+
| File link   | link         | VMNT_WRITE | Identical to file create-only     |
| ops.        |              |            | operations                        |
+-------------+--------------+------------+-----------------------------------+
| File unlink | rmdir, unlink| VMNT_EXCL  | These must not interfere with     |
| ops.        |              |            | file create operations, to avoid  |
|             |              |            | the scenario where inodes are     |
|             |              |            | reused immediately. However, due  |
|             |              |            | to necessary path checks, the     |
|             |              |            | vmnt is first locked VMNT_WRITE   |
|             |              |            | and then upgraded                 |
+-------------+--------------+------------+-----------------------------------+
| File rename | rename       | VMNT_EXCL  | Identical to file unlink          |
| ops.        |              |            | operations                        |
+-------------+--------------+------------+-----------------------------------+
| Non-file    | sync, umask, | VMNT_READ  | umask does not involve the file   |
| ops.        | getvfsstat   | or none    | system, so it does not need       |
|             |              |            | locks. sync does not alter state  |
|             |              |            | in VFS and is atomic at the FS    |
|             |              |            | level. getvfsstat caches stats    |
|             |              |            | only and requires no exclusion.   |
-------------------------------------------------------------------------------
}}}
Table 5: Sub-categorization of system calls without a file descriptor argument

=== Vnode (open file) locking ===
## 4.6 Vnode (open file) locking
Compared to vmnt locking, vnode locking is relatively straightforward. All
read-only accesses to vnodes that merely read the vnode object's fields are
allowed to be concurrent. Conversely, all accesses that change fields
of a vnode object must be exclusive. This leaves us with creation and
destruction of vnode objects (and related to that, their reference counts);
it's sufficient to serialize these accesses. This follows from the fact
that a vnode is only created when the first user opens it, and destroyed
when the last user closes it. An open file in process A cannot be closed
by process B. Note that this also relies on the fact that a process can do
only one system call at a time. Kernel threads would violate this assumption.

We use the following mapping for vnode locks onto three-level lock types:
{{{
-------------------------------------------------------------------------------
| Lock type  | Mapped to   | Used for                                         |
+------------+-------------+--------------------------------------------------+
| VNODE_READ | TLL_READ    | Read access to previously opened vnodes          |
+------------+-------------+--------------------------------------------------+
| VNODE_OPCL | TLL_READSER | Creation, opening, closing, and destruction of   |
|            |             | vnodes                                           |
+------------+-------------+--------------------------------------------------+
| VNODE_WRITE| TLL_WRITE   | Write access to previously opened vnodes         |
-------------------------------------------------------------------------------
}}}
Table 6: vnode to tll lock mapping

When vnodes are destroyed, they are initially locked with VNODE_OPCL. After
all, we're going to alter the reference count, so this must be serialized. If
the reference count then reaches zero we obtain exclusive access. This should
always be immediately possible unless there is a consistency problem. See
section 4.8 for an exhaustive listing of locking methods for all operations on
vnodes.
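
The destruction sequence described above might look roughly like this. It is a
single-threaded sketch under stated assumptions: struct toy_vnode and
toy_put_vnode() are hypothetical, the lock is modelled as a field that records
the type held, and the real code may differ in every detail.

```c
#include <assert.h>

/* Three-level lock types; the numeric values are arbitrary. */
enum tll_type { TLL_NONE, TLL_READ, TLL_READSER, TLL_WRITE };

/* The mapping from Table 6. */
#define VNODE_READ  TLL_READ
#define VNODE_OPCL  TLL_READSER
#define VNODE_WRITE TLL_WRITE

/* Hypothetical stand-in for a vnode object. */
struct toy_vnode {
    int ref_count;        /* number of users that have the file open */
    enum tll_type held;   /* lock type currently held, if any */
    int in_use;           /* 1 while the object represents an open file */
};

/*
 * Drop one reference.  Returns 1 if the vnode was destroyed and 0 if
 * other users remain.
 */
static int toy_put_vnode(struct toy_vnode *vp)
{
    assert(vp->held == TLL_NONE && vp->ref_count > 0);

    vp->held = VNODE_OPCL;      /* serialize the reference count change */
    vp->ref_count--;
    if (vp->ref_count > 0) {
        vp->held = TLL_NONE;
        return 0;
    }

    /* Last user: obtain exclusive access before tearing the object down. */
    vp->held = VNODE_WRITE;
    vp->in_use = 0;
    vp->held = TLL_NONE;
    return 1;
}
```

Starting from a reference count of 2, the first call returns 0 and leaves the
vnode in use; the second call returns 1 and marks it free.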

=== Filp (file position) locking ===
## 4.7 Filp (file position) locking
The main fields of a filp object that are shared between various processes
(and by extension threads), and that can change after object creation,
are filp_count and filp_pos. Writes to and reads from filp objects must be
mutually exclusive, as all system calls have to use the latest version. For
example, a read(2) call changes the file position (i.e., filp_pos), so two
concurrent reads must obtain exclusive access. Consequently, as even read
operations require exclusive access, filp objects don't use three-level locks,
but only mutexes.

System calls that involve a file descriptor often access both the filp and
the corresponding vnode. The locking order requires us to first lock the
vnode and then the filp. This is taken care of at the filp level. Whenever
a filp is locked, a lock on the vnode is acquired first. Conversely, when
a filp is unlocked, the corresponding vnode is also unlocked. A convenient
consequence is that whenever a vnode is locked exclusively (VNODE_WRITE),
all corresponding filps are implicitly locked. This is of particular use
when multiple filps must be locked at the same time:
 * When opening a named pipe, VFS must make sure that there is at most one filp for the reader end and one filp for the writer end.
 * Pipe readers and writers must be suspended in the absence of (respectively) writers and readers.
Because both filps are linked to the same vnode object (they are for the same
pipe), it suffices to exclusively lock that vnode instead of both filp objects.

In some cases it can happen that a function that operates on a locked filp
calls another function that triggers another lock on a different filp for
the same vnode. One example is close_filp(). At some point, close_filp() calls
release(), which in turn loops through the filp table looking for pipes
being select(2)ed on. If there are any, the select code will lock the filp and
do operations on it. This works fine when doing a select(2) call, but conflicts
with close(2) or exit(2). lock_filp() makes an exception for this situation:
if you've already locked a vnode with VNODE_OPCL or VNODE_WRITE when locking
a filp, you obtain a "soft lock" on the vnode for this filp. This means
that lock_filp() won't actually try to lock the vnode (which wouldn't work),
but flags the vnode as "skip unlock_vnode upon unlock_filp." Upon unlocking
the filp, the vnode remains locked, the soft lock is removed, and the filp
mutex is released. Note that this scheme does not violate the locking order;
the vnode is (already) locked before the filp.

A similar problem arises with create_pipe. In this case we obtain a new vnode
object, lock it, and obtain two new, locked, filp objects. If everything works
out and the filp objects are linked to the same vnode, we run into trouble
when unlocking both filps.
Unlocking the first filp works, but it also unlocks the vnode, so the
second filp no longer has a locked vnode associated with it. Therefore
we introduced a plural unlock_filps(filp1, filp2) that can unlock two filps
that both point to the same vnode.

=== Lock characteristics per request type ===
## 4.8 Lock characteristics per request type
For File Servers that support concurrent requests, it's useful to know which
locking guarantees VFS provides for vmnts and vnodes, so they can take that
into account when protecting internal data structures. READ = TLL_READ,
READSER = TLL_READSER, WRITE = TLL_WRITE. The vnode lock applies to the
'''inode''' field in requests, unless the notes say otherwise.
{{{
------------------------------------------------------------------------------
| request      | vmnt    | vnode   | notes                                   |
+--------------+---------+---------+-----------------------------------------+
| REQ_BREAD    |         | READ    | VFS serializes reads from and writes to |
|              |         |         | block special files                     |
+--------------+---------+---------+-----------------------------------------+
| REQ_BWRITE   |         | WRITE   | VFS serializes reads from and writes to |
|              |         |         | block special files                     |
+--------------+---------+---------+-----------------------------------------+
| REQ_CHMOD    | READ    | WRITE   | vmnt is only locked if file is not      |
|              |         |         | already opened                          |
+--------------+---------+---------+-----------------------------------------+
| REQ_CHOWN    | READ    | WRITE   | vmnt is only locked if file is not      |
|              |         |         | already opened                          |
+--------------+---------+---------+-----------------------------------------+
| REQ_CREATE   | WRITE   | WRITE   | The directory in which the file is      |
|              |         |         | created is write locked                 |
+--------------+---------+---------+-----------------------------------------+
| REQ_FLUSH    |         |         | Mutually exclusive to REQ_BREAD and     |
|              |         |         | REQ_BWRITE                              |
+--------------+---------+---------+-----------------------------------------+
| REQ_FTRUNC   | READ    | WRITE   | vmnt is only locked if file is not      |
|              |         |         | already opened                          |
+--------------+---------+---------+-----------------------------------------+
| REQ_GETDENTS | READ    | READ    | vmnt is only locked if file is not      |
|              |         |         | already opened                          |
+--------------+---------+---------+-----------------------------------------+
| REQ_INHIBREAD|         | READ    |                                         |
+--------------+---------+---------+-----------------------------------------+
| REQ_LINK     | READSER | WRITE   | vfs_fs_link.inode is locked READ        |
|              |         |         | vfs_fs_link.dir_ino is locked WRITE     |
+--------------+---------+---------+-----------------------------------------+
| REQ_LOOKUP   | READSER |         |                                         |
+--------------+---------+---------+-----------------------------------------+
| REQ_MKDIR    | READSER | WRITE   |                                         |
+--------------+---------+---------+-----------------------------------------+
| REQ_MKNOD    | READSER | WRITE   |                                         |
+--------------+---------+---------+-----------------------------------------+
|REQ_MOUNTPOINT| WRITE   | WRITE   |                                         |
+--------------+---------+---------+-----------------------------------------+
|REQ_NEW_DRIVER|         |         |                                         |
+--------------+---------+---------+-----------------------------------------+
| REQ_NEWNODE  |         |         | Only sent to PFS                        |
+--------------+---------+---------+-----------------------------------------+
| REQ_PUTNODE  |         | READSER | READSER when dropping a non-final       |
|              |         | or WRITE| reference. WRITE when final reference   |
|              |         |         | is dropped (i.e., no longer in use)     |
+--------------+---------+---------+-----------------------------------------+
| REQ_RDLINK   | READ    | READ    | In some circumstances stricter locking  |
|              |         |         | might be applied, but not guaranteed    |
+--------------+---------+---------+-----------------------------------------+
| REQ_READ     |         | READ    |                                         |
+--------------+---------+---------+-----------------------------------------+
|REQ_READSUPER | WRITE   |         |                                         |
+--------------+---------+---------+-----------------------------------------+
| REQ_RENAME   | WRITE   | WRITE   |                                         |
+--------------+---------+---------+-----------------------------------------+
| REQ_RMDIR    | WRITE   | WRITE   |                                         |
+--------------+---------+---------+-----------------------------------------+
| REQ_SLINK    | READSER | READ    |                                         |
+--------------+---------+---------+-----------------------------------------+
| REQ_STAT     | READ    | READ    | vmnt is only locked if file is not      |
|              |         |         | already opened                          |
+--------------+---------+---------+-----------------------------------------+
| REQ_STATVFS  | READ    | READ    | vmnt is only locked if file is not      |
|              |         |         | already opened                          |
+--------------+---------+---------+-----------------------------------------+
| REQ_SYNC     | READ    |         |                                         |
+--------------+---------+---------+-----------------------------------------+
| REQ_UNLINK   | WRITE   | WRITE   |                                         |
+--------------+---------+---------+-----------------------------------------+
| REQ_UNMOUNT  | WRITE   |         |                                         |
+--------------+---------+---------+-----------------------------------------+
| REQ_UTIME    | READ    | READ    |                                         |
+--------------+---------+---------+-----------------------------------------+
| REQ_WRITE    |         | WRITE   |                                         |
------------------------------------------------------------------------------
}}}
Table 7: VFS-FS requests locking guarantees

== Recovery from driver crashes ==
## 5 Recovery from driver crashes
VFS can recover from block, character, and socket driver crashes. It can
recover to some degree from a crashed File Server (which we can regard as a
driver).

=== Recovery from block driver crashes ===
## 5.1 Recovery from block driver crashes
When reading or writing, VFS doesn't communicate with block drivers directly,
but always through a File Server (the root File Server being default). If the
block driver crashes, the File Server does most of the work of the recovery
procedure. VFS loops through all open files for block special files that
were handled by this driver and reopens them. After that it sends the new
endpoint to the File Server so it can finish the recovery procedure. Finally,
the File Server will retry pending requests if possible. However, reopening
files can cause the block driver to crash again. When that happens, VFS will
stop the recovery. A driver can return ERESTART to VFS to tell it to retry
a request. VFS does this with an arbitrary maximum of 5 attempts.

=== Recovery from character and socket driver crashes ===
## 5.2 Recovery from character and socket driver crashes
While VFS used to support minimal recovery from character driver crashes, the
added complexity has so far proven to outweigh the benefits, especially since
such crash recovery can never be fully transparent: it depends entirely on the
character device as to whether repeating an I/O request makes sense at all.
Currently, all operations except close(2) on a file descriptor that identifies
a device on a crashed character or socket driver will result in an EIO error.
It is up to the application to reopen the character device or socket and retry
whatever it was doing in the appropriate manner. In the future, automatic
reopen and I/O restart may be reintroduced for a limited subset of character
drivers.

=== Recovery from File Server crashes ===
## 5.3 Recovery from File Server crashes
At the time of writing we cannot recover from crashed File Servers. When
VFS detects it has to clean up the remnants of a File Server process (i.e.,
through an exit(2)), it marks all associated file descriptors as invalid
and cancels ongoing and pending requests to that File Server. Resources that
were in use by the File Server are cleaned up.

[0] http://wiki.minix3.org/en/DevelopersGuide/VfsFsProtocol

[1] http://www.cs.vu.nl/~dcvmoole/minix/blockchar.txt

[2] http://www.minix3.org/theses/moolenbroek-multimedia-support.pdf