.Dd February 22, 2001
.Dt vnode 9
.Os OpenBSD 2.9
.Sh NAME
.Nm vnode
.Nd an overview of vnodes
.Sh DESCRIPTION
The vnode is the kernel object that corresponds to a file (actually,
a file, a directory, a fifo, a domain socket, a symlink, or a device).
.Pp
Each vnode has a set of methods corresponding to file operations
(vop_open, vop_read, vop_write, vop_rename, vop_mkdir, vop_close).
These methods are implemented by the individual file systems and
are dispatched through function pointers.
.Pp
In addition, the VFS has functions for maintaining a pool of vnodes,
associating vnodes with mount points, and associating vnodes with buffers.
The individual file systems cannot override these functions.
As such, individual file systems cannot allocate their own vnodes.
.Pp
In general, the contents of a struct vnode should not be examined or
modified by the users of vnode methods.
There are some rather common exceptions detailed later in this document.
.Pp
The vast majority of the vnode functions CANNOT be called from interrupt
context.
.Ss Vnode pool
All the vnodes in the kernel are allocated out of a shared pool.
The
.Xr getnewvnode 9
call returns a fresh vnode from the vnode pool.
The vnode returned has a reference count (v_usecount) of 1.
.Pp
The
.Xr vref 9
call increments the reference count on the vnode.
The
.Xr vrele 9
and
.Xr vput 9
calls decrement the reference count.
In addition, the
.Xr vput 9
call also releases the vnode lock.
.Pp
When a vnode's reference count becomes zero, the vnode pool places it
in a pool of free vnodes, eligible to be assigned to a different file.
The vnode pool calls the
.Xr vop_inactive 9
method to inform the file system that the reference count has reached zero.
.Pp
When placed in the pool of free vnodes, the vnode is not otherwise altered.
In fact, it can often be retrieved before it is reassigned to a different file.
This is useful when the system closes a file and opens it again in rapid
succession.
The
.Xr vget 9
call is used to revive the vnode.
Note that callers should ensure the vnode
they get back has not been reassigned to a different file.
.Pp
When the vnode pool decides to reclaim the vnode to satisfy a getnewvnode
request, it calls the
.Xr vop_reclaim 9
method.
File systems often use this method to free any file-system specific data they
attach to the vnode.
.Pp
A file system can force a vnode with a reference count of zero
to be reclaimed earlier by calling
.Xr vrecycle 9 .
The
.Xr vrecycle 9
call is a null operation if the reference count is greater than zero.
.Pp
The
.Xr vgone 9
and
.Xr vgonel 9
calls will force the pool to reclaim
the vnode even if it has a non-zero reference count.
If the vnode had a non-zero reference count, the vnode is then assigned
an operations vector corresponding to the "dead" file system.
In this operations vector, most operations return errors.
.Ss Vnode locks
Note to beginners: locks don't actually prevent memory from being read
or overwritten.
Instead, a lock is an object that, where used, allows only one piece of code
to proceed through the locked section.
If you do not surround a stretch of code with a lock, it can and probably
will eventually be executed simultaneously with other stretches of code
(including stretches ).
Chances are the results will be unexpected and disappointing to both the
user and you.
.Pp
The vnode actually has three different types of lock: the vnode lock,
the vnode interlock, and the vnode reclamation lock (VXLOCK).
.Ss The vnode lock
The most general lock is the vnode lock.
This lock is acquired by calling
.Xr vn_lock 9
and released by calling
.Xr vn_unlock 9 .
The vnode lock is used to serialize operations through the file system for
a given file when there are multiple concurrent requests on the same file.
Many file system functions require that you hold the vnode lock on entry.
The vnode lock may be held when sleeping.
.Pp
The
.Xr revoke 2
and forcible unmount features in BSD UNIX allow a
user to invalidate files and their associated vnodes at almost any
time, even while the files are open.
While in a region of code protected by the vnode lock, the process is
guaranteed that the vnode will not be reclaimed or invalidated.
.Pp
The vnode lock is a multiple-reader or single-writer lock.
An exclusive vnode lock may be acquired multiple times by the same
process.
.Pp
The vnode lock is somewhat messy because it is used for many purposes.
Some clients of the vnode interface use it to try to bundle a series
of VOP_ method calls into an atomic group.
Many file systems rely on it to prevent race conditions in updating file
system specific data structures (as opposed to having their own locks).
.Pp
The implementation of the vnode lock is the responsibility of the individual
file systems.
Not all file systems implement it.
.Pp
To prevent deadlocks, when acquiring locks on multiple vnodes, the lock
on the parent directory must be acquired before the lock on the child directory.
.Pp
Interrupt handlers must not acquire vnode locks.
.Ss Vnode interlock
The vnode interlock (vp->v_interlock) is a spinlock.
It is useful on multi-processor systems for acquiring a quick exclusive
lock on the contents of the vnode.
It MUST NOT be held while sleeping.
(What fields does it cover? What about splbio/interrupt issues?)
.Pp
Operations on this lock are a no-op on uniprocessor systems.
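.Pp
Two of the vnode lock rules stated under
.Sx The vnode lock
(an exclusive lock may be re-acquired by the same process, and the parent
directory is locked before the child) can be illustrated with a small
userland model.
This is only a sketch and not the kernel implementation: the structure
model_lock and the functions model_lock_excl, model_unlock, model_lock_pair
and model_lock_demo are hypothetical names invented for this example, and
where a real lock would sleep the model simply returns an error.
.Bd -literal -offset indent
```c
#include <assert.h>

/*
 * Userland model only: NOT the kernel lock implementation.
 * Every name in this block is hypothetical.
 */
struct model_lock {
	int owner;	/* pid of the exclusive holder, 0 when unlocked */
	int depth;	/* how many times the owner has acquired it */
};

/* Try to lock exclusively; 0 on success, -1 if another pid holds it. */
static int
model_lock_excl(struct model_lock *lk, int pid)
{
	if (lk->owner != 0 && lk->owner != pid)
		return (-1);	/* a real lock would sleep here */
	lk->owner = pid;
	lk->depth++;		/* recursive acquisition by the same process */
	return (0);
}

/* Release one level; the lock becomes free when depth reaches zero. */
static void
model_unlock(struct model_lock *lk, int pid)
{
	assert(lk->owner == pid && lk->depth > 0);
	if (--lk->depth == 0)
		lk->owner = 0;
}

/* Lock a directory pair in the documented order: parent, then child. */
static int
model_lock_pair(struct model_lock *parent, struct model_lock *child, int pid)
{
	if (model_lock_excl(parent, pid) != 0)
		return (-1);
	if (model_lock_excl(child, pid) != 0) {
		model_unlock(parent, pid);	/* back out completely */
		return (-1);
	}
	return (0);
}

/* Runs the scenario; returns 0 when every step behaves as described. */
static int
model_lock_demo(void)
{
	struct model_lock parent = { 0, 0 }, child = { 0, 0 };

	if (model_lock_pair(&parent, &child, 1) != 0)
		return (1);
	if (model_lock_excl(&parent, 1) != 0)	/* same process, again */
		return (2);
	if (model_lock_excl(&child, 2) == 0)	/* other process refused */
		return (3);
	model_unlock(&parent, 1);
	model_unlock(&parent, 1);
	model_unlock(&child, 1);
	if (parent.owner != 0 || child.owner != 0)
		return (4);
	return (model_lock_pair(&parent, &child, 2) == 0 ? 0 : 5);
}
```
.Ed
.Pp
Because every caller of the hypothetical model_lock_pair takes the parent
first, two processes working on the same directory pair can never each hold
one lock while waiting for the other.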
.Ss Other Vnode synchronization
The vnode reclamation lock (VXLOCK) is used to prevent multiple
processes from entering the vnode reclamation code.
It is also used as a flag to indicate that reclamation is in progress.
The VXWANT flag is set by processes that wish to be woken up when reclamation
is finished.
.Pp
The
.Xr vwaitforio 9
call is used to wait for all outstanding write I/Os associated with a
vnode to complete.
.Ss Version number/capability
The vnode capability, v_id, is a 32-bit version number on the vnode.
Every time a vnode is reassigned to a new file, the vnode capability
is changed.
This is used by code that wishes to keep pointers to vnodes but doesn't want
to hold a reference (e.g., caches).
The code keeps both a vnode * and a copy of the capability.
The code can later compare the vnode's capability to its copy and see
if the vnode still points to the same file.
.Pp
Note: for this to work, memory assigned to hold a struct vnode can
only be used for another purpose when all pointers to it have disappeared.
Since the vnode pool has no way of knowing when all pointers have
disappeared, it never frees memory it has allocated for vnodes.
.Ss Vnode fields
Most of the fields of the vnode structure should be treated as opaque
and only manipulated through the proper APIs.
This section describes the fields that are manipulated directly.
.Pp
The v_flag attribute contains miscellaneous flags related to various functions.
They are summarized in table ...
.Pp
The v_tag attribute indicates what file system the vnode belongs to.
Very little code actually uses this attribute and its use is deprecated.
Programmers should seriously consider using more object-oriented approaches
(e.g. function tables).
There is no safe way of defining new v_tags for loadable file systems.
The v_tag attribute is read-only.
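.Pp
The capability check described under
.Sx Version number/capability
can be sketched in userland as follows.
This is a model under stated assumptions, not kernel code: model_vnode is a
hypothetical stand-in for struct vnode that keeps only v_id and an
illustrative file tag, and model_cache_entry, model_cache_enter,
model_cache_lookup, model_reassign and model_capability_demo are names
invented for this example.
Incrementing v_id on reassignment is likewise an assumption; the text only
requires that the value change.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userland model only: a hypothetical stand-in for struct vnode that
 * keeps just the capability and a tag for the file it represents.
 */
struct model_vnode {
	uint32_t v_id;	/* capability, changed on every reassignment */
	int file;	/* hypothetical: identifies the current file */
};

/* A cache entry holds a pointer and a capability copy, no reference. */
struct model_cache_entry {
	struct model_vnode *vp;
	uint32_t vpid;
};

/* Remember the vnode together with its capability at insertion time. */
static void
model_cache_enter(struct model_cache_entry *ce, struct model_vnode *vp)
{
	ce->vp = vp;
	ce->vpid = vp->v_id;
}

/* Return the cached vnode only if it still represents the same file. */
static struct model_vnode *
model_cache_lookup(const struct model_cache_entry *ce)
{
	if (ce->vp == NULL || ce->vp->v_id != ce->vpid)
		return (NULL);	/* stale: the vnode was reassigned */
	return (ce->vp);
}

/* Model of the pool handing the same memory to a different file. */
static void
model_reassign(struct model_vnode *vp, int newfile)
{
	vp->v_id++;	/* assumption: any change of value would do */
	vp->file = newfile;
}

/* Runs the scenario; returns 0 when every step behaves as described. */
static int
model_capability_demo(void)
{
	struct model_vnode vn = { 1, 100 };
	struct model_cache_entry ce = { NULL, 0 };

	model_cache_enter(&ce, &vn);
	if (model_cache_lookup(&ce) != &vn)
		return (1);	/* fresh entry must hit */
	model_reassign(&vn, 200);
	if (model_cache_lookup(&ce) != NULL)
		return (2);	/* stale entry must miss */
	return (0);
}
```
.Ed
.Pp
The lookup dereferences the saved pointer without holding a reference, which
is only safe because, as noted above, memory allocated for a vnode is never
freed.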
.Pp
The v_type attribute indicates what type of file (e.g. directory,
regular, fifo) this vnode is.
This is used by the generic code to perform various checks.
For example, the
.Xr read 2
system call returns an error when a read is attempted on a directory.
.Pp
The v_data attribute allows a file system to attach a piece of file
system specific memory to the vnode.
This contains information about the file that is specific to
the file system.
.Pp
The v_numoutput attribute indicates the number of pending synchronous
and asynchronous writes on the vnode.
It does not track the number of dirty buffers attached to the vnode.
The attribute is used by code like fsync to wait for all writes
to complete before returning to the user.
This attribute must be manipulated at splbio().
.Pp
The v_writecount attribute tracks the number of write calls pending
on the vnode.
.Ss RULES
The vast majority of vnode functions may not be called from interrupt
context.
The exceptions are bgetvp and brelvp.
The following fields of the vnode are manipulated at interrupt level:
v_numoutput, v_holdcnt, v_dirtyblkhd, v_cleanblkhd, v_bioflag, v_freelist,
and v_synclist.
Any accesses to these fields should be protected by splbio,
unless you are certain that there is no chance an interrupt handler
will modify them.
.Pp
A vnode will only be reassigned to another file when its reference count
reaches zero and the vnode lock is freed.
.Pp
A vnode will not be reclaimed as long as the vnode lock is held.
If the vnode reference count drops to zero while a process is holding
the vnode lock, the vnode MAY be queued for reclamation.
Increasing the reference count from 0 to 1 while holding the lock will
most likely cause intermittent kernel panics.
.Sh HISTORY
This document first appeared in
.Ox 2.9 .