.\" $OpenBSD: vnode.9,v 1.28 2011/07/18 12:03:45 thib Exp $
.\"
.\" Copyright (c) 2001 Constantine Sapuntzakis
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. The name of the author may not be used to endorse or promote products
.\"    derived from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
.\" INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
.\" AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
.\" THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
.\" EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
.\" PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
.\" OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
.\" OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
.\" ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd $Mdocdate: July 18 2011 $
.Dt VNODE 9
.Os
.Sh NAME
.Nm vnode
.Nd an overview of vnodes
.Sh DESCRIPTION
A
.Em vnode
is an object in kernel memory that speaks the
.Ux
file interface (open, read, write, close, readdir, etc.).
Vnodes can represent files, directories, FIFOs, domain sockets, block devices,
and character devices.
.Pp
Each vnode has a set of methods which start with the string
.Dq VOP_ .
These methods include
.Fn VOP_OPEN ,
.Fn VOP_READ ,
.Fn VOP_WRITE ,
.Fn VOP_RENAME ,
.Fn VOP_CLOSE ,
and
.Fn VOP_MKDIR .
Many of these methods correspond closely to the equivalent
file system call \-
.Xr open 2 ,
.Xr read 2 ,
.Xr write 2 ,
.Xr rename 2 ,
etc.
Each file system (FFS, NFS, etc.) provides implementations for these methods.
.Pp
The Virtual File System library (see
.Xr vfs 9 )
maintains a pool of vnodes.
File systems cannot allocate their own vnodes; they must use the functions
provided by the VFS to create and manage vnodes.
.Pp
The definition of a vnode is as follows:
.Bd -literal
struct vnode {
	struct uvm_vnode v_uvm;		/* uvm(9) data */
	int	(**v_op)(void *);	/* vnode operations vector */
	enum	vtype v_type;		/* vnode type */
	u_int	v_flag;			/* vnode flags (see below) */
	u_int	v_usecount;		/* reference count of users */
	u_int	v_writecount;		/* reference count of writers */
	/* Flags that can be read/written in interrupts */
	u_int	v_bioflag;		/* flags used by intr handlers */
	u_int	v_holdcnt;		/* buffer references */
	u_int	v_id;			/* capability identifier */
	struct	mount *v_mount;		/* ptr to vfs we are in */
	TAILQ_ENTRY(vnode) v_freelist;	/* vnode freelist */
	LIST_ENTRY(vnode) v_mntvnodes;	/* vnodes for mount point */
	struct	buflists v_cleanblkhd;	/* clean blocklist head */
	struct	buflists v_dirtyblkhd;	/* dirty blocklist head */
	u_int	v_numoutput;		/* num of writes in progress */
	LIST_ENTRY(vnode) v_synclist;	/* vnode with dirty buffers */
	union {
		struct mount	*vu_mountedhere;/* ptr to mounted vfs (VDIR) */
		struct socket	*vu_socket;	/* UNIX IPC (VSOCK) */
		struct specinfo	*vu_specinfo;	/* device (VCHR, VBLK) */
		struct fifoinfo	*vu_fifoinfo;	/* fifo (VFIFO) */
	} v_un;

	enum	vtagtype v_tag;		/* type of underlying data */
	void	*v_data;		/* private data for fs */
	struct {
		struct	simplelock vsi_lock;	/* lock to protect below */
		struct	selinfo vsi_selinfo;	/* identity of poller(s) */
	} v_selectinfo;
};
#define	v_mountedhere	v_un.vu_mountedhere
#define	v_socket	v_un.vu_socket
#define	v_specinfo	v_un.vu_specinfo
#define	v_fifoinfo	v_un.vu_fifoinfo
.Ed
.Ss Vnode life cycle
When a client of the VFS requests a new vnode, the vnode allocation
code can reuse an old vnode object that is no longer in use.
Whether a vnode is in use is tracked by the vnode reference count
.Pq Va v_usecount .
By convention, each open file handle holds a reference,
as do VM objects backed by files.
A vnode with a reference count of 1 or more will not be deallocated or
reused to point to a different file.
So, if you want to ensure that your vnode doesn't become a different
file under you, you had better be sure you have a reference to it.
A vnode that points to a valid file and has a reference count of 1 or more
is called
.Em active .
.Pp
When a vnode's reference count drops to zero, it becomes
.Em inactive ,
that is, a candidate for reuse.
An inactive vnode still refers to a valid file and one can try to
reactivate it using
.Xr vget 9
(this is used a lot by caches).
.Pp
Before the VFS can reuse an inactive vnode to refer to another file,
it must clean all information pertaining to the old file.
A cleaned out vnode is called a
.Em reclaimed
vnode.
.Pp
To support forcible unmounts and the
.Xr revoke 2
system call, the VFS may reclaim a vnode with a positive reference
count.
The reclaimed vnode is given to the dead file system, which
returns errors for most operations.
The reclaimed vnode will not be
reused for another file until its reference count hits zero.
.Ss Vnode pool
The
.Xr getnewvnode 9
call allocates a vnode from the pool, possibly reusing an
inactive vnode, and returns it to the caller.
The vnode returned has a reference count
.Pq Va v_usecount
of 1.
.Pp
The
.Xr vref 9
call increments the reference count on the vnode.
It may only be called on a vnode with a reference count of 1 or greater.
The
.Xr vrele 9
and
.Xr vput 9
calls decrement the reference count.
In addition, the
.Xr vput 9
call releases the vnode lock.
.Pp
The
.Xr vget 9
call, when used on an inactive vnode, will make the vnode active
by bumping the reference count to one.
When called on an active vnode,
.Fn vget
increases the reference count by one.
However, if the vnode is being reclaimed concurrently, then
.Fn vget
will fail and return an error.
.Pp
The
.Xr vgone 9
and
.Xr vgonel 9
calls
orchestrate the reclamation of a vnode.
They can be called on both active and inactive vnodes.
.Pp
When transitioning a vnode to the reclaimed state, the VFS will call the
.Xr VOP_RECLAIM 9
method.
File systems use this method to free any file-system-specific data
they attached to the vnode.
.Ss Vnode locks
The vnode actually has two different types of locks: the vnode lock
and the vnode reclamation lock
.Pq Dv VXLOCK .
.Ss The vnode lock
The vnode lock and its consistent use accomplish the following:
.Bl -bullet
.It
It keeps a locked vnode from changing across certain pairs of VOP_ calls,
thus preserving cached data.
For example, it keeps the directory from
changing between a
.Xr VOP_LOOKUP 9
call and a
.Xr VOP_CREATE 9 .
The
.Fn VOP_LOOKUP
call makes sure the name doesn't already exist in the
directory and finds free room in the directory for the new entry.
The
.Fn VOP_CREATE
call can then go ahead and create the file without checking if
it already exists or looking for free space.
.It
Some file systems rely on it to ensure that only one
.Dq thread
at a time
is calling VOP_ vnode operations on a given file or directory.
Otherwise, the file system's behavior is undefined.
.It
On rare occasions, code will hold the vnode lock so that a series of
VOP_ operations occurs as an atomic unit.
(Of course, this doesn't work with network file systems like NFSv2 that don't
have any notion of bundling a bunch of operations into an atomic unit.)
.It
While the vnode lock is held, the vnode will not be reclaimed.
.El
.Pp
There is a discipline to using the vnode lock.
Some VOP_ operations require that the vnode lock is held before being called.
.Pp
The vnode lock is acquired by calling
.Xr vn_lock 9
and released by calling
.Xr VOP_UNLOCK 9 .
.Pp
A process is allowed to sleep while holding the vnode lock.
.Pp
The implementation of the vnode lock is the responsibility of the individual
file systems.
Not all file systems implement it.
.Pp
To prevent deadlocks, when acquiring locks on multiple vnodes, the lock
on the parent directory must be acquired before the lock on the child directory.
.Ss Other vnode synchronization
The vnode reclamation lock
.Pq Dv VXLOCK
is used to prevent multiple
processes from entering the vnode reclamation code.
It is also used as a flag to indicate that reclamation is in progress.
The
.Dv VXWANT
flag is set by processes that wish to be woken up when reclamation
is finished.
.Pp
The
.Xr vwaitforio 9
call is used to wait for all outstanding write I/Os associated with a
vnode to complete.
.Ss Version number/capability
The vnode capability,
.Va v_id ,
is a 32-bit version number on the vnode.
Every time a vnode is reassigned to a new file, the vnode capability
is changed.
This is used by code that wishes to keep pointers to vnodes but doesn't want
to hold a reference (e.g., caches).
The code keeps both a vnode pointer and a copy of the capability.
The code can later compare the vnode's capability to its copy and see
if the vnode still points to the same file.
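.Pp
The capability check described above can be sketched in userland C.
This is only an illustrative model, not kernel code: the types
struct vnode_model and struct cache_entry and the helpers cache_enter,
cache_lookup and model_reassign are hypothetical names invented for
this example, and only the v_id field mirrors the real structure.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userland stand-in for struct vnode (hypothetical). */
struct vnode_model {
	unsigned int v_id;	/* capability; changes when the vnode is reassigned */
	int file_tag;		/* hypothetical marker for the file it refers to */
};

/* Hypothetical cache entry: a vnode pointer plus a copy of the capability. */
struct cache_entry {
	struct vnode_model *ce_vp;
	unsigned int ce_vpid;
};

/* Remember the vnode and the capability it had when it was cached. */
static void
cache_enter(struct cache_entry *ce, struct vnode_model *vp)
{
	ce->ce_vp = vp;
	ce->ce_vpid = vp->v_id;
}

/* Return the vnode if its capability is unchanged, otherwise NULL. */
static struct vnode_model *
cache_lookup(const struct cache_entry *ce)
{
	if (ce->ce_vp == NULL || ce->ce_vp->v_id != ce->ce_vpid)
		return NULL;
	return ce->ce_vp;
}

/* Model of the VFS reassigning the vnode to another file. */
static void
model_reassign(struct vnode_model *vp, int new_tag)
{
	vp->v_id++;		/* the capability changes on reassignment */
	vp->file_tag = new_tag;
}
```

In this model a lookup succeeds until model_reassign is called on the
vnode, after which the stale entry is rejected.
A real cache would presumably still need to obtain a reference, for
example with
.Xr vget 9 ,
before using the vnode it found.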
.Pp
Note: for this to work, memory assigned to hold a
.Vt struct vnode
can
only be used for another purpose when all pointers to it have disappeared.
Since the vnode pool has no way of knowing when all pointers have
disappeared, it never frees memory it has allocated for vnodes.
.Ss Vnode fields
Most of the fields of the vnode structure should be treated as opaque
and only manipulated through the proper APIs.
This section describes the fields that are manipulated directly.
.Pp
The
.Va v_flag
attribute contains miscellaneous flags related to various functions.
They are summarized in the following table:
.Pp
.Bl -tag -width 10n -compact -offset indent
.It Dv VROOT
This vnode is the root of its file system.
.It Dv VTEXT
This vnode is a pure text prototype.
.It Dv VSYSTEM
This vnode is being used by the kernel.
.It Dv VISTTY
This vnode represents a
.Xr tty 4 .
.It Dv VXLOCK
This vnode is locked to change its underlying type.
.It Dv VXWANT
A process is waiting for this vnode.
.It Dv VALIASED
This vnode has an alias.
.It Dv VLOCKSWORK
This vnode's underlying file system supports locking discipline.
.El
.Pp
The
.Va v_tag
attribute indicates what file system the vnode belongs to.
Very little code actually uses this attribute and its use is deprecated.
Programmers should seriously consider using more object-oriented approaches
(e.g. function tables).
There is no safe way of defining new
.Va v_tag Ns 's
for loadable file systems.
The
.Va v_tag
attribute is read-only.
.Pp
The
.Va v_type
attribute indicates what type of file (e.g. directory,
regular, FIFO) this vnode is.
This is used by the generic code for various checks.
For example, the
.Xr read 2
system call returns zero when a read is attempted on a directory.
.Pp
Possible types are:
.Pp
.Bl -tag -width 10n -offset indent -compact
.It Dv VNON
This vnode has no type.
.It Dv VREG
This vnode represents a regular file.
.It Dv VDIR
This vnode represents a directory.
.It Dv VBLK
This vnode represents a block device.
.It Dv VCHR
This vnode represents a character device.
.It Dv VLNK
This vnode represents a symbolic link.
.It Dv VSOCK
This vnode represents a socket.
.It Dv VFIFO
This vnode represents a named pipe.
.It Dv VBAD
This vnode represents a bad or dead file.
.El
.Pp
The
.Va v_data
attribute allows a file system to attach a piece of file
system specific memory to the vnode.
This contains information about the file that is specific to
the file system (such as an inode pointer in the case of FFS).
.Pp
The
.Va v_numoutput
attribute indicates the number of pending synchronous
and asynchronous writes on the vnode.
It does not track the number of dirty buffers attached to the vnode.
The attribute is used by code like
.Xr fsync 2
to wait for all writes
to complete before returning to the user.
This attribute must be manipulated at
.Xr splbio 9 .
.Pp
The
.Va v_writecount
attribute is a reference count of the writers of the vnode.
.Ss Rules
The vast majority of vnode functions may not be called from interrupt
context.
The exceptions are
.Fn bgetvp
and
.Fn brelvp .
The following fields of the vnode are manipulated at interrupt level:
.Va v_numoutput , v_holdcnt , v_dirtyblkhd ,
.Va v_cleanblkhd , v_bioflag , v_freelist ,
and
.Va v_synclist .
Any access to these fields should be protected by
.Xr splbio 9 .
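.Pp
The rule about protecting interrupt-level fields can be sketched as
the usual raise, modify, restore pattern.
The code below is a userland model under stated assumptions:
model_splbio and model_splx are hypothetical stand-ins for
.Xr splbio 9
and splx, which in the kernel change the real interrupt priority level;
struct vnode_model and model_start_write are likewise invented for
illustration.

```c
#include <assert.h>

/* Simulated interrupt priority level; the kernel tracks this itself. */
static int model_ipl;

/* Arbitrary level chosen for illustration only. */
#define MODEL_IPL_BIO	3

/* Stand-in for splbio(): raise the level and return the previous one. */
static int
model_splbio(void)
{
	int old = model_ipl;

	if (model_ipl < MODEL_IPL_BIO)
		model_ipl = MODEL_IPL_BIO;
	return old;
}

/* Stand-in for splx(): restore a previously saved level. */
static void
model_splx(int s)
{
	model_ipl = s;
}

/* Simplified stand-in for struct vnode (hypothetical). */
struct vnode_model {
	unsigned int v_numoutput;	/* writes in progress */
};

/* Account for a newly started write while block I/O interrupts are masked. */
static void
model_start_write(struct vnode_model *vp)
{
	int s;

	s = model_splbio();	/* block the handlers that touch v_numoutput */
	vp->v_numoutput++;
	model_splx(s);		/* restore whatever level the caller had */
}
```

Saving and restoring the previous level, rather than resetting it to a
fixed value, keeps the pattern safe when the caller is already running
at a raised level.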
.Sh SEE ALSO
.Xr uvm 9 ,
.Xr vaccess 9 ,
.Xr vclean 9 ,
.Xr vcount 9 ,
.Xr vdevgone 9 ,
.Xr vfinddev 9 ,
.Xr vflush 9 ,
.Xr vflushbuf 9 ,
.Xr vfs 9 ,
.Xr vget 9 ,
.Xr vgone 9 ,
.Xr vhold 9 ,
.Xr vinvalbuf 9 ,
.Xr vn_lock 9 ,
.Xr VOP_LOOKUP 9 ,
.Xr vput 9 ,
.Xr vrecycle 9 ,
.Xr vref 9 ,
.Xr vrele 9 ,
.Xr vwaitforio 9 ,
.Xr vwakeup 9
.Sh HISTORY
This document first appeared in
.Ox 2.9 .