.\"     $OpenBSD: vnode.9,v 1.33 2020/01/20 23:23:04 claudio Exp $
.\"
.\" Copyright (c) 2001 Constantine Sapuntzakis
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. The name of the author may not be used to endorse or promote products
.\"    derived from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
.\" INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
.\" AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
.\" THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
.\" EXEMPLARY, OR CONSEQUENTIAL  DAMAGES (INCLUDING, BUT NOT LIMITED TO,
.\" PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
.\" OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
.\" OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
.\" ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd $Mdocdate: January 20 2020 $
.Dt VNODE 9
.Os
.Sh NAME
.Nm vnode
.Nd an overview of vnodes
.Sh DESCRIPTION
A
.Em vnode
is an object in kernel memory that speaks the
.Ux
file interface (open, read, write, close, readdir, etc.).
Vnodes can represent files, directories, FIFOs, domain sockets, block devices,
and character devices.
.Pp
Each vnode has a set of methods which start with the string
.Dq VOP_ .
These methods include
.Fn VOP_OPEN ,
.Fn VOP_READ ,
.Fn VOP_WRITE ,
.Fn VOP_RENAME ,
.Fn VOP_CLOSE ,
and
.Fn VOP_MKDIR .
Many of these methods correspond closely to the equivalent
file system call \-
.Xr open 2 ,
.Xr read 2 ,
.Xr write 2 ,
.Xr rename 2 ,
etc.
Each file system (FFS, NFS, etc.) provides implementations for these methods.
.Pp
The Virtual File System library (see
.Xr vfs 9 )
maintains a pool of vnodes.
File systems cannot allocate their own vnodes; they must use the functions
provided by the VFS to create and manage vnodes.
.Pp
The definition of a vnode is as follows:
.Bd -literal
struct vnode {
	struct uvm_vnode *v_uvm;		/* uvm data */
	const struct vops *v_op;		/* vnode operations vector */
	enum	vtype v_type;			/* vnode type */
	enum	vtagtype v_tag;			/* type of underlying data */
	u_int	v_flag;				/* vnode flags (see below) */
	u_int   v_usecount;			/* reference count of users */
	u_int   v_uvcount;			/* unveil references */
	/* reference count of writers */
	u_int   v_writecount;
	/* Flags that can be read/written in interrupts */
	u_int   v_bioflag;
	u_int   v_holdcnt;			/* buffer references */
	u_int   v_id;				/* capability identifier */
	u_int	v_inflight;
	struct	mount *v_mount;			/* ptr to vfs we are in */
	TAILQ_ENTRY(vnode) v_freelist;		/* vnode freelist */
	LIST_ENTRY(vnode) v_mntvnodes;		/* vnodes for mount point */
	struct	buf_rb_bufs v_bufs_tree;	/* lookup of all bufs */
	struct	buflists v_cleanblkhd;		/* clean blocklist head */
	struct	buflists v_dirtyblkhd;		/* dirty blocklist head */
	u_int   v_numoutput;			/* num of writes in progress */
	LIST_ENTRY(vnode) v_synclist;		/* vnode with dirty buffers */
	union {
		struct mount	*vu_mountedhere;/* ptr to mounted vfs (VDIR) */
		struct socket	*vu_socket;	/* unix ipc (VSOCK) */
		struct specinfo	*vu_specinfo;	/* device (VCHR, VBLK) */
		struct fifoinfo	*vu_fifoinfo;	/* fifo (VFIFO) */
	} v_un;

	/* VFS namecache */
	struct namecache_rb_cache v_nc_tree;
	TAILQ_HEAD(, namecache) v_cache_dst;	 /* cache entries to us */

	void	*v_data;			/* private data for fs */
	struct	selinfo v_selectinfo;		/* identity of poller(s) */
};
#define	v_mountedhere	v_un.vu_mountedhere
#define	v_socket	v_un.vu_socket
#define	v_specinfo	v_un.vu_specinfo
#define	v_fifoinfo	v_un.vu_fifoinfo
.Ed
.Ss Vnode life cycle
When a client of the VFS requests a new vnode, the vnode allocation
code can reuse an old vnode object that is no longer in use.
Whether a vnode is in use is tracked by the vnode reference count
.Pq Va v_usecount .
By convention, each open file handle holds a reference,
as do VM objects backed by files.
A vnode with a reference count of 1 or more will not be deallocated or
reused to point to a different file.
Code that needs a vnode to keep referring to the same file
must therefore hold a reference to it.
A vnode that points to a valid file and has a reference count of 1 or more
is called
.Em active .
.Pp
When a vnode's reference count drops to zero, it becomes
.Em inactive ,
that is, a candidate for reuse.
An inactive vnode still refers to a valid file and one can try to
reactivate it using
.Xr vget 9
(caches do this frequently).
.Pp
Before the VFS can reuse an inactive vnode to refer to another file,
it must clean all information pertaining to the old file.
A cleaned out vnode is called a
.Em reclaimed
vnode.
.Pp
To support forcible unmounts and the
.Xr revoke 2
system call, the VFS may reclaim a vnode with a positive reference
count.
The reclaimed vnode is given to the dead file system, which
returns errors for most operations.
The reclaimed vnode will not be
reused for another file until its reference count hits zero.
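.Pp
The three states described in this subsection can be summarized with a
small userland model.
It is an illustrative sketch, not kernel code: the
.Vt struct model_vnode
type, the
.Va cleaned
member, and the
.Fn model_classify
and
.Fn model_reusable
functions are hypothetical names that exist only in this example.
.Bd -literal
```c
#include <assert.h>
#include <stdbool.h>

/* Userland model only; none of these names exist in the kernel. */
enum model_state { MODEL_ACTIVE, MODEL_INACTIVE, MODEL_RECLAIMED };

struct model_vnode {
	unsigned int usecount;	/* models v_usecount */
	bool cleaned;		/* true once the old file's data is gone */
};

/* Classify a vnode using the terms active, inactive and reclaimed. */
enum model_state
model_classify(const struct model_vnode *vp)
{
	if (vp->cleaned)
		return MODEL_RECLAIMED;
	return vp->usecount > 0 ? MODEL_ACTIVE : MODEL_INACTIVE;
}

/*
 * A vnode may be handed to another file only when nobody references
 * it, even if it was reclaimed early by a forced unmount or revoke.
 */
bool
model_reusable(const struct model_vnode *vp)
{
	return vp->usecount == 0;
}
```
.Ed
.Pp
In this model a vnode with
.Va cleaned
set and a positive count stands for the forced-reclaim case:
it is reclaimed but not yet reusable.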
.Ss Vnode pool
The
.Xr getnewvnode 9
call allocates a vnode from the pool, possibly reusing an
inactive vnode, and returns it to the caller.
The vnode returned has a reference count
.Pq Va v_usecount
of 1.
.Pp
The
.Xr vref 9
call increments the reference count on the vnode.
It may only be called on a vnode with a reference count of 1 or greater.
The
.Xr vrele 9
and
.Xr vput 9
calls decrement the reference count.
In addition, the
.Xr vput 9
call also releases the vnode lock.
.Pp
The
.Xr vget 9
call, when used on an inactive vnode, will make the vnode active
by bumping the reference count to one.
When called on an active vnode,
.Fn vget
increases the reference count by one.
However, if the vnode is being reclaimed concurrently, then
.Fn vget
will fail and return an error.
.Pp
The
.Xr vgone 9
and
.Xr vgonel 9
calls
orchestrate the reclamation of a vnode.
They can be called on both active and inactive vnodes.
.Pp
When transitioning a vnode to the reclaimed state, the VFS will call the
.Xr VOP_RECLAIM 9
method.
File systems use this method to free any file-system-specific data
they attached to the vnode.
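.Pp
The reference counting rules of the pool calls can be sketched in userland
as follows.
This is a simplified model under stated assumptions, not the kernel
implementation: every
.Fn model_*
function and the
.Va reclaiming
member are hypothetical, locking is ignored, and the use of
.Er ENOENT
as the failure value is an assumption of the model.
.Bd -literal
```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userland model only; the real calls live in the kernel. */
struct model_vnode {
	unsigned int usecount;	/* models v_usecount */
	bool reclaiming;	/* models reclamation in progress */
};

/* Models getnewvnode(9): the caller receives one reference. */
void
model_getnew(struct model_vnode *vp)
{
	vp->usecount = 1;
	vp->reclaiming = false;
}

/* Models vref(9): legal only on an already active vnode. */
void
model_vref(struct model_vnode *vp)
{
	assert(vp->usecount >= 1);
	vp->usecount++;
}

/* Models vrele(9): drop one reference. */
void
model_vrele(struct model_vnode *vp)
{
	assert(vp->usecount >= 1);
	vp->usecount--;
}

/* Models vget(9): fails while the vnode is being reclaimed. */
int
model_vget(struct model_vnode *vp)
{
	if (vp->reclaiming)
		return ENOENT;	/* assumed error value */
	vp->usecount++;
	return 0;
}
```
.Ed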
.Ss Vnode locks
A vnode has two different types of locks: the vnode lock
and the vnode reclamation lock
.Pq Dv VXLOCK .
.Ss The vnode lock
The vnode lock and its consistent use accomplish the following:
.Bl -bullet
.It
It keeps a locked vnode from changing across certain pairs of VOP_ calls,
thus preserving cached data.
For example, it keeps the directory from
changing between a
.Xr VOP_LOOKUP 9
call and a
.Xr VOP_CREATE 9 .
The
.Fn VOP_LOOKUP
call makes sure the name doesn't already exist in the
directory and finds free room in the directory for the new entry.
The
.Fn VOP_CREATE
call can then go ahead and create the file without checking if
it already exists or looking for free space.
.It
Some file systems rely on it to ensure that only one
.Dq thread
at a time
is calling VOP_ vnode operations on a given file or directory.
Otherwise, the file system's behavior is undefined.
.It
On rare occasions, code will hold the vnode lock so that a series of
VOP_ operations occurs as an atomic unit.
(Of course, this doesn't work with network file systems like NFSv2 that don't
have any notion of bundling a bunch of operations into an atomic unit.)
.It
While the vnode lock is held, the vnode will not be reclaimed.
.El
.Pp
There is a discipline to using the vnode lock.
Some VOP_ operations require that the vnode lock be held before being called.
.Pp
The vnode lock is acquired by calling
.Xr vn_lock 9
and released by calling
.Xr VOP_UNLOCK 9 .
.Pp
A process is allowed to sleep while holding the vnode lock.
.Pp
The implementation of the vnode lock is the responsibility of the individual
file systems.
Not all file systems implement it.
.Pp
To prevent deadlocks, when acquiring locks on multiple vnodes, the lock
on the parent directory must be acquired before the lock on the child directory.
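.Pp
The parent-before-child ordering rule can be illustrated with a userland
model.
This is only a sketch: the
.Va parent ,
.Va locked
and
.Va lock_seq
members and the
.Fn model_lock
and
.Fn model_lock_pair
functions are hypothetical, and a real caller would take the locks with
.Xr vn_lock 9 .
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>

/* Userland model only; a real caller would use vn_lock(9). */
struct model_vnode {
	struct model_vnode *parent;	/* containing directory, or NULL */
	int locked;			/* 1 while the model lock is held */
	int lock_seq;			/* order in which the lock was taken */
};

static int model_seq;	/* global acquisition counter */

void
model_lock(struct model_vnode *vp)
{
	assert(!vp->locked);
	vp->locked = 1;
	vp->lock_seq = ++model_seq;
}

/* Lock a directory and one of its children, always parent first. */
void
model_lock_pair(struct model_vnode *a, struct model_vnode *b)
{
	if (b->parent == a) {
		model_lock(a);
		model_lock(b);
	} else {
		assert(a->parent == b);
		model_lock(b);
		model_lock(a);
	}
}
```
.Ed
.Pp
Because every caller of
.Fn model_lock_pair
takes the two locks in the same order regardless of argument order,
two callers cannot each hold one lock while waiting for the other.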
.Ss Other vnode synchronization
The vnode reclamation lock
.Pq Dv VXLOCK
is used to prevent multiple
processes from entering the vnode reclamation code.
It is also used as a flag to indicate that reclamation is in progress.
The
.Dv VXWANT
flag is set by processes that wish to be woken up when reclamation
is finished.
.Pp
The
.Xr vwaitforio 9
call is used to wait for all outstanding write I/Os associated with a
vnode to complete.
.Ss Version number/capability
The vnode capability,
.Va v_id ,
is a 32-bit version number on the vnode.
Every time a vnode is reassigned to a new file, the vnode capability
is changed.
This is used by code that wishes to keep pointers to vnodes but doesn't want
to hold a reference (e.g., caches).
The code keeps both a vnode pointer and a copy of the capability.
The code can later compare the vnode's capability to its copy and see
if the vnode still points to the same file.
.Pp
Note: for this to work, memory assigned to hold a
.Vt struct vnode
can only be used for another purpose when all pointers to it have disappeared.
Since the vnode pool has no way of knowing when all pointers have
disappeared, it never frees memory it has allocated for vnodes.
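.Pp
The capability check described above can be sketched with a userland model.
Apart from
.Va v_id ,
all names here are hypothetical, and changing the capability by
incrementing it is an assumption of the model; the text only requires that
the value change on reassignment.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>

/* Userland model only; names other than v_id are hypothetical. */
struct model_vnode {
	unsigned int v_id;	/* models the vnode capability */
};

/* What a cache stores instead of holding a reference. */
struct model_cache_entry {
	struct model_vnode *vp;
	unsigned int saved_id;	/* copy of v_id taken at insert time */
};

void
model_cache_store(struct model_cache_entry *ce, struct model_vnode *vp)
{
	ce->vp = vp;
	ce->saved_id = vp->v_id;
}

/* Models reassignment of the vnode to a new file. */
void
model_reassign(struct model_vnode *vp)
{
	vp->v_id++;	/* assumed: any change of value would do */
}

/* Return the vnode if it still names the cached file, else NULL. */
struct model_vnode *
model_cache_lookup(const struct model_cache_entry *ce)
{
	return ce->vp->v_id == ce->saved_id ? ce->vp : NULL;
}
```
.Ed
.Pp
The lookup dereferences the stored pointer without holding a reference,
which is safe only because vnode memory is never freed, as noted above.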
.Ss Vnode fields
Most of the fields of the vnode structure should be treated as opaque
and only manipulated through the proper APIs.
This section describes the fields that are manipulated directly.
.Pp
The
.Va v_flag
attribute contains miscellaneous flags related to various functions.
They are summarized in the following table:
.Pp
.Bl -tag -width 10n -compact -offset indent
.It Dv VROOT
This vnode is the root of its file system.
.It Dv VTEXT
This vnode is a pure text prototype.
.It Dv VSYSTEM
This vnode is being used by the kernel.
.It Dv VISTTY
This vnode represents a
.Xr tty 4 .
.It Dv VXLOCK
This vnode is locked to change its underlying type.
.It Dv VXWANT
A process is waiting for this vnode.
.It Dv VALIASED
This vnode has an alias.
.It Dv VLOCKSWORK
This vnode's underlying file system supports locking discipline.
.El
.Pp
The
.Va v_tag
attribute indicates what file system the vnode belongs to.
Very little code actually uses this attribute and its use is deprecated.
Programmers should seriously consider using more object-oriented approaches
(e.g. function tables).
There is no safe way of defining new
.Va v_tag Ns 's
for loadable file systems.
The
.Va v_tag
attribute is read-only.
.Pp
The
.Va v_type
attribute indicates what type of file (e.g. directory,
regular, FIFO) this vnode is.
This is used by the generic code for various checks.
For example, the
.Xr read 2
system call fails with
.Er EISDIR
when a read is attempted on a directory.
.Pp
Possible types are:
.Pp
.Bl -tag -width 10n -offset indent -compact
.It Dv VNON
This vnode has no type.
.It Dv VREG
This vnode represents a regular file.
.It Dv VDIR
This vnode represents a directory.
.It Dv VBLK
This vnode represents a block device.
.It Dv VCHR
This vnode represents a character device.
.It Dv VLNK
This vnode represents a symbolic link.
.It Dv VSOCK
This vnode represents a socket.
.It Dv VFIFO
This vnode represents a named pipe.
.It Dv VBAD
This vnode represents a bad or dead file.
.El
.Pp
The
.Va v_data
attribute allows a file system to attach a piece of file
system specific memory to the vnode.
This contains information about the file that is specific to
the file system (such as an inode pointer in the case of FFS).
.Pp
The
.Va v_numoutput
attribute indicates the number of pending synchronous
and asynchronous writes on the vnode.
It does not track the number of dirty buffers attached to the vnode.
The attribute is used by code like
.Xr fsync 2
to wait for all writes
to complete before returning to the user.
This attribute must be manipulated at
.Xr splbio 9 .
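.Pp
The role of
.Va v_numoutput
as a counter of writes in flight can be sketched in userland.
This is a hypothetical model: the
.Fn model_*
names are invented for the example, and the interrupt protection that the
kernel obtains with
.Xr splbio 9
is deliberately left out.
.Bd -literal
```c
#include <assert.h>
#include <stdbool.h>

/* Userland model only; the kernel protects this field at splbio. */
struct model_vnode {
	unsigned int numoutput;	/* models v_numoutput */
};

/* A write has been issued and has not completed yet. */
void
model_write_start(struct model_vnode *vp)
{
	vp->numoutput++;
}

/* One outstanding write has completed. */
void
model_write_done(struct model_vnode *vp)
{
	assert(vp->numoutput > 0);
	vp->numoutput--;
}

/* The condition a waiter checks before returning to its caller. */
bool
model_io_idle(const struct model_vnode *vp)
{
	return vp->numoutput == 0;
}
```
.Ed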
.Pp
The
.Va v_writecount
attribute tracks the number of writers,
that is, references that hold the vnode open for writing.
.Ss Rules
The vast majority of vnode functions may not be called from interrupt
context.
The exceptions are
.Fn bgetvp
and
.Fn brelvp .
The following fields of the vnode are manipulated at interrupt level:
.Va v_numoutput , v_holdcnt , v_dirtyblkhd ,
.Va v_cleanblkhd , v_bioflag , v_freelist ,
and
.Va v_synclist .
Any access to these fields should be protected by
.Xr splbio 9 .
.Sh SEE ALSO
.Xr uvn_attach 9 ,
.Xr vaccess 9 ,
.Xr vclean 9 ,
.Xr vcount 9 ,
.Xr vdevgone 9 ,
.Xr vfinddev 9 ,
.Xr vflush 9 ,
.Xr vflushbuf 9 ,
.Xr vfs 9 ,
.Xr vget 9 ,
.Xr vgone 9 ,
.Xr vhold 9 ,
.Xr vinvalbuf 9 ,
.Xr vn_lock 9 ,
.Xr VOP_LOOKUP 9 ,
.Xr vput 9 ,
.Xr vrecycle 9 ,
.Xr vref 9 ,
.Xr vrele 9 ,
.Xr vwaitforio 9 ,
.Xr vwakeup 9
.Sh HISTORY
This document first appeared in
.Ox 2.9 .
417