.Dd February 22, 2001
.Dt vnode 9
.Os OpenBSD 2.9
.Sh NAME
.Nm vnode
.Nd an overview of vnodes
.Sh DESCRIPTION
The vnode is the kernel object that corresponds to a file (actually,
a file, a directory, a fifo, a domain socket, a symlink, or a device).
.Pp
Each vnode has a set of methods corresponding to file operations
(vop_open, vop_read, vop_write, vop_rename, vop_mkdir, vop_close).
These methods are implemented by the individual file systems and
are dispatched through function pointers.
.Pp
In addition, the VFS has functions for maintaining a pool of vnodes,
associating vnodes with mount points, and associating vnodes with buffers.
The individual file systems cannot override these functions.
As such, individual file systems cannot allocate their own vnodes.
.Pp
In general, the contents of a struct vnode should not be examined or
modified by the users of vnode methods.
There are some rather common exceptions detailed later in this document.
.Pp
The vast majority of the vnode functions CANNOT be called from interrupt
context.
.Ss Vnode pool
All the vnodes in the kernel are allocated out of a shared pool.
The
.Xr getnewvnode 9
call returns a fresh vnode from the vnode
pool.
The vnode returned has a reference count (v_usecount) of 1.
.Pp
The
.Xr vref 9
call increments the reference count on the vnode.
The
.Xr vrele 9
and
.Xr vput 9
calls decrement the reference count.
In addition, the
.Xr vput 9
call also releases the vnode lock.
.Pp
When a vnode's reference count becomes zero, the vnode pool places it
in a pool of free vnodes, eligible to be assigned to a different file.
The vnode pool calls the
.Xr vop_inactive 9
method to inform the file system that the reference count has reached zero.
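.Pp
The reference-count life cycle described above can be sketched as a small
userland model.
This is not kernel code: struct model_vnode and the model_ functions are
hypothetical stand-ins for the real structure and calls, and the
on_freelist and inactive_calls fields exist only so the model can be
inspected.
.Bd -literal -offset indent
```c
#include <assert.h>

/* Hypothetical userland model; the real struct vnode has many more fields. */
struct model_vnode {
	int v_usecount;		/* reference count, as in the text */
	int on_freelist;	/* model-only: 1 while in the free pool */
	int inactive_calls;	/* model-only: times "vop_inactive" ran */
};

/* Models getnewvnode(9): hand out a vnode holding one reference. */
static void
model_getnewvnode(struct model_vnode *vp)
{
	vp->v_usecount = 1;
	vp->on_freelist = 0;
	vp->inactive_calls = 0;
}

/* Models vref(9): take an additional reference. */
static void
model_vref(struct model_vnode *vp)
{
	assert(vp->v_usecount > 0);
	vp->v_usecount++;
}

/*
 * Models vrele(9): drop one reference.  When the count reaches zero,
 * notify the file system and move the vnode to the free pool.
 */
static void
model_vrele(struct model_vnode *vp)
{
	assert(vp->v_usecount > 0);
	if (--vp->v_usecount == 0) {
		vp->inactive_calls++;	/* stands in for vop_inactive */
		vp->on_freelist = 1;
	}
}
```
.Ed
In this model, a vnode obtained from model_getnewvnode and referenced once
more with model_vref needs two model_vrele calls before it lands on the
free list.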
.Pp
When placed in the pool of free vnodes, the vnode is not otherwise altered.
In fact, it can often be retrieved before it is reassigned to a different file.
This is useful when the system closes a file and opens it again in rapid
succession.
The
.Xr vget 9
call is used to revive the vnode.
Note, callers should ensure the vnode
they get back has not been reassigned to a different file.
.Pp
When the vnode pool decides to reclaim the vnode to satisfy a getnewvnode
request, it calls the
.Xr vop_reclaim 9
method.
File systems often use this method to free any file-system specific data they
attach to the vnode.
.Pp
A file system can force a vnode with a reference count of zero
to be reclaimed earlier by calling
.Xr vrecycle 9 .
The
.Xr vrecycle 9
call is a null operation if the reference count is greater than zero.
.Pp
The
.Xr vgone 9
and
.Xr vgonel 9
calls will force the pool to reclaim
the vnode even if it has a non-zero reference count.
If the vnode had a non-zero reference count, the vnode is then assigned
an operations vector corresponding to the "dead" file system.
In this operations vector, most operations return errors.
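.Pp
The switch to a "dead" operations vector can be illustrated with a reduced
userland model of dispatch through function pointers.
Everything named model_ or examplefs_ below is hypothetical, the vector
holds a single method for brevity, and the choice of EIO as the error
returned by the dead method is an assumption of the model, not a statement
about the kernel's dead file system.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_vnode;

/* Hypothetical, much-reduced operations vector. */
struct model_vops {
	int (*vop_write)(struct model_vnode *, const void *, size_t);
};

struct model_vnode {
	const struct model_vops *v_op;	/* current operations vector */
	int v_usecount;			/* reference count */
	size_t bytes_written;		/* model-only bookkeeping */
};

/* Write method of a made-up file system: just count the bytes. */
static int
examplefs_write(struct model_vnode *vp, const void *buf, size_t len)
{
	(void)buf;
	vp->bytes_written += len;
	return 0;
}

/* Write method of the model's "dead" file system: always fails. */
static int
dead_write(struct model_vnode *vp, const void *buf, size_t len)
{
	(void)vp;
	(void)buf;
	(void)len;
	return EIO;	/* assumption: the model picks EIO */
}

static const struct model_vops examplefs_vops = { examplefs_write };
static const struct model_vops dead_vops = { dead_write };

/* Dispatch through the function pointer held by the vnode. */
static int
model_vop_write(struct model_vnode *vp, const void *buf, size_t len)
{
	assert(vp->v_op != NULL);
	return vp->v_op->vop_write(vp, buf, len);
}

/* Models vgone(9): a still-referenced vnode gets the dead vector. */
static void
model_vgone(struct model_vnode *vp)
{
	if (vp->v_usecount > 0)
		vp->v_op = &dead_vops;
}
```
.Ed
Callers that still hold a reference keep a valid pointer after model_vgone,
but every later model_vop_write on it fails instead of reaching the
original file system.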
.Ss Vnode locks
Note to beginners: locks don't actually prevent memory from being read
or overwritten.
Instead, a lock is an object that, where used, allows only one piece of code
to proceed through the locked section.
If you do not surround a stretch of code with a lock, it can and probably
will eventually be executed simultaneously with other stretches of code
(including stretches ).
Chances are the results will be unexpected and disappointing to both the
user and you.
.Pp
The vnode actually has three different types of lock: the vnode lock,
the vnode interlock, and the vnode reclamation lock (VXLOCK).
.Ss The vnode lock
The most general lock is the vnode lock.
This lock is acquired by calling
.Xr vn_lock 9
and released by calling
.Xr vn_unlock 9 .
The vnode lock is used to serialize operations through the file system for
a given file when there are multiple concurrent requests on the same file.
Many file system functions require that you hold the vnode lock on entry.
The vnode lock may be held when sleeping.
.Pp
The
.Xr revoke 2
and forcible unmount features in BSD UNIX allow a
user to invalidate files and their associated vnodes at almost any
time, even if there are active open files on them.
While in a region of code protected by the vnode lock, the process is
guaranteed that the vnode will not be reclaimed or invalidated.
.Pp
The vnode lock is a multiple-reader or single-writer lock.
An exclusive vnode lock may be acquired multiple times by the same
process.
.Pp
The vnode lock is somewhat messy because it is used for many purposes.
Some clients of the vnode interface use it to try to bundle a series
of VOP_ method calls into an atomic group.
Many file systems rely on it to prevent race conditions in updating file
system specific data structures (as opposed to having their own locks).
.Pp
The implementation of the vnode lock is the responsibility of the individual
file systems.
Not all file systems implement it.
.Pp
To prevent deadlocks, when acquiring locks on multiple vnodes, the lock
on the parent directory must be acquired before the lock on the child directory.
.Pp
Interrupt handlers must not acquire vnode locks.
.Ss Vnode interlock
The vnode interlock (vp->v_interlock) is a spinlock.
It is useful on multi-processor systems for acquiring a quick exclusive
lock on the contents of the vnode.
It MUST NOT be held while sleeping.
(What fields does it cover? What about splbio/interrupt issues?)
.Pp
Operations on this lock are a no-op on uniprocessor systems.
.Ss Other Vnode synchronization
The vnode reclamation lock (VXLOCK) is used to prevent multiple
processes from entering the vnode reclamation code.
It is also used as a flag to indicate that reclamation is in progress.
The VXWANT flag is set by processes that wish to be woken up when reclamation
is finished.
.Pp
The
.Xr vwaitforio 9
call is used to wait for all outstanding write I/Os associated with a
vnode to complete.
.Ss Version number/capability
The vnode capability, v_id, is a 32-bit version number on the vnode.
Every time a vnode is reassigned to a new file, the vnode capability
is changed.
This is used by code that wishes to keep pointers to vnodes but does not want
to hold a reference (e.g., caches).
The code keeps both a vnode * and a copy of the capability.
The code can later compare the vnode's capability to its copy and see
if the vnode still points to the same file.
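.Pp
The capability check can be sketched as follows.
This is a userland model under stated assumptions: struct model_cache_entry
and the model_ functions are hypothetical, and reassignment is modelled
simply by incrementing v_id.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the one field that matters here. */
struct model_vnode {
	uint32_t v_id;	/* capability: changes on every reassignment */
};

/* Cache entry: an unreferenced pointer plus the capability seen then. */
struct model_cache_entry {
	struct model_vnode *ce_vp;
	uint32_t ce_vpid;
};

/* Remember the vnode and the capability it had at this moment. */
static void
model_cache_enter(struct model_cache_entry *ce, struct model_vnode *vp)
{
	assert(vp != NULL);
	ce->ce_vp = vp;
	ce->ce_vpid = vp->v_id;
}

/* Return the vnode if it still names the same file, NULL otherwise. */
static struct model_vnode *
model_cache_lookup(const struct model_cache_entry *ce)
{
	if (ce->ce_vp == NULL || ce->ce_vp->v_id != ce->ce_vpid)
		return NULL;
	return ce->ce_vp;
}

/* Models the pool reassigning the vnode to a different file. */
static void
model_reassign(struct model_vnode *vp)
{
	vp->v_id++;
}
```
.Ed
A lookup succeeds until model_reassign runs; afterwards the stored copy no
longer matches and the entry is treated as stale.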
.Pp
Note: for this to work, memory assigned to hold a struct vnode can
only be used for another purpose when all pointers to it have disappeared.
Since the vnode pool has no way of knowing when all pointers have
disappeared, it never frees memory it has allocated for vnodes.
.Ss Vnode fields
Most of the fields of the vnode structure should be treated as opaque
and only manipulated through the proper APIs.
This section describes the fields that are manipulated directly.
.Pp
The v_flag attribute contains random flags related to various functions.
They are summarized in table ...
.Pp
The v_tag attribute indicates what file system the vnode belongs to.
Very little code actually uses this attribute and its use is deprecated.
Programmers should seriously consider using more object-oriented approaches
(e.g. function tables).
There is no safe way of defining new v_tags for loadable file systems.
The v_tag attribute is read-only.
.Pp
The v_type attribute indicates what type of file (e.g. directory,
regular, fifo) this vnode is.
This is used by the generic code for various checks.
For example, the
.Xr read 2
system call returns an error when a read is attempted on a directory.
.Pp
The v_data attribute allows a file system to attach a piece of file
system specific memory to the vnode.
This contains information about the file that is specific to
the file system.
.Pp
The v_numoutput attribute indicates the number of pending synchronous
and asynchronous writes on the vnode.
It does not track the number of dirty buffers attached to the vnode.
The attribute is used by code like fsync to wait for all writes
to complete before returning to the user.
This attribute must be manipulated at splbio().
.Pp
The v_writecount attribute tracks the number of write calls pending
on the vnode.
.Ss RULES
The vast majority of vnode functions may not be called from interrupt
context.
The exceptions are bgetvp and brelvp.
The following fields of the vnode are manipulated at interrupt level:
v_numoutput, v_holdcnt, v_dirtyblkhd, v_cleanblkhd, v_bioflag, v_freelist,
and v_synclist.
Any accesses to these fields should be protected by splbio,
unless you are certain that there is no chance an interrupt handler
will modify them.
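.Pp
The usual shape of such a protected access is to raise the priority level,
touch the field, and restore the saved level.
The sketch below models that pattern in userland.
The model_splbio and model_splx functions are hypothetical stand-ins for
the kernel's splbio and splx, and the numeric level is an arbitrary value
chosen for the model.
.Bd -literal -offset indent
```c
#include <assert.h>

#define MODEL_IPL_NONE	0
#define MODEL_IPL_BIO	3	/* arbitrary value for the model */

/* Model-only: the current interrupt priority level. */
static int model_cur_ipl = MODEL_IPL_NONE;

/* Stand-in for splbio(): raise the level, return the previous one. */
static int
model_splbio(void)
{
	int s = model_cur_ipl;

	if (model_cur_ipl < MODEL_IPL_BIO)
		model_cur_ipl = MODEL_IPL_BIO;
	return s;
}

/* Stand-in for splx(): restore a previously saved level. */
static void
model_splx(int s)
{
	model_cur_ipl = s;
}

/* Hypothetical model of the one field that matters here. */
struct model_vnode {
	int v_numoutput;	/* pending writes */
};

/* Account for a newly started write with the level raised. */
static void
model_start_write(struct model_vnode *vp)
{
	int s;

	s = model_splbio();
	assert(model_cur_ipl >= MODEL_IPL_BIO);
	vp->v_numoutput++;
	model_splx(s);
}
```
.Ed
Saving and restoring the previous level, rather than dropping to a fixed
one, keeps the pattern safe when the caller is already running at a raised
level.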
.Pp
A vnode will only be reassigned to another file when its reference count
reaches zero and the vnode lock is freed.
.Pp
A vnode will not be reclaimed as long as the vnode lock is held.
If the vnode reference count drops to zero while a process is holding
the vnode lock, the vnode MAY be queued for reclamation.
Increasing the reference count from 0 to 1 while holding the lock will
most likely cause intermittent kernel panics.
.Sh HISTORY
This document first appeared in
.Ox 2.9 .