1.\" Copyright (c) 2001 Matthew Dillon.  Terms and conditions are those of
2.\" the BSD Copyright as specified in the file "/usr/src/COPYRIGHT" in
3.\" the source tree.
4.\"
5.Dd August 24, 2018
6.Dt TUNING 7
7.Os
8.Sh NAME
9.Nm tuning
10.Nd performance tuning under DragonFly
11.Sh SYSTEM SETUP
12Modern
13.Dx
14systems typically have just three partitions on the main drive.
15In order, a UFS
16.Pa /boot ,
17.Pa swap ,
18and a HAMMER or HAMMER2
19.Pa root .
20In prior years the installer created separate PFSs for half a dozen
21directories, but now we just put (almost) everything in the root.
22The installer will separate stuff that doesn't need to be backed up into
23a /build subdirectory and create null-mounts for things like /usr/obj, but it
24no longer creates separate PFSs for these.
25If desired, you can make /build its own mount to separate-out the
26components of the filesystem which do not need to be persistent.
27.Pp
28Generally speaking the
29.Pa /boot
30partition should be 1GB in size.  This is the minimum recommended
31size, giving you room for backup kernels and alternative boot schemes.
32.Dx
33always installs debug-enabled kernels and modules and these can take
34up quite a bit of disk space (but will not take up any extra ram).
35.Pp
36In the old days we recommended that swap be sized to at least 2x main
37memory.  These days swap is often used for other activities, including
38.Xr tmpfs 5
39and
40.Xr swapcache 8 .
41We recommend that swap be sized to the larger of 2x main memory or
421GB if you have a fairly small disk and 16GB or more if you have a
43modestly endowed system.
44If you have a modest SSD + large HDD combination, we recommend
45a large dedicated swap partition on the SSD.  For example, if
46you have a 128GB SSD and 2TB or more of HDD storage, dedicating
47upwards of 64GB of the SSD to swap and using
48.Xr swapcache 8
49will significantly improve your HDD's performance.
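As an illustrative sketch only (see
.Xr swapcache 8
for the authoritative list of knobs and their defaults), such a setup
might enable caching of both meta-data and file data via
.Xr sysctl.conf 5 :
.Bd -literal -offset indent
# Hypothetical /etc/sysctl.conf entries enabling swapcache;
# consult swapcache(8) before using these on a real system.
vm.swapcache.read_enable=1
vm.swapcache.meta_enable=1
vm.swapcache.data_enable=1
.Ed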
.Pp
In an all-SSD or mostly-SSD system,
.Xr swapcache 8
is not normally used and should be left disabled (the default), but you
may still want to have a large swap partition to support
.Xr tmpfs 5
use.
Our synth/poudriere build machines run with at least 200GB of
swap and use tmpfs for all the builder jails.  50-100GB
is swapped out at the peak of the build.  As a result, actual
system storage bandwidth is minimized and performance increased.
.Pp
If you are on a minimally configured machine you may, of course,
configure far less swap or no swap at all but we recommend at least
some swap.
The kernel's VM paging algorithms are tuned to perform best when there is
swap space configured.
Configuring too little swap can lead to inefficiencies in the VM
page scanning code as well as create issues later on if you add
more memory to your machine, so don't be shy about it.
Swap is a good idea even if you don't think you will ever need it as it
allows the
machine to page out completely unused data and idle programs (like getty),
maximizing the ram available for your activities.
.Pp
If you intend to use the
.Xr swapcache 8
facility with an SSD + HDD combination we recommend configuring as much
swap space as you can on the SSD.
However, keep in mind that each 1GByte of swapcache requires around
1MByte of ram, so don't scale your swap beyond the equivalent ram
that you reasonably want to eat to support it.
.Pp
Finally, on larger systems with multiple drives, if the use
of SSD swap is not in the cards or if it is and you need higher-than-normal
swapcache bandwidth, you can configure swap on up to four drives and
the kernel will interleave the storage.
The swap partitions on the drives should be approximately the same size.
The kernel can handle arbitrary sizes but
internal data structures scale to 4 times the largest swap partition.
Keeping
the swap partitions near the same size will allow the kernel to optimally
stripe swap space across the N disks.
Do not worry about overdoing it a
little; swap space is the saving grace of
.Ux
and even if you do not normally use much swap, having some allows the system
to move idle program data out of ram and allows the machine to more easily
handle abnormal runaway programs.
However, keep in mind that any sort of swap space failure can lock the
system up.
Most machines are configured with only one or two swap partitions.
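For example, a hypothetical two-drive layout (device names are purely
illustrative) could configure interleaved swap via
.Xr fstab 5 :
.Bd -literal -offset indent
# Hypothetical /etc/fstab entries; substitute your real devices.
# Keeping the partitions the same size lets the kernel stripe
# swap optimally across both drives.
/dev/da0s1b	none	swap	sw	0	0
/dev/da1s1b	none	swap	sw	0	0
.Ed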
.Pp
Most
.Dx
systems have a single HAMMER or HAMMER2 root.
PFSs can be used to administratively separate domains for backup purposes
but tend to be a hassle otherwise, so if you don't need the administrative
separation you don't really need to use multiple PFSs.
All the PFSs share the same allocation layer so there is no longer a need
to size each individual mount.
Instead you should review the
.Xr hammer 8
manual page and use the 'hammer viconfig' facility to adjust snapshot
retention and other parameters.
By default
HAMMER1 keeps 60 days' worth of snapshots, and HAMMER2 keeps none.
By convention
.Pa /build
is not backed up and contains only directory trees that do not need
to be backed up or snapshotted.
.Pp
If a very large work area is desired it is often beneficial to
configure it as its own filesystem in a completely independent partition
so allocation blowouts (if they occur) do not affect the main system.
By convention a large work area is named
.Pa /build .
Similarly if a machine is going to have a large number of users
you might want to separate your
.Pa /home
out as well.
.Pp
A number of run-time
.Xr mount 8
options exist that can help you tune the system.
The most obvious and most dangerous one is
.Cm async .
Do not ever use it; it is far too dangerous.
A less dangerous and more
useful
.Xr mount 8
option is called
.Cm noatime .
.Ux
filesystems normally update the last-accessed time of a file or
directory whenever it is accessed.
However, neither HAMMER nor HAMMER2 implements atime so there is usually
no need to mess with this option.
The lack of atime updates can create issues with certain programs,
for example when detecting whether unread mail is present, but
applications for the most part no longer depend on it.
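If you do want
.Cm noatime
on the UFS
.Pa /boot
partition, a hypothetical
.Xr fstab 5
entry (the device name is illustrative only) might look like:
.Bd -literal -offset indent
# Hypothetical /etc/fstab entry; substitute your real boot device.
/dev/da0s1a	/boot	ufs	rw,noatime	1	1
.Ed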
.Sh SSD SWAP
The single most important thing you can do to improve performance is to
have at least one solid-state drive in your system, and to configure your
swap space on that drive.
If you are using a combination of a smaller SSD and a much larger HDD,
you can use
.Xr swapcache 8
to automatically cache data from your HDD.
But even if you do not, having swap space configured on your SSD will
significantly improve performance under even modest paging loads.
It is particularly useful to configure a significant amount of swap
on a workstation (32GB or more is not uncommon) to handle bloated,
leaky applications such as browsers.
.Sh SYSCTL TUNING
.Xr sysctl 8
variables permit system behavior to be monitored and controlled at
run-time.
Some sysctls simply report on the behavior of the system; others allow
the system behavior to be modified;
some may be set at boot time using
.Xr rc.conf 5 ,
but most will be set via
.Xr sysctl.conf 5 .
There are several hundred sysctls in the system, including many that appear
to be candidates for tuning but actually are not.
In this document we will only cover the ones that have the greatest effect
on the system.
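.Pp
As a brief illustration, a sysctl can be inspected and changed at
run-time with
.Xr sysctl 8
and made persistent across reboots via
.Xr sysctl.conf 5
(the value shown is only an example):
.Bd -literal -offset indent
# Inspect the current value, then change it for the running system.
sysctl kern.gettimeofday_quick
sysctl kern.gettimeofday_quick=1

# To make the setting persistent, add to /etc/sysctl.conf:
kern.gettimeofday_quick=1
.Ed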
.Pp
The
.Va kern.gettimeofday_quick
sysctl defaults to 0 (off).  Setting this sysctl to 1 causes gettimeofday()
calls in libc to use a tick-granular time from the kpmap instead of making
a system call.  Enabling this feature can be useful when running benchmarks
which make large numbers of gettimeofday() calls, such as postgres.
.Pp
The
.Va kern.ipc.shm_use_phys
sysctl defaults to 1 (on) and may be set to 0 (off) or 1 (on).
Setting
this parameter to 1 will cause all System V shared memory segments to be
mapped to unpageable physical RAM.
This feature only has an effect if you
are either (A) mapping small amounts of shared memory across many (hundreds)
of processes, or (B) mapping large amounts of shared memory across any
number of processes.
This feature allows the kernel to remove a great deal
of internal memory management page-tracking overhead at the cost of wiring
the shared memory into core, making it unswappable.
.Pp
The
.Va vfs.write_behind
sysctl defaults to 1 (on).  This tells the filesystem to issue media
writes as full clusters are collected, which typically occurs when writing
large sequential files.  The idea is to avoid saturating the buffer
cache with dirty buffers when it would not benefit I/O performance.  However,
this may stall processes and under certain circumstances you may wish to turn
it off.
.Pp
The
.Va vfs.lorunningspace
and
.Va vfs.hirunningspace
sysctls determine how much outstanding write I/O may be queued to
disk controllers system-wide at any given moment.  The default is
usually sufficient, particularly when SSDs are part of the mix.
Note that setting too high a value can lead to extremely poor
clustering performance.  Do not set this value arbitrarily high!  Also,
higher write queueing values may add latency to reads occurring at the same
time.
.Pp
The
.Va vfs.bufcache_bw
sysctl controls data cycling within the buffer cache.  I/O bandwidth less than
this specification (per second) will cycle into the much larger general
VM page cache while I/O bandwidth in excess of this specification will
be recycled within the buffer cache, reducing the load on the rest of
the VM system at the cost of bypassing normal VM caching mechanisms.
The default value is 200 megabytes/s (209715200), which means that the
system will try harder to cache data coming off a slower hard drive
and not as hard to cache data coming off a fast SSD.
.Pp
This parameter is particularly important if you have NVMe drives in
your system as these storage devices are capable of transferring
well over 2GBytes/sec into the system and can blow normal VM paging
and caching algorithms to bits.
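As a sketch only, and not a blanket recommendation, a machine built
around fast NVMe storage might raise this threshold via
.Xr sysctl.conf 5
(the value below is illustrative and specified in bytes per second):
.Bd -literal -offset indent
# Hypothetical example: raise the buffer-cache cycling threshold
# to roughly 1GByte/sec.
vfs.bufcache_bw=1073741824
.Ed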
.Pp
There are various other buffer-cache and VM page cache related sysctls.
We do not recommend modifying their values.
.Pp
The
.Va net.inet.tcp.sendspace
and
.Va net.inet.tcp.recvspace
sysctls are of particular interest if you are running network-intensive
applications.
They control the amount of send and receive buffer space
allowed for any given TCP connection.
However,
.Dx
now auto-tunes these parameters using a number of other related
sysctls (run 'sysctl net.inet.tcp' to get a list), so they usually
no longer need to be tuned manually.
We do not recommend
increasing or decreasing the defaults if you are managing a very large
number of connections.
Note that the routing table (see
.Xr route 8 )
can be used to introduce route-specific send and receive buffer size
defaults.
.Pp
As an additional management tool you can use pipes in your
firewall rules (see
.Xr ipfw 8 )
to limit the bandwidth going to or from particular IP blocks or ports.
For example, if you have a T1 you might want to limit your web traffic
to 70% of the T1's bandwidth in order to leave the remainder available
for mail and interactive use.
Normally a heavily loaded web server
will not introduce significant latencies into other services even if
the network link is maxed out, but enforcing a limit can smooth things
out and lead to longer term stability.
Many people also enforce artificial
bandwidth limitations in order to ensure that they are not charged for
using too much bandwidth.
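As a sketch only (see
.Xr ipfw 8
and
.Xr dummynet 4
for the authoritative syntax), limiting outbound web traffic to roughly
70% of a T1 might look like:
.Bd -literal -offset indent
# Hypothetical dummynet pipe limiting outgoing HTTP replies to
# about 1080Kbit/s (roughly 70% of a 1544Kbit/s T1).
ipfw pipe 1 config bw 1080Kbit/s
ipfw add 1000 pipe 1 tcp from any 80 to any out
.Ed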
.Pp
Setting the send or receive TCP buffer to values larger than 65535 will result
in only a marginal performance improvement unless both hosts support the window
scaling extension of the TCP protocol, which is controlled by the
.Va net.inet.tcp.rfc1323
sysctl.
These extensions should be enabled and the TCP buffer size should be set
to a value larger than 65536 in order to obtain good performance from
certain types of network links; specifically, gigabit WAN links and
high-latency satellite links.
RFC 1323 support is enabled by default.
.Pp
The
.Va net.inet.tcp.always_keepalive
sysctl determines whether or not the TCP implementation should attempt
to detect dead TCP connections by intermittently delivering
.Dq keepalives
on the connection.
By default, this is now enabled for all applications.
We do not recommend turning it off.
The extra network bandwidth is minimal and this feature will clean up
stalled and long-dead connections that might not otherwise be cleaned
up.
In the past people using dialup connections often did not want to
use this feature in order to be able to retain connections across
long disconnections, but these days the only default that makes
sense is for the feature to be turned on.
.Pp
The
.Va net.inet.tcp.delayed_ack
TCP feature is largely misunderstood.  Historically speaking this feature
was designed to allow the acknowledgement for transmitted data to be returned
along with the response.  For example, when you type over a remote shell
the acknowledgement for the character you send can be returned along with the
data representing the echo of the character.  With delayed acks turned off
the acknowledgement may be sent in its own packet before the remote service
has a chance to echo the data it just received.  This same concept also
applies to any interactive protocol (e.g. SMTP, WWW, POP3) and can cut the
number of tiny packets flowing across the network in half.  The
.Dx
delayed-ack implementation also follows the TCP protocol rule that
at least every other packet be acknowledged even if the standard 100ms
timeout has not yet passed.  Normally the worst a delayed ack can do is
slightly delay the teardown of a connection, or slightly delay the ramp-up
of a slow-start TCP connection.  While we are not certain, we believe that
the several FAQs for packages such as SAMBA and SQUID which advise
turning off delayed acks are most likely referring to the slow-start issue.
.Pp
The
.Va net.inet.tcp.inflight_enable
sysctl turns on bandwidth delay product limiting for all TCP connections.
This feature is now turned on by default and we recommend that it be
left on.
It will slightly reduce the maximum bandwidth of a connection but the
benefits of the feature in reducing packet backlogs at router constriction
points are enormous.
These benefits make it a whole lot easier for router algorithms to manage
QOS for multiple connections.
The limiting feature reduces the amount of data built up in intermediate
router and switch packet queues as well as reduces the amount of data built
up in the local host's interface queue.  With fewer packets queued up,
interactive connections, especially over slow modems, will also be able
to operate with lower round trip times.  However, note that this feature
only affects data transmission (uploading / server-side).  It does not
affect data reception (downloading).
.Pp
The system will attempt to calculate the bandwidth delay product for each
connection and limit the amount of data queued to the network to just the
amount required to maintain optimum throughput.  This feature is useful
if you are serving data over modems, GigE, or high-speed WAN links (or
any other link with a high bandwidth*delay product), especially if you are
also using window scaling or have configured a large send window.
.Pp
For production use setting
.Va net.inet.tcp.inflight_min
to at least 6144 may be beneficial.  Note, however, that setting high
minimums may effectively disable bandwidth limiting depending on the link.
.Pp
Adjusting
.Va net.inet.tcp.inflight_stab
is not recommended.
This parameter defaults to 50, representing +5% fudge when calculating the
bwnd from the bw.  This fudge is on top of an additional fixed +2*maxseg
added to bwnd.  The fudge factor is required to stabilize the algorithm
at very high speeds while the fixed 2*maxseg stabilizes the algorithm at
low speeds.  If you increase this value, excessive packet buffering may occur.
.Pp
The
.Va net.inet.ip.portrange.*
sysctls control the port number ranges automatically bound to TCP and UDP
sockets.  There are three ranges: a low range, a default range, and a
high range, selectable via an IP_PORTRANGE
.Fn setsockopt
call.
Most network programs use the default range which is controlled by
.Va net.inet.ip.portrange.first
and
.Va net.inet.ip.portrange.last ,
which default to 1024 and 5000, respectively.  Bound port ranges are
used for outgoing connections and it is possible to run the system out
of ports under certain circumstances.  This most commonly occurs when you are
running a heavily loaded web proxy.  The port range is not an issue
when running servers which mainly handle incoming connections, such as a
normal web server, or which have a limited number of outgoing connections,
such as a mail relay.  For situations where you may run yourself out of
ports we recommend increasing
.Va net.inet.ip.portrange.last
modestly.  A value of 10000 or 20000 or 30000 may be reasonable.  You should
also consider firewall effects when changing the port range.  Some firewalls
may block large ranges of ports (usually low-numbered ports) and expect systems
to use higher ranges of ports for outgoing connections.  For this reason
we do not recommend that
.Va net.inet.ip.portrange.first
be lowered.
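If you do raise the top of the default range, a hypothetical
.Xr sysctl.conf 5
entry (the value is illustrative only) might be:
.Bd -literal -offset indent
# Hypothetical example: widen the default outgoing port range.
net.inet.ip.portrange.last=20000
.Ed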
.Pp
The
.Va kern.ipc.somaxconn
sysctl limits the size of the listen queue for accepting new TCP connections.
The default value of 128 is typically too low for robust handling of new
connections in a heavily loaded web server environment.
For such environments,
we recommend increasing this value to 1024 or higher.
The service daemon
may itself limit the listen queue size (e.g.\&
.Xr sendmail 8 ,
apache) but will
often have a directive in its configuration file to adjust the queue size up.
Larger listen queues also do a better job of fending off denial of service
attacks.
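For example, the recommended value could be made persistent via
.Xr sysctl.conf 5 :
.Bd -literal -offset indent
# Illustrative value for a heavily loaded web server.
kern.ipc.somaxconn=1024
.Ed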
.Pp
The
.Va kern.maxvnodes
sysctl specifies how many vnodes and related file structures the kernel will
cache.
The kernel uses a modestly generous default for this parameter based on
available physical memory.
You generally do not want to mess with this parameter as it directly
affects how well the kernel can cache not only file structures but also
the underlying file data.
.Pp
However, situations may crop up where you wish to cache less filesystem
data in order to make more memory available for programs.  Not only will
this reduce kernel memory use for vnodes and inodes, it will also have a
tendency to reduce the impact of the buffer cache on main memory because
recycling a vnode also frees any underlying data that has been cached for
that vnode.
.Pp
It is, in fact, possible for the system to have more files open than the
value of this tunable, but as files are closed the system will try to
reduce the actual number of cached vnodes to match this value.
The read-only
.Va kern.openfiles
sysctl may be interrogated to determine how many files are currently open
on the system.
.Pp
The
.Va vm.swap_idle_enabled
sysctl is useful in large multi-user systems where you have lots of users
entering and leaving the system and lots of idle processes.
Such systems
tend to generate a great deal of continuous pressure on free memory reserves.
Turning this feature on and adjusting the swapout hysteresis (in idle
seconds) via
.Va vm.swap_idle_threshold1
and
.Va vm.swap_idle_threshold2
allows you to depress the priority of pages associated with idle processes
more quickly than the normal pageout algorithm.
This gives a helping hand
to the pageout daemon.
Do not turn this option on unless you need it,
because the tradeoff you are making is to essentially pre-page memory sooner
rather than later, eating more swap and disk bandwidth.
In a small system
this option will have a detrimental effect but in a large system that is
already doing moderate paging this option allows the VM system to stage
whole processes into and out of memory more easily.
.Sh LOADER TUNABLES
Some aspects of the system behavior may not be tunable at runtime because
the memory allocations they perform must occur early in the boot process.
To change loader tunables, you must set their values in
.Xr loader.conf 5
and reboot the system.
.Pp
.Va kern.maxusers
is automatically sized at boot based on the amount of memory available in
the system.  The value can be read (but not written) via sysctl.
.Pp
You can change this value as a loader tunable if the default resource
limits are not sufficient.
This tunable works primarily by adjusting
.Va kern.maxproc ,
so you can opt to override that instead.
It is generally easier to formulate an adjustment to
.Va kern.maxproc
instead of
.Va kern.maxusers .
.Pp
.Va kern.maxproc
controls most kernel auto-scaling components.  If kernel resource limits
are not scaled high enough, setting this tunable to a higher value is
usually sufficient.
Generally speaking you will want to set this tunable to the upper limit
for the number of process threads you want the kernel to be able to handle.
The kernel may still decide to cap maxproc at a lower value if there is
insufficient ram to scale resources as desired.
.Pp
Only set this tunable if the defaults are not sufficient.
Do not use this tunable to try to trim kernel resource limits; you will
not actually save much memory by doing so and you will leave the system
more vulnerable to DOS attacks and runaway processes.
.Pp
Setting this tunable will scale the maximum number of processes, pipes,
and sockets, as well as the total number of open files the system can
support, and will increase mbuf and mbuf-cluster limits.  These other
elements can also be separately overridden to fine-tune the setup.
We recommend setting this tunable first to create a baseline.
.Pp
Setting a high value presumes that you have enough physical memory to
support the resource utilization.  For example, your system would need
approximately 128GB of ram to reasonably support a maxproc value of
4 million (4000000).  The default maxproc given that much ram will
typically be in the 250000 range.
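For example, a hypothetical
.Xr loader.conf 5
entry for a large, memory-rich server (the value is illustrative only)
might be:
.Bd -literal -offset indent
# Hypothetical /boot/loader.conf entry; the kernel may still cap
# this lower if there is not enough ram to scale resources.
kern.maxproc="500000"
.Ed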
.Pp
Note that the PID is currently limited to 6 digits, so a system cannot
have more than a million processes operating anyway (though the aggregate
number of threads can be far greater).
And yes, there is in fact no reason why a very well-endowed system
couldn't have that many processes.
.Pp
.Va kern.nbuf
sets how many filesystem buffers the kernel should cache.
Filesystem buffers can be up to 128KB each.
UFS typically uses an 8KB blocksize while HAMMER and HAMMER2 typically
use 64KB.  The system defaults usually suffice for this parameter.
Cached buffers represent wired physical memory so specifying a value
that is too large can result in excessive kernel memory use, and is also
not entirely necessary since the pages backing the buffers are also
cached by the VM page cache (which does not use wired memory).
The buffer cache significantly improves the hot path for cached file
accesses and dirty data.
.Pp
The kernel reserves (128KB * nbuf) bytes of KVM.  The actual physical
memory use depends on the filesystem buffer size.
It is generally more flexible to manage the filesystem cache via
.Va kern.maxfiles
than via
.Va kern.nbuf ,
but situations do arise where you might want to increase or decrease
the latter.
.Pp
The
.Va kern.dfldsiz
and
.Va kern.dflssiz
tunables set the default soft limits for process data and stack size
respectively.
Processes may increase these up to the hard limits by calling
.Xr setrlimit 2 .
The
.Va kern.maxdsiz ,
.Va kern.maxssiz ,
and
.Va kern.maxtsiz
tunables set the hard limits for process data, stack, and text size
respectively; processes may not exceed these limits.
The
.Va kern.sgrowsiz
tunable controls how much the stack segment will grow when a process
needs to allocate more stack.
.Pp
.Va kern.ipc.nmbclusters
and
.Va kern.ipc.nmbjclusters
may be adjusted to increase the number of network mbufs the system is
willing to allocate.
Each normal cluster represents approximately 2K of memory,
so a value of 1024 represents 2M of kernel memory reserved for network
buffers.
Each 'j' cluster is typically 4KB, so a value of 1024 represents 4M of
kernel memory.
You can do a simple calculation to figure out how many you need but
keep in mind that TCP buffer sizing is now more dynamic than it used to
be.
.Pp
The defaults usually suffice but you may want to bump them up on
service-heavy machines.
Modern machines often need a large number of mbufs to operate services
efficiently; values of 65536, or even upwards of 262144 or more, are common.
If you are running a server, it is better to be generous than to be frugal.
Remember the memory calculation though.
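As a sketch of that calculation, using illustrative values:
.Bd -literal -offset indent
# Approximate wired kernel memory reserved for network buffers.
kern.ipc.nmbclusters=262144   ->  262144 x 2KB = 512MB
kern.ipc.nmbjclusters=65536   ->   65536 x 4KB = 256MB
.Ed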
.Pp
Under no circumstances
should you specify an arbitrarily high value for these parameters; doing
so could lead to a boot-time crash.
The
.Fl m
option to
.Xr netstat 1
may be used to observe network cluster use.
.Sh KERNEL CONFIG TUNING
There are a number of kernel options that you may have to fiddle with in
a large-scale system.
In order to change these options you need to be
able to compile a new kernel from source.
The
.Xr config 8
manual page and the handbook are good starting points for learning how to
do this.
Generally speaking, removing options to trim the size of the kernel
is not going to save very much memory on a modern system.
In the grand scheme of things, saving a megabyte or two is in the noise
on a system that likely has multiple gigabytes of memory.
.Pp
If your motherboard is AHCI-capable then we strongly recommend turning
on AHCI mode in the BIOS if it is not already the default.
.Sh CPU, MEMORY, DISK, NETWORK
The type of tuning you do depends heavily on where your system begins to
bottleneck as load increases.
If your system runs out of CPU (idle times
are perpetually 0%) then you need to consider upgrading the CPU or moving to
an SMP motherboard (multiple CPUs), or perhaps you need to revisit the
programs that are causing the load and try to optimize them.
If your system
is paging to swap a lot you need to consider adding more memory.
If your
system is saturating the disk you typically see high CPU idle times and
total disk saturation.
.Xr systat 1
can be used to monitor this.
There are many solutions to saturated disks:
increasing memory for caching, mirroring disks, distributing operations across
several machines, and so forth.
.Pp
Finally, you might run out of network suds.
Optimize the network path
as much as possible.
If you are operating a machine as a router you may need to
set up a
.Xr pf 4
firewall (also see
.Xr firewall 7 ) .
.Dx
has a very good fair-share queueing algorithm for QOS in
.Xr pf 4 .
.Sh BULK BUILDING MACHINE SETUP
Generally speaking memory is at a premium when doing bulk compiles.
Machines dedicated to bulk building usually reduce
.Va kern.maxvnodes
to 1000000 (1 million) vnodes or lower.  Don't get too cocky here; this
parameter should never be reduced below around 100000 on reasonably
well-endowed machines.
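For example (the value is illustrative), a dedicated builder might set
this via
.Xr sysctl.conf 5 :
.Bd -literal -offset indent
# Illustrative setting for a dedicated bulk-build machine.
kern.maxvnodes=1000000
.Ed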
.Pp
Bulk build setups also often benefit from a relatively large amount
of SSD swap, allowing the system to 'burst' high-memory-usage situations
while still maintaining optimal concurrency during other periods of the
build which use less run-time memory and prefer more parallelism.
.Sh SOURCE OF KERNEL MEMORY USAGE
The primary sources of kernel memory usage are:
.Bl -tag -width ".Va kern.maxvnodes"
.It Va kern.maxvnodes
The maximum number of cached vnodes in the system.
These can eat quite a bit of kernel memory, primarily due to auxiliary
structures tracked by the HAMMER filesystem.
It is relatively easy to configure a smaller value, but we do not
recommend reducing this parameter below 100000.
Smaller values directly impact the number of discrete files the
kernel can cache data for at once.
.It Va kern.ipc.nmbclusters , Va kern.ipc.nmbjclusters
Calculate approximately 2KB per normal cluster and 4KB per jumbo
cluster.
Do not make these values too low or you risk deadlocking the network
stack.
.It Va kern.nbuf
The number of filesystem buffers managed by the kernel.
The kernel wires the underlying cached VM pages, typically 8KB (UFS) or
64KB (HAMMER) per buffer.
.It swap/swapcache
Swap memory requires approximately 1MB of physical ram for each 1GB
of swap space.
When swapcache is used, additional memory may be required to keep
VM objects around longer (only really reducible by reducing the
value of
.Va kern.maxvnodes
which you can do post-boot if you desire).
.It tmpfs
Tmpfs is very useful but keep in mind that while the file data itself
is backed by swap, the meta-data (the directory topology) requires
wired kernel memory.
.It mmu page tables
Even though the underlying data pages themselves can be paged to swap,
the page tables are usually wired into memory.
This can create problems when a large number of processes are mmap()ing
very large files.
Sometimes turning on
.Va machdep.pmap_mmu_optimize
suffices to reduce overhead.
Page table kernel memory use can be observed by using 'vmstat -z'.
.It Va kern.ipc.shm_use_phys
It is sometimes necessary to force shared memory to use physical memory
when running a large database which uses shared memory to implement its
own data caching.
The use of sysv shared memory in this regard allows the database to
distinguish between data which it knows it can access instantly (i.e.
without even having to page in from swap) versus data which it might
require an I/O to fetch.
.Pp
If you use this feature be very careful with regard to the database's
shared memory configuration as you will be wiring the memory.
.El
.Sh SEE ALSO
.Xr netstat 1 ,
.Xr systat 1 ,
.Xr dm 4 ,
.Xr dummynet 4 ,
.Xr nata 4 ,
.Xr pf 4 ,
.Xr login.conf 5 ,
.Xr pf.conf 5 ,
.Xr rc.conf 5 ,
.Xr sysctl.conf 5 ,
.Xr firewall 7 ,
.Xr hier 7 ,
.Xr boot 8 ,
.Xr ccdconfig 8 ,
.Xr config 8 ,
.Xr disklabel 8 ,
.Xr fsck 8 ,
.Xr ifconfig 8 ,
.Xr ipfw 8 ,
.Xr loader 8 ,
.Xr mount 8 ,
.Xr newfs 8 ,
.Xr route 8 ,
.Xr sysctl 8 ,
.Xr tunefs 8
.Sh HISTORY
The
.Nm
manual page was inherited from
.Fx
and first appeared in
.Fx 4.3 ,
May 2001.
.Sh AUTHORS
The
.Nm
manual page was originally written by
.An Matthew Dillon .