1.\" Copyright (c) 2001, Matthew Dillon. Terms and conditions are those of 2.\" the BSD Copyright as specified in the file "/usr/src/COPYRIGHT" in 3.\" the source tree. 4.\" 5.\" $FreeBSD: src/share/man/man7/tuning.7,v 1.1.2.30 2002/12/17 19:32:08 dillon Exp $ 6.\" $DragonFly: src/share/man/man7/tuning.7,v 1.2 2003/06/17 04:37:00 dillon Exp $ 7.\" 8.Dd May 25, 2001 9.Dt TUNING 7 10.Os 11.Sh NAME 12.Nm tuning 13.Nd performance tuning under FreeBSD 14.Sh SYSTEM SETUP - DISKLABEL, NEWFS, TUNEFS, SWAP 15When using 16.Xr disklabel 8 17or 18.Xr sysinstall 8 19to lay out your filesystems on a hard disk it is important to remember 20that hard drives can transfer data much more quickly from outer tracks 21than they can from inner tracks. 22To take advantage of this you should 23try to pack your smaller filesystems and swap closer to the outer tracks, 24follow with the larger filesystems, and end with the largest filesystems. 25It is also important to size system standard filesystems such that you 26will not be forced to resize them later as you scale the machine up. 27I usually create, in order, a 128M root, 1G swap, 128M 28.Pa /var , 29128M 30.Pa /var/tmp , 313G 32.Pa /usr , 33and use any remaining space for 34.Pa /home . 35.Pp 36You should typically size your swap space to approximately 2x main memory. 37If you do not have a lot of RAM, though, you will generally want a lot 38more swap. 39It is not recommended that you configure any less than 40256M of swap on a system and you should keep in mind future memory 41expansion when sizing the swap partition. 42The kernel's VM paging algorithms are tuned to perform best when there is 43at least 2x swap versus main memory. 44Configuring too little swap can lead 45to inefficiencies in the VM page scanning code as well as create issues 46later on if you add more memory to your machine. 47Finally, on larger systems 48with multiple SCSI disks (or multiple IDE disks operating on different 49controllers), we strongly recommend that you configure swap on each drive 50(up to four drives). 51The swap partitions on the drives should be approximately the same size. 52The kernel can handle arbitrary sizes but 53internal data structures scale to 4 times the largest swap partition. 54Keeping 55the swap partitions near the same size will allow the kernel to optimally 56stripe swap space across the N disks. 57Do not worry about overdoing it a 58little, swap space is the saving grace of 59.Ux 60and even if you do not normally use much swap, it can give you more time to 61recover from a runaway program before being forced to reboot. 62.Pp 63How you size your 64.Pa /var 65partition depends heavily on what you intend to use the machine for. 66This 67partition is primarily used to hold mailboxes, the print spool, and log 68files. 69Some people even make 70.Pa /var/log 71its own partition (but except for extreme cases it is not worth the waste 72of a partition ID). 73If your machine is intended to act as a mail 74or print server, 75or you are running a heavily visited web server, you should consider 76creating a much larger partition \(en perhaps a gig or more. 77It is very easy 78to underestimate log file storage requirements. 79.Pp 80Sizing 81.Pa /var/tmp 82depends on the kind of temporary file usage you think you will need. 83128M is 84the minimum we recommend. 85Also note that sysinstall will create a 86.Pa /tmp 87directory. 
.Pp
Dedicating a partition for temporary file storage is important for
two reasons: first, it reduces the possibility of filesystem corruption
in a crash, and second it reduces the chance of a runaway process that
fills up
.Oo Pa /var Oc Ns Pa /tmp
from blowing up more critical subsystems (mail,
logging, etc).
Filling up
.Oo Pa /var Oc Ns Pa /tmp
is a very common problem to have.
.Pp
In the old days there were differences between
.Pa /tmp
and
.Pa /var/tmp ,
but the introduction of
.Pa /var
(and
.Pa /var/tmp )
led to massive confusion
by program writers so today programs haphazardly use one or the
other and thus no real distinction can be made between the two.
So it makes sense to have just one temporary directory and
softlink to it from the other tmp directory locations
(see the example below).
However you handle
.Pa /tmp ,
the one thing you do not want to do is leave it sitting
on the root partition where it might cause root to fill up or possibly
corrupt root in a crash/reboot situation.
.Pp
The
.Pa /usr
partition holds the bulk of the files required to support the system and
a subdirectory within it called
.Pa /usr/local
holds the bulk of the files installed from the
.Xr ports 7
hierarchy.
If you do not use ports all that much and do not intend to keep
system source
.Pq Pa /usr/src
on the machine, you can get away with
a 1 gigabyte
.Pa /usr
partition.
However, if you install a lot of ports
(especially window managers and Linux-emulated binaries), we recommend
at least a 2 gigabyte
.Pa /usr ,
and if you also intend to keep system source
on the machine, we recommend a 3 gigabyte
.Pa /usr .
Do not underestimate the
amount of space you will need in this partition; it can creep up and
surprise you!
.Pp
The
.Pa /home
partition is typically used to hold user-specific data.
I usually size it to the remainder of the disk.
.Pp
Why partition at all?
Why not create one big
.Pa /
partition and be done with it?
Then I do not have to worry about undersizing things!
Well, there are several reasons this is not a good idea.
First,
each partition has different operational characteristics and separating them
allows the filesystem to tune itself to those characteristics.
For example,
the root and
.Pa /usr
partitions are read-mostly, with very little writing, while
a lot of reading and writing could occur in
.Pa /var
and
.Pa /var/tmp .
By properly
partitioning your system, fragmentation introduced in the smaller more
heavily write-loaded partitions will not bleed over into the mostly-read
partitions.
Additionally, keeping the write-loaded partitions closer to
the edge of the disk (i.e. before the really big partitions instead of after
in the partition table) will increase I/O performance in the partitions
where you need it the most.
Now it is true that you might also need I/O
performance in the larger partitions, but they are so large that shifting
them more toward the edge of the disk will not lead to a significant
performance improvement whereas moving
.Pa /var
to the edge can have a huge impact.
Finally, there are safety concerns.
Having a small, neat root partition that
is essentially read-only gives it a greater chance of surviving a bad crash
intact.
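.Pp
As suggested above, one way to consolidate the temporary directories is
to make
.Pa /tmp
a softlink to
.Pa /var/tmp .
A minimal sketch, assuming single-user mode and that the contents of the
old directory are disposable or have already been copied elsewhere:
.Bd -literal -offset indent
# replace /tmp with a softlink to /var/tmp
rm -rf /tmp
ln -s /var/tmp /tmp
.Ed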
184.Pp 185Properly partitioning your system also allows you to tune 186.Xr newfs 8 , 187and 188.Xr tunefs 8 189parameters. 190Tuning 191.Xr newfs 8 192requires more experience but can lead to significant improvements in 193performance. 194There are three parameters that are relatively safe to tune: 195.Em blocksize , bytes/i-node , 196and 197.Em cylinders/group . 198.Pp 199.Fx 200performs best when using 8K or 16K filesystem block sizes. 201The default filesystem block size is 16K, 202which provides best performance for most applications, 203with the exception of those that perform random access on large files 204(such as database server software). 205Such applications tend to perform better with a smaller block size, 206although modern disk characteristics are such that the performance 207gain from using a smaller block size may not be worth consideration. 208Using a block size larger than 16K 209can cause fragmentation of the buffer cache and 210lead to lower performance. 211.Pp 212The defaults may be unsuitable 213for a filesystem that requires a very large number of i-nodes 214or is intended to hold a large number of very small files. 215Such a filesystem should be created with an 8K or 4K block size. 216This also requires you to specify a smaller 217fragment size. 218We recommend always using a fragment size that is 1/8 219the block size (less testing has been done on other fragment size factors). 220The 221.Xr newfs 8 222options for this would be 223.Dq Li "newfs -f 1024 -b 8192 ..." . 224.Pp 225If a large partition is intended to be used to hold fewer, larger files, such 226as database files, you can increase the 227.Em bytes/i-node 228ratio which reduces the number of i-nodes (maximum number of files and 229directories that can be created) for that partition. 230Decreasing the number 231of i-nodes in a filesystem can greatly reduce 232.Xr fsck 8 233recovery times after a crash. 234Do not use this option 235unless you are actually storing large files on the partition, because if you 236overcompensate you can wind up with a filesystem that has lots of free 237space remaining but cannot accommodate any more files. 238Using 32768, 65536, or 262144 bytes/i-node is recommended. 239You can go higher but 240it will have only incremental effects on 241.Xr fsck 8 242recovery times. 243For example, 244.Dq Li "newfs -i 32768 ..." . 245.Pp 246.Xr tunefs 8 247may be used to further tune a filesystem. 248This command can be run in 249single-user mode without having to reformat the filesystem. 250However, this is possibly the most abused program in the system. 251Many people attempt to 252increase available filesystem space by setting the min-free percentage to 0. 253This can lead to severe filesystem fragmentation and we do not recommend 254that you do this. 255Really the only 256.Xr tunefs 8 257option worthwhile here is turning on 258.Em softupdates 259with 260.Dq Li "tunefs -n enable /filesystem" . 261(Note: in 262.Fx 4.5 263and later, softupdates can be turned on using the 264.Fl U 265option to 266.Xr newfs 8 , 267and 268.Xr sysinstall 8 269will typically enable softupdates automatically for non-root filesystems). 270Softupdates drastically improves meta-data performance, mainly file 271creation and deletion. 272We recommend enabling softupdates on most filesystems; however, there 273are two limitations to softupdates that you should be aware of when 274determining whether to use it on a filesystem. 
First, softupdates guarantees filesystem consistency in the
case of a crash but could very easily be several seconds (even a minute!)
behind on pending writes to the physical disk.
If you crash you may lose more work
than otherwise.
Secondly, softupdates delays the freeing of filesystem
blocks.
If you have a filesystem (such as the root filesystem) which is
close to full, doing a major update of it, e.g.\&
.Dq Li "make installworld" ,
can run it out of space and cause the update to fail.
For this reason, softupdates will not be enabled on the root filesystem
during a typical install.
There is no loss of performance since the root
filesystem is rarely written to.
.Pp
A number of run-time
.Xr mount 8
options exist that can help you tune the system.
The most obvious and most dangerous one is
.Cm async .
Do not ever use it; it is far too dangerous.
A less dangerous and more
useful
.Xr mount 8
option is called
.Cm noatime .
.Ux
filesystems normally update the last-accessed time of a file or
directory whenever it is accessed.
This operation is handled in
.Fx
with a delayed write and normally does not create a burden on the system.
However, if your system is accessing a huge number of files on a continuing
basis the buffer cache can wind up getting polluted with atime updates,
creating a burden on the system.
For example, if you are running a heavily
loaded web site, or a news server with lots of readers, you might want to
consider turning off atime updates on your larger partitions with this
.Xr mount 8
option.
However, you should not gratuitously turn off atime
updates everywhere.
For example, the
.Pa /var
filesystem customarily
holds mailboxes, and atime (in combination with mtime) is used to
determine whether a mailbox has new mail.
You might as well leave
atime turned on for mostly read-only partitions such as
.Pa /
and
.Pa /usr
as well.
This is especially useful for
.Pa /
since some system utilities
use the atime field for reporting.
.Sh STRIPING DISKS
In larger systems you can stripe partitions from several drives together
to create a much larger overall partition.
Striping can also improve
the performance of a filesystem by splitting I/O operations across two
or more disks.
The
.Xr vinum 8
and
.Xr ccdconfig 8
utilities may be used to create simple striped filesystems.
Generally
speaking, striping smaller partitions such as the root and
.Pa /var/tmp ,
or essentially read-only partitions such as
.Pa /usr ,
is a complete waste of time.
You should only stripe partitions that require serious I/O performance,
typically
.Pa /var , /home ,
or custom partitions used to hold databases and web pages.
Choosing the proper stripe size is also
important.
Filesystems tend to store meta-data on power-of-2 boundaries
and you usually want to reduce seeking rather than increase seeking.
This
means you want to use a large off-center stripe size such as 1152 sectors
so sequential I/O does not seek both disks and so meta-data is distributed
across both disks rather than concentrated on a single disk.
If
you really need to get sophisticated, we recommend using a real hardware
RAID controller from the list of
.Fx
supported controllers.
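.Pp
For example, a simple two-disk stripe with the off-center interleave
suggested above might be set up with
.Xr ccdconfig 8
roughly as follows (device names are illustrative; consult
.Xr ccdconfig 8
before running anything):
.Bd -literal -offset indent
# stripe two disks with a 1152-sector interleave
ccdconfig ccd0 1152 none /dev/da0s1e /dev/da1s1e
disklabel -r -w ccd0 auto
newfs /dev/ccd0c
.Ed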
.Sh SYSCTL TUNING
.Xr sysctl 8
variables permit system behavior to be monitored and controlled at
run-time.
Some sysctls simply report on the behavior of the system; others allow
the system behavior to be modified.
Some may be set at boot time using
.Xr rc.conf 5 ,
but most will be set via
.Xr sysctl.conf 5 .
There are several hundred sysctls in the system, including many that appear
to be candidates for tuning but actually are not.
In this document we will only cover the ones that have the greatest effect
on the system.
.Pp
The
.Va kern.ipc.shm_use_phys
sysctl defaults to 0 (off) and may be set to 0 (off) or 1 (on).
Setting
this parameter to 1 will cause all System V shared memory segments to be
mapped to unpageable physical RAM.
This feature only has an effect if you
are either (A) mapping small amounts of shared memory across many (hundreds)
of processes, or (B) mapping large amounts of shared memory across any
number of processes.
This feature allows the kernel to remove a great deal
of internal memory management page-tracking overhead at the cost of wiring
the shared memory into core, making it unswappable.
.Pp
The
.Va vfs.vmiodirenable
sysctl defaults to 1 (on).
This parameter controls how directories are cached
by the system.
Most directories are small and use but a single fragment
(typically 1K) in the filesystem and even less (typically 512 bytes) in
the buffer cache.
However, when operating in the default mode the buffer
cache will only cache a fixed number of directories even if you have a huge
amount of memory.
Turning on this sysctl allows the buffer cache to use
the VM Page Cache to cache the directories.
The advantage is that all of
memory is now available for caching directories.
The disadvantage is that
the minimum in-core memory used to cache a directory is the physical page
size (typically 4K) rather than 512 bytes.
We recommend turning this option off in memory-constrained environments;
however, when on, it will substantially improve the performance of services
that manipulate a large number of files.
Such services can include web caches, large mail systems, and news systems.
Turning on this option will generally not reduce performance even with the
wasted memory but you should experiment to find out.
.Pp
The
.Va vfs.write_behind
sysctl defaults to 1 (on).
This tells the filesystem to issue media
writes as full clusters are collected, which typically occurs when writing
large sequential files.
The idea is to avoid saturating the buffer
cache with dirty buffers when it would not benefit I/O performance.
However,
this may stall processes and under certain circumstances you may wish to turn
it off.
.Pp
The
.Va vfs.hirunningspace
sysctl determines how much outstanding write I/O may be queued to
disk controllers system-wide at any given instant.
The default is
usually sufficient but on machines with lots of disks you may want to bump
it up to four or five megabytes.
Note that setting too high a value
(exceeding the buffer cache's write threshold) can lead to extremely
bad clustering performance.
Do not set this value arbitrarily high!
Also,
higher write queueing values may add latency to reads occurring at the same
time.
.Pp
There are various other buffer-cache and VM page cache related sysctls.
We do not recommend modifying these values; as of
.Fx 4.3 ,
the VM system does an extremely good job of tuning itself.
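.Pp
The sysctls from this section that you do decide to change are normally
made permanent via
.Xr sysctl.conf 5 .
A short sketch using the values discussed above:
.Bd -literal -offset indent
# cache directories via the VM page cache (the default)
vfs.vmiodirenable=1
# wire SysV shared memory into core; only for the workloads noted above
kern.ipc.shm_use_phys=1
.Ed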
.Pp
The
.Va net.inet.tcp.sendspace
and
.Va net.inet.tcp.recvspace
sysctls are of particular interest if you are running network-intensive
applications.
They control the amount of send and receive buffer space
allowed for any given TCP connection.
The default sending buffer is 32K; the default receiving buffer
is 64K.
You can often
improve bandwidth utilization by increasing the default at the cost of
eating up more kernel memory for each connection.
We do not recommend
increasing the defaults if you are serving hundreds or thousands of
simultaneous connections because it is possible to quickly run the system
out of memory due to stalled connections building up.
But if you need
high bandwidth over a smaller number of connections, especially if you have
gigabit Ethernet, increasing these defaults can make a huge difference.
You can adjust the buffer size for incoming and outgoing data separately.
For example, if your machine is primarily doing web serving you may want
to decrease the recvspace in order to be able to increase the
sendspace without eating too much kernel memory.
Note that the routing table (see
.Xr route 8 )
can be used to introduce route-specific send and receive buffer size
defaults.
.Pp
As an additional management tool you can use pipes in your
firewall rules (see
.Xr ipfw 8 )
to limit the bandwidth going to or from particular IP blocks or ports.
For example, if you have a T1 you might want to limit your web traffic
to 70% of the T1's bandwidth in order to leave the remainder available
for mail and interactive use.
Normally a heavily loaded web server
will not introduce significant latencies into other services even if
the network link is maxed out, but enforcing a limit can smooth things
out and lead to longer term stability.
Many people also enforce artificial
bandwidth limitations in order to ensure that they are not charged for
using too much bandwidth.
.Pp
Setting the send or receive TCP buffer to values larger than 65535 will
result in only a marginal performance improvement unless both hosts support
the window scaling extension of the TCP protocol, which is controlled by the
.Va net.inet.tcp.rfc1323
sysctl.
These extensions should be enabled and the TCP buffer size should be set
to a value larger than 65536 in order to obtain good performance from
certain types of network links; specifically, gigabit WAN links and
high-latency satellite links.
RFC 1323 support is enabled by default.
.Pp
The
.Va net.inet.tcp.always_keepalive
sysctl determines whether or not the TCP implementation should attempt
to detect dead TCP connections by intermittently delivering
.Dq keepalives
on the connection.
By default, this is enabled for all applications; by setting this
sysctl to 0, only applications that specifically request keepalives
will use them.
In most environments, TCP keepalives will improve the management of
system state by expiring dead TCP connections, particularly for
systems serving dialup users who may not always terminate individual
TCP connections before disconnecting from the network.
However, in some environments, temporary network outages may be
incorrectly identified as dead sessions, resulting in unexpectedly
terminated TCP connections.
In such environments, setting the sysctl to 0 may reduce the occurrence of
TCP session disconnections.
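.Pp
Collecting the TCP-related settings discussed above, a host moving bulk
data over a few fast connections might use something like the following
in
.Xr sysctl.conf 5
(the buffer sizes are illustrative, not recommendations):
.Bd -literal -offset indent
# larger per-connection buffers; above 65535 this needs window scaling
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=65536
# RFC 1323 window scaling (already on by default)
net.inet.tcp.rfc1323=1
# leave keepalives on unless outages cause spurious disconnects
net.inet.tcp.always_keepalive=1
.Ed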
.Pp
The
.Va net.inet.tcp.delayed_ack
TCP feature is largely misunderstood.
Historically speaking, this feature
was designed to allow the acknowledgement of transmitted data to be returned
along with the response.
For example, when you type over a remote shell
the acknowledgement of the character you send can be returned along with the
data representing the echo of the character.
With delayed acks turned off
the acknowledgement may be sent in its own packet before the remote service
has a chance to echo the data it just received.
This same concept also
applies to any interactive protocol (e.g. SMTP, WWW, POP3) and can cut the
number of tiny packets flowing across the network in half.
The
.Fx
delayed-ack implementation also follows the TCP protocol rule that
at least every other packet be acknowledged even if the standard 100ms
timeout has not yet passed.
Normally the worst a delayed ack can do is
slightly delay the teardown of a connection, or slightly delay the ramp-up
of a slow-start TCP connection.
While we are not certain, we believe that
the several FAQs related to packages such as SAMBA and SQUID which advise
turning off delayed acks are referring to the slow-start issue.
In
.Fx
it would be more beneficial to increase the slow-start flightsize via
the
.Va net.inet.tcp.slowstart_flightsize
sysctl rather than to disable delayed acks.
.Pp
The
.Va net.inet.tcp.inflight_enable
sysctl turns on bandwidth delay product limiting for all TCP connections.
The system will attempt to calculate the bandwidth delay product for each
connection and limit the amount of data queued to the network to just the
amount required to maintain optimum throughput.
This feature is useful
if you are serving data over modems, GigE, or high speed WAN links (or
any other link with a high bandwidth*delay product), especially if you are
also using window scaling or have configured a large send window.
If
you enable this option you should also be sure to set
.Va net.inet.tcp.inflight_debug
to 0 (disable debugging), and for production use setting
.Va net.inet.tcp.inflight_min
to at least 6144 may be beneficial.
Note, however, that setting high
minimums may effectively disable bandwidth limiting depending on the link.
The limiting feature reduces the amount of data built up in intermediate
router and switch packet queues as well as reduces the amount of data built
up in the local host's interface queue.
With fewer packets queued up,
interactive connections, especially over slow modems, will also be able
to operate with lower round trip times.
However, note that this feature
only affects data transmission (uploading / server-side).
It does not
affect data reception (downloading).
.Pp
Adjusting
.Va net.inet.tcp.inflight_stab
is not recommended.
This parameter defaults to 20, representing 2 maximal packets added
to the bandwidth delay product window calculation.
The additional
window is required to stabilize the algorithm and improve responsiveness
to changing conditions, but it can also result in higher ping times
over slow links (though still much lower than you would get without
the inflight algorithm).
In such cases you may
wish to try reducing this parameter to 15, 10, or 5, and you may also
have to reduce
.Va net.inet.tcp.inflight_min
(for example, to 3500) to get the desired effect.
Reducing these parameters
should be done as a last resort only.
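.Pp
Putting the inflight suggestions above together, a production
configuration set via
.Xr sysctl.conf 5
might look like this:
.Bd -literal -offset indent
# enable bandwidth delay product limiting
net.inet.tcp.inflight_enable=1
# disable debugging output for production use
net.inet.tcp.inflight_debug=0
# minimum window; values this high may disable limiting on some links
net.inet.tcp.inflight_min=6144
.Ed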
.Pp
The
.Va net.inet.ip.portrange.*
sysctls control the port number ranges automatically bound to TCP and UDP
sockets.
There are three ranges: a low range, a default range, and a
high range, selectable via an IP_PORTRANGE setsockopt() call.
Most
network programs use the default range, which is controlled by
.Va net.inet.ip.portrange.first
and
.Va net.inet.ip.portrange.last ,
defaulting to 1024 and 5000, respectively.
Bound port ranges are
used for outgoing connections and it is possible to run the system out
of ports under certain circumstances.
This most commonly occurs when you are
running a heavily loaded web proxy.
The port range is not an issue
when running servers which handle mainly incoming connections, such as a
normal web server, or which make only a limited number of outgoing
connections, such as a mail relay.
For situations where you may run yourself out of
ports we recommend increasing
.Va net.inet.ip.portrange.last
modestly.
A value of 10000, 20000, or 30000 may be reasonable.
You should
also consider firewall effects when changing the port range.
Some firewalls
may block large ranges of ports (usually low-numbered ports) and expect systems
to use higher ranges of ports for outgoing connections.
For this reason
we do not recommend that
.Va net.inet.ip.portrange.first
be lowered.
.Pp
The
.Va kern.ipc.somaxconn
sysctl limits the size of the listen queue for accepting new TCP connections.
The default value of 128 is typically too low for robust handling of new
connections in a heavily loaded web server environment.
For such environments,
we recommend increasing this value to 1024 or higher.
The service daemon
may itself limit the listen queue size (e.g.\&
.Xr sendmail 8 ,
apache) but will
often have a directive in its configuration file to adjust the queue size up.
Larger listen queues also do a better job of fending off denial of service
attacks.
.Pp
The
.Va kern.maxfiles
sysctl determines how many open files the system supports.
The default is
typically a few thousand but you may need to bump this up to ten or twenty
thousand if you are running databases or large descriptor-heavy daemons.
The read-only
.Va kern.openfiles
sysctl may be interrogated to determine the current number of open files
on the system.
.Pp
The
.Va vm.swap_idle_enabled
sysctl is useful in large multi-user systems where you have lots of users
entering and leaving the system and lots of idle processes.
Such systems
tend to generate a great deal of continuous pressure on free memory reserves.
Turning this feature on and adjusting the swapout hysteresis (in idle
seconds) via
.Va vm.swap_idle_threshold1
and
.Va vm.swap_idle_threshold2
allows you to depress the priority of pages associated with idle processes
more quickly than the normal pageout algorithm.
This gives a helping hand
to the pageout daemon.
Do not turn this option on unless you need it,
because the tradeoff you are making is to essentially pre-page memory sooner
rather than later, eating more swap and disk bandwidth.
In a small system
this option will have a detrimental effect, but in a large system that is
already doing moderate paging it allows the VM system to stage
whole processes into and out of memory more easily.
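.Pp
A busy server combining several of the suggestions from this section
might collect them in
.Xr sysctl.conf 5
as follows (every value here is illustrative and should be adjusted to
the workload):
.Bd -literal -offset indent
# deeper listen queue for a heavily loaded web server
kern.ipc.somaxconn=1024
# more ephemeral ports for a busy web proxy
net.inet.ip.portrange.last=20000
# more open files for descriptor-heavy daemons
kern.maxfiles=20000
.Ed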
.Sh LOADER TUNABLES
Some aspects of the system behavior may not be tunable at runtime because
memory allocations they perform must occur early in the boot process.
To change loader tunables, you must set their values in
.Xr loader.conf 5
and reboot the system.
.Pp
.Va kern.maxusers
controls the scaling of a number of static system tables, including defaults
for the maximum number of open files, sizing of network memory resources, etc.
As of
.Fx 4.5 ,
.Va kern.maxusers
is automatically sized at boot based on the amount of memory available in
the system, and may be determined at run-time by inspecting the value of the
read-only
.Va kern.maxusers
sysctl.
Some sites will require larger or smaller values of
.Va kern.maxusers
and may set it as a loader tunable; values of 64, 128, and 256 are not
uncommon.
We do not recommend going above 256 unless you need a huge number
of file descriptors; many of the tunable values set to their defaults by
.Va kern.maxusers
may be individually overridden at boot-time or run-time as described
elsewhere in this document.
Systems older than
.Fx 4.4
must set this value via the kernel
.Xr config 8
option
.Cd maxusers
instead.
.Pp
.Va kern.ipc.nmbclusters
may be adjusted to increase the number of network mbufs the system is
willing to allocate.
Each cluster represents approximately 2K of memory,
so a value of 1024 represents 2M of kernel memory reserved for network
buffers.
You can do a simple calculation to figure out how many you need.
If you have a web server which maxes out at 1000 simultaneous connections,
and each connection eats a 16K receive and 16K send buffer, you need
approximately 32MB worth of network buffers to deal with it.
A good rule of
thumb is to multiply by 2, so 32MB x 2 = 64MB, and 64MB / 2K = 32768
clusters.
So for this case
you would want to set
.Va kern.ipc.nmbclusters
to 32768.
We recommend values between
1024 and 4096 for machines with moderate amounts of memory, and between 4096
and 32768 for machines with greater amounts of memory.
Under no circumstances
should you specify an arbitrarily high value for this parameter as it could
lead to a boot-time crash.
The
.Fl m
option to
.Xr netstat 1
may be used to observe network cluster use.
Older versions of
.Fx
do not have this tunable and require that the
kernel
.Xr config 8
option
.Dv NMBCLUSTERS
be set instead.
.Pp
More and more programs are using the
.Xr sendfile 2
system call to transmit files over the network.
The
.Va kern.ipc.nsfbufs
sysctl controls the number of filesystem buffers
.Xr sendfile 2
is allowed to use to perform its work.
This parameter nominally scales
with
.Va kern.maxusers
so you should not need to modify this parameter except under extreme
circumstances.
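.Pp
As a concrete sketch, the web server example above would be configured
in
.Xr loader.conf 5
(note that loader tunable values are quoted):
.Bd -literal -offset indent
# reserve 32768 network clusters (about 64MB of kernel memory)
kern.ipc.nmbclusters="32768"
.Ed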
.Sh KERNEL CONFIG TUNING
There are a number of kernel options that you may have to fiddle with in
a large-scale system.
In order to change these options you need to be
able to compile a new kernel from source.
The
.Xr config 8
manual page and the handbook are good starting points for learning how to
do this.
Generally the first thing you do when creating your own custom
kernel is to strip out all the drivers and services you do not use.
Removing things like
.Dv INET6
and drivers you do not have will reduce the size of your kernel, sometimes
by a megabyte or more, leaving more memory available for applications.
.Pp
.Dv SCSI_DELAY
and
.Dv IDE_DELAY
may be used to reduce system boot times.
The defaults are fairly high and
can be responsible for 15+ seconds of delay in the boot process.
Reducing
.Dv SCSI_DELAY
to 5 seconds usually works (especially with modern drives).
Reducing
.Dv IDE_DELAY
also works but you have to be a little more careful.
.Pp
There are a number of
.Dv *_CPU
options that can be commented out.
If you only want the kernel to run
on a Pentium class CPU, you can easily remove
.Dv I386_CPU
and
.Dv I486_CPU ,
but only remove
.Dv I586_CPU
if you are sure your CPU is being recognized as a Pentium II or better.
Some clones may be recognized as a Pentium or even a 486 and not be able
to boot without those options.
If it works, great!
The operating system
will be better able to use higher-end CPU features for MMU, task switching,
timebase, and even device operations.
Additionally, higher-end CPUs support
4MB MMU pages, which the kernel uses to map the kernel itself into memory,
increasing its efficiency under heavy syscall loads.
.Sh IDE WRITE CACHING
.Fx 4.3
flirted with turning off IDE write caching.
This reduced write bandwidth
to IDE disks but was considered necessary due to serious data consistency
issues introduced by hard drive vendors.
Basically the problem is that
IDE drives lie about when a write completes.
With IDE write caching turned
on, IDE hard drives will not only write data to disk out of order, they
will sometimes delay some of the blocks indefinitely under heavy disk
load.
A crash or power failure can result in serious filesystem
corruption.
So our default was changed to be safe.
Unfortunately, the
result was such a huge loss in performance that we caved in and changed the
default back to on after the release.
You should check the default on
your system by observing the
.Va hw.ata.wc
sysctl variable.
If IDE write caching is turned off, you can turn it back
on by setting the
.Va hw.ata.wc
loader tunable to 1.
More information on tuning the ATA driver system may be found in the
.Xr ata 4
manual page.
.Pp
There is a new experimental feature for IDE hard drives called
.Va hw.ata.tags
(you also set this in the boot loader) which allows write caching to be safely
turned on.
This brings SCSI tagging features to IDE drives.
As of this
writing only IBM DPTA and DTLA drives support the feature.
Warning!
These
drives apparently have quality control problems and I do not recommend
purchasing them at this time.
If you need performance, go with SCSI.
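.Pp
As noted above, if write caching has been turned off on your system and
you decide to accept the consistency risks, it can be re-enabled from the
boot loader; for example, in
.Xr loader.conf 5 :
.Bd -literal -offset indent
# re-enable IDE write caching (see the warnings above)
hw.ata.wc="1"
.Ed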
.Sh CPU, MEMORY, DISK, NETWORK
The type of tuning you do depends heavily on where your system begins to
bottleneck as load increases.
If your system runs out of CPU (idle times
are perpetually 0%) then you need to consider upgrading the CPU or moving to
an SMP motherboard (multiple CPUs), or perhaps you need to revisit the
programs that are causing the load and try to optimize them.
If your system
is paging to swap a lot you need to consider adding more memory.
If your
system is saturating the disk you typically see high CPU idle times and
total disk saturation.
.Xr systat 1
can be used to monitor this.
There are many solutions to saturated disks:
increasing memory for caching, mirroring disks, distributing operations across
several machines, and so forth.
If disk performance is an issue and you
are using IDE drives, switching to SCSI can help a great deal.
While modern
IDE drives compare with SCSI in raw sequential bandwidth, the moment you
start seeking around the disk SCSI drives usually win.
.Pp
Finally, you might run out of network suds.
The first line of defense for
improving network performance is to make sure you are using switches instead
of hubs, especially these days when switches are almost as cheap.
Hubs
have severe problems under heavy loads due to collision backoff, and one bad
host can severely degrade the entire LAN.
Second, optimize the network path
as much as possible.
For example, in
.Xr firewall 7
we describe a firewall protecting internal hosts with a topology where
the externally visible hosts are not routed through it.
Use 100BaseT rather
than 10BaseT, or use 1000BaseT rather than 100BaseT, depending on your needs.
Most bottlenecks occur at the WAN link (e.g.\&
modem, T1, DSL, whatever).
If expanding the link is not an option it may be possible to use the
.Xr dummynet 4
feature to implement peak shaving or other forms of traffic shaping to
prevent the overloaded service (such as web services) from affecting other
services (such as email), or vice versa.
In home installations this could
be used to give interactive traffic (your browser,
.Xr ssh 1
logins) priority
over services you export from your box (web services, email).
.Sh SEE ALSO
.Xr netstat 1 ,
.Xr systat 1 ,
.Xr ata 4 ,
.Xr dummynet 4 ,
.Xr login.conf 5 ,
.Xr rc.conf 5 ,
.Xr sysctl.conf 5 ,
.Xr firewall 7 ,
.Xr hier 7 ,
.Xr ports 7 ,
.Xr boot 8 ,
.Xr ccdconfig 8 ,
.Xr config 8 ,
.Xr disklabel 8 ,
.Xr fsck 8 ,
.Xr ifconfig 8 ,
.Xr ipfw 8 ,
.Xr loader 8 ,
.Xr mount 8 ,
.Xr newfs 8 ,
.Xr route 8 ,
.Xr sysctl 8 ,
.Xr sysinstall 8 ,
.Xr tunefs 8 ,
.Xr vinum 8
.Sh HISTORY
The
.Nm
manual page was originally written by
.An Matthew Dillon
and first appeared
in
.Fx 4.3 ,
May 2001.