.\" Copyright (c) 2001 Matthew Dillon.  Terms and conditions are those of
.\" the BSD Copyright as specified in the file "/usr/src/COPYRIGHT" in
.\" the source tree.
.\"
.\" $FreeBSD: src/share/man/man7/tuning.7,v 1.1.2.30 2002/12/17 19:32:08 dillon Exp $
.\" $DragonFly: src/share/man/man7/tuning.7,v 1.10 2007/02/19 11:10:11 swildner Exp $
.\"
.Dd May 11, 2006
.Dt TUNING 7
.Os
.Sh NAME
.Nm tuning
.Nd performance tuning under
.Dx
.Sh SYSTEM SETUP - DISKLABEL, NEWFS, TUNEFS, SWAP
When using
.Xr disklabel 8
or the
.Dx
installer
to lay out your filesystems on a hard disk it is important to remember
that hard drives can transfer data much more quickly from outer tracks
than they can from inner tracks.
To take advantage of this you should
try to pack your smaller filesystems and swap closer to the outer tracks,
follow with the larger filesystems, and end with the largest filesystems.
It is also important to size system standard filesystems such that you
will not be forced to resize them later as you scale the machine up.
I usually create, in order, a 128M root, 1G swap, 128M
.Pa /var ,
128M
.Pa /var/tmp ,
3G
.Pa /usr ,
and use any remaining space for
.Pa /home .
.Pp
You should typically size your swap space to approximately 2x main memory.
If you do not have a lot of RAM, though, you will generally want a lot
more swap.
It is not recommended that you configure any less than
256M of swap on a system and you should keep in mind future memory
expansion when sizing the swap partition.
The kernel's VM paging algorithms are tuned to perform best when there is
at least 2x swap versus main memory.
Configuring too little swap can lead
to inefficiencies in the VM page scanning code as well as create issues
later on if you add more memory to your machine.
Finally, on larger systems
with multiple SCSI disks (or multiple IDE disks operating on different
controllers), we strongly recommend that you configure swap on each drive
(up to four drives).
The swap partitions on the drives should be approximately the same size.
The kernel can handle arbitrary sizes but
internal data structures scale to 4 times the largest swap partition.
Keeping
the swap partitions near the same size will allow the kernel to optimally
stripe swap space across the N disks.
Do not worry about overdoing it a
little; swap space is the saving grace of
.Ux
and even if you do not normally use much swap, it can give you more time to
recover from a runaway program before being forced to reboot.
.Pp
How you size your
.Pa /var
partition depends heavily on what you intend to use the machine for.
This
partition is primarily used to hold mailboxes, the print spool, and log
files.
Some people even make
.Pa /var/log
its own partition (but except for extreme cases it is not worth the waste
of a partition ID).
If your machine is intended to act as a mail
or print server,
or you are running a heavily visited web server, you should consider
creating a much larger partition \(en perhaps a gig or more.
It is very easy
to underestimate log file storage requirements.
.Pp
Sizing
.Pa /var/tmp
depends on the kind of temporary file usage you think you will need.
128M is
the minimum we recommend.
Also note that the
.Dx
installer will create a
.Pa /tmp
directory.
Dedicating a partition for temporary file storage is important for
two reasons: first, it reduces the possibility of filesystem corruption
in a crash, and second, it reduces the chance that a runaway process which
fills up
.Oo Pa /var Oc Ns Pa /tmp
will blow up more critical subsystems (mail,
logging, etc).
Filling up
.Oo Pa /var Oc Ns Pa /tmp
is a very common problem to have.
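.Pp
The swap sizing guidance earlier in this section (roughly 2x main memory,
never less than 256M, spread evenly over at most four drives) can be
sketched as a small calculation.
This is only an illustration: the function name
swap_plan
and the choice of megabytes as the unit are assumptions made for the
example, not part of any system tool.

```python
# Sketch of the swap sizing guidance: about 2x main memory, never less
# than 256M, split evenly across up to four drives.
# The function name and the MB units are illustrative assumptions.

MIN_SWAP_MB = 256        # smallest swap size the text recommends
MAX_SWAP_DRIVES = 4      # the text suggests swap on up to four drives


def swap_plan(ram_mb, drives=1):
    """Return (total_swap_mb, per_drive_swap_mb, drives_used)."""
    if ram_mb <= 0 or drives < 1:
        raise ValueError("ram_mb and drives must be positive")
    total = max(2 * ram_mb, MIN_SWAP_MB)
    used = min(drives, MAX_SWAP_DRIVES)
    # Equal-sized partitions let the kernel stripe swap evenly;
    # round up so the combined size never falls below the target.
    per_drive = -(-total // used)
    return total, per_drive, used


if __name__ == "__main__":
    print(swap_plan(512))        # one drive
    print(swap_plan(64))         # small machine, hits the 256M floor
    print(swap_plan(2048, 6))    # six drives available, four used
```

Treat the result as a starting point and round it up if you expect to
add memory later.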
.Pp
In the old days there were differences between
.Pa /tmp
and
.Pa /var/tmp ,
but the introduction of
.Pa /var
(and
.Pa /var/tmp )
led to massive confusion
among program writers, so today programs haphazardly use one or the
other and thus no real distinction can be made between the two.
So it makes sense to have just one temporary directory and
softlink to it from the other tmp directory locations.
However you handle
.Pa /tmp ,
the one thing you do not want to do is leave it sitting
on the root partition where it might cause root to fill up or possibly
corrupt root in a crash/reboot situation.
.Pp
The
.Pa /usr
partition holds the bulk of the files required to support the system and
a subdirectory within it called
.Pa /usr/pkg
holds the bulk of the files installed from the
pkgsrc collection.
If you do not use pkgsrc all that much and do not intend to keep
system source
.Pq Pa /usr/src
on the machine, you can get away with
a 1 gigabyte
.Pa /usr
partition.
However, if you install a lot of packages
(especially window managers and Linux-emulated binaries), we recommend
at least a 2 gigabyte
.Pa /usr
and if you also intend to keep system source
on the machine, we recommend a 3 gigabyte
.Pa /usr .
Do not underestimate the
amount of space you will need in this partition; it can creep up and
surprise you!
.Pp
The
.Pa /home
partition is typically used to hold user-specific data.
I usually size it to the remainder of the disk.
.Pp
Why partition at all?
Why not create one big
.Pa /
partition and be done with it?
Then I do not have to worry about undersizing things!
Well, there are several reasons this is not a good idea.
First,
each partition has different operational characteristics and separating them
allows the filesystem to tune itself to those characteristics.
For example,
the root and
.Pa /usr
partitions are read-mostly, with very little writing, while
a lot of reading and writing could occur in
.Pa /var
and
.Pa /var/tmp .
By properly
partitioning your system, fragmentation introduced in the smaller, more
heavily write-loaded partitions will not bleed over into the mostly-read
partitions.
Additionally, keeping the write-loaded partitions closer to
the edge of the disk (i.e. before the really big partitions instead of after
in the partition table) will increase I/O performance in the partitions
where you need it the most.
Now it is true that you might also need I/O
performance in the larger partitions, but they are so large that shifting
them more towards the edge of the disk will not lead to a significant
performance improvement whereas moving
.Pa /var
to the edge can have a huge impact.
Finally, there are safety concerns.
Having a small neat root partition that
is essentially read-only gives it a greater chance of surviving a bad crash
intact.
.Pp
Properly partitioning your system also allows you to tune
.Xr newfs 8
and
.Xr tunefs 8
parameters.
Tuning
.Xr newfs 8
requires more experience but can lead to significant improvements in
performance.
There are three parameters that are relatively safe to tune:
.Em blocksize , bytes/i-node ,
and
.Em cylinders/group .
.Pp
.Dx
performs best when using 8K or 16K filesystem block sizes.
The default filesystem block size is 16K,
which provides best performance for most applications,
with the exception of those that perform random access on large files
(such as database server software).
Such applications tend to perform better with a smaller block size,
although modern disk characteristics are such that the performance
gain from using a smaller block size may not be worth consideration.
Using a block size larger than 16K
can cause fragmentation of the buffer cache and
lead to lower performance.
.Pp
The defaults may be unsuitable
for a filesystem that requires a very large number of i-nodes
or is intended to hold a large number of very small files.
Such a filesystem should be created with an 8K or 4K block size.
This also requires you to specify a smaller
fragment size.
We recommend always using a fragment size that is \(18
the block size (less testing has been done on other fragment size factors).
The
.Xr newfs 8
options for this would be
.Dq Li "newfs -f 1024 -b 8192 ..." .
.Pp
If a large partition is intended to be used to hold fewer, larger files, such
as database files, you can increase the
.Em bytes/i-node
ratio which reduces the number of i-nodes (maximum number of files and
directories that can be created) for that partition.
Decreasing the number
of i-nodes in a filesystem can greatly reduce
.Xr fsck 8
recovery times after a crash.
Do not use this option
unless you are actually storing large files on the partition, because if you
overcompensate you can wind up with a filesystem that has lots of free
space remaining but cannot accommodate any more files.
Using 32768, 65536, or 262144 bytes/i-node is recommended.
You can go higher but
it will have only incremental effects on
.Xr fsck 8
recovery times.
For example,
.Dq Li "newfs -i 32768 ..." .
.Pp
.Xr tunefs 8
may be used to further tune a filesystem.
This command can be run in
single-user mode without having to reformat the filesystem.
However, this is possibly the most abused program in the system.
Many people attempt to
increase available filesystem space by setting the min-free percentage to 0.
This can lead to severe filesystem fragmentation and we do not recommend
that you do this.
Really the only
.Xr tunefs 8
option worthwhile here is turning on
.Em softupdates
with
.Dq Li "tunefs -n enable /filesystem" .
(Note: in
.Dx ,
softupdates can be turned on using the
.Fl U
option to
.Xr newfs 8 ,
and the
.Dx
installer will typically enable softupdates automatically for
non-root filesystems).
Softupdates drastically improves meta-data performance, mainly file
creation and deletion.
We recommend enabling softupdates on most filesystems; however, there
are two limitations to softupdates that you should be aware of when
determining whether to use it on a filesystem.
First, softupdates guarantees filesystem consistency in the
case of a crash but could very easily be several seconds (even a minute!)
behind on pending writes to the physical disk.
If you crash you may lose more work
than otherwise.
Secondly, softupdates delays the freeing of filesystem
blocks.
If you have a filesystem (such as the root filesystem) which is
close to full, doing a major update of it, e.g.\&
.Dq Li "make installworld" ,
can run it out of space and cause the update to fail.
For this reason, softupdates will not be enabled on the root filesystem
during a typical install.
There is no loss of performance since the root
filesystem is rarely written to.
.Pp
A number of run-time
.Xr mount 8
options exist that can help you tune the system.
The most obvious and most dangerous one is
.Cm async .
Do not ever use it; it is far too dangerous.
A less dangerous and more
useful
.Xr mount 8
option is called
.Cm noatime .
.Ux
filesystems normally update the last-accessed time of a file or
directory whenever it is accessed.
This operation is handled in
.Dx
with a delayed write and normally does not create a burden on the system.
However, if your system is accessing a huge number of files on a continuing
basis the buffer cache can wind up getting polluted with atime updates,
creating a burden on the system.
For example, if you are running a heavily
loaded web site, or a news server with lots of readers, you might want to
consider turning off atime updates on your larger partitions with this
.Xr mount 8
option.
However, you should not gratuitously turn off atime
updates everywhere.
For example, the
.Pa /var
filesystem customarily
holds mailboxes, and atime (in combination with mtime) is used to
determine whether a mailbox has new mail.
You might as well leave
atime turned on for mostly read-only partitions such as
.Pa /
and
.Pa /usr
as well.
This is especially useful for
.Pa /
since some system utilities
use the atime field for reporting.
.Sh STRIPING DISKS
In larger systems you can stripe partitions from several drives together
to create a much larger overall partition.
Striping can also improve
the performance of a filesystem by splitting I/O operations across two
or more disks.
The
.Xr vinum 8
and
.Xr ccdconfig 8
utilities may be used to create simple striped filesystems.
Generally
speaking, striping smaller partitions such as the root and
.Pa /var/tmp ,
or essentially read-only partitions such as
.Pa /usr
is a complete waste of time.
You should only stripe partitions that require serious I/O performance,
typically
.Pa /var , /home ,
or custom partitions used to hold databases and web pages.
Choosing the proper stripe size is also
important.
Filesystems tend to store meta-data on power-of-2 boundaries
and you usually want to reduce seeking rather than increase seeking.
This
means you want to use a large off-center stripe size such as 1152 sectors
so sequential I/O does not seek both disks and so meta-data is distributed
across both disks rather than concentrated on a single disk.
If
you really need to get sophisticated, we recommend using a real hardware
RAID controller from the list of
.Dx
supported controllers.
.Sh SYSCTL TUNING
.Xr sysctl 8
variables permit system behavior to be monitored and controlled at
run-time.
Some sysctls simply report on the behavior of the system; others allow
the system behavior to be modified;
some may be set at boot time using
.Xr rc.conf 5 ,
but most will be set via
.Xr sysctl.conf 5 .
There are several hundred sysctls in the system, including many that appear
to be candidates for tuning but actually are not.
In this document we will only cover the ones that have the greatest effect
on the system.
.Pp
The
.Va kern.ipc.shm_use_phys
sysctl defaults to 0 (off) and may be set to 0 (off) or 1 (on).
Setting
this parameter to 1 will cause all System V shared memory segments to be
mapped to unpageable physical RAM.
This feature only has an effect if you
are either (A) mapping small amounts of shared memory across many (hundreds)
of processes, or (B) mapping large amounts of shared memory across any
number of processes.
This feature allows the kernel to remove a great deal
of internal memory management page-tracking overhead at the cost of wiring
the shared memory into core, making it unswappable.
.Pp
The
.Va vfs.write_behind
sysctl defaults to 1 (on).
This tells the filesystem to issue media
writes as full clusters are collected, which typically occurs when writing
large sequential files.
The idea is to avoid saturating the buffer
cache with dirty buffers when it would not benefit I/O performance.
However,
this may stall processes and under certain circumstances you may wish to turn
it off.
.Pp
The
.Va vfs.hirunningspace
sysctl determines how much outstanding write I/O may be queued to
disk controllers system-wide at any given instant.
The default is
usually sufficient but on machines with lots of disks you may want to bump
it up to four or five megabytes.
Note that setting too high a value
(exceeding the buffer cache's write threshold) can lead to extremely
bad clustering performance.
Do not set this value arbitrarily high!
Also,
higher write queueing values may add latency to reads occurring at the same
time.
.Pp
There are various other buffer-cache and VM page cache related sysctls.
We do not recommend modifying these values.
As of
.Fx 4.3 ,
the VM system does an extremely good job tuning itself.
.Pp
The
.Va net.inet.tcp.sendspace
and
.Va net.inet.tcp.recvspace
sysctls are of particular interest if you are running network intensive
applications.
They control the amount of send and receive buffer space
allowed for any given TCP connection.
The default sending buffer is 32K; the default receiving buffer
is 64K.
You can often
improve bandwidth utilization by increasing the default at the cost of
eating up more kernel memory for each connection.
We do not recommend
increasing the defaults if you are serving hundreds or thousands of
simultaneous connections because it is possible to quickly run the system
out of memory due to stalled connections building up.
But if you need
high bandwidth over a smaller number of connections, especially if you have
gigabit Ethernet, increasing these defaults can make a huge difference.
You can adjust the buffer size for incoming and outgoing data separately.
For example, if your machine is primarily doing web serving you may want
to decrease the recvspace in order to be able to increase the
sendspace without eating too much kernel memory.
Note that the routing table (see
.Xr route 8 )
can be used to introduce route-specific send and receive buffer size
defaults.
.Pp
As an additional management tool you can use pipes in your
firewall rules (see
.Xr ipfw 8 )
to limit the bandwidth going to or from particular IP blocks or ports.
For example, if you have a T1 you might want to limit your web traffic
to 70% of the T1's bandwidth in order to leave the remainder available
for mail and interactive use.
Normally a heavily loaded web server
will not introduce significant latencies into other services even if
the network link is maxed out, but enforcing a limit can smooth things
out and lead to longer term stability.
Many people also enforce artificial
bandwidth limitations in order to ensure that they are not charged for
using too much bandwidth.
.Pp
Setting the send or receive TCP buffer to values larger than 65535 will result
in only a marginal performance improvement unless both hosts support the window
scaling extension of the TCP protocol, which is controlled by the
.Va net.inet.tcp.rfc1323
sysctl.
These extensions should be enabled and the TCP buffer size should be set
to a value larger than 65536 in order to obtain good performance from
certain types of network links; specifically, gigabit WAN links and
high-latency satellite links.
RFC1323 support is enabled by default.
.Pp
The
.Va net.inet.tcp.always_keepalive
sysctl determines whether or not the TCP implementation should attempt
to detect dead TCP connections by intermittently delivering
.Dq keepalives
on the connection.
By default, this is disabled for all applications; only applications
that specifically request keepalives will use them.
In most environments, TCP keepalives will improve the management of
system state by expiring dead TCP connections, particularly for
systems serving dialup users who may not always terminate individual
TCP connections before disconnecting from the network.
However, in some environments, temporary network outages may be
incorrectly identified as dead sessions, resulting in unexpectedly
terminated TCP connections.
In such environments, setting the sysctl to 0 may reduce the occurrence of
TCP session disconnections.
.Pp
The
.Va net.inet.tcp.delayed_ack
TCP feature is largely misunderstood.
Historically speaking this feature
was designed to allow the acknowledgement of transmitted data to be returned
along with the response.
For example, when you type over a remote shell
the acknowledgement of the character you send can be returned along with the
data representing the echo of the character.
With delayed acks turned off
the acknowledgement may be sent in its own packet before the remote service
has a chance to echo the data it just received.
This same concept also
applies to any interactive protocol (e.g. SMTP, WWW, POP3) and can cut the
number of tiny packets flowing across the network in half.
The
.Dx
delayed-ack implementation also follows the TCP protocol rule that
at least every other packet be acknowledged even if the standard 100ms
timeout has not yet passed.
Normally the worst a delayed ack can do is
slightly delay the teardown of a connection, or slightly delay the ramp-up
of a slow-start TCP connection.
While we are not sure, we believe that
the several FAQs related to packages such as SAMBA and SQUID which advise
turning off delayed acks may be referring to the slow-start issue.
.Pp
The
.Va net.inet.tcp.inflight_enable
sysctl turns on bandwidth delay product limiting for all TCP connections.
The system will attempt to calculate the bandwidth delay product for each
connection and limit the amount of data queued to the network to just the
amount required to maintain optimum throughput.
This feature is useful
if you are serving data over modems, GigE, or high speed WAN links (or
any other link with a high bandwidth*delay product), especially if you are
also using window scaling or have configured a large send window.
If
you enable this option you should also be sure to set
.Va net.inet.tcp.inflight_debug
to 0 (disable debugging), and for production use setting
.Va net.inet.tcp.inflight_min
to at least 6144 may be beneficial.
Note, however, that setting high
minimums may effectively disable bandwidth limiting depending on the link.
The limiting feature reduces the amount of data built up in intermediate
router and switch packet queues as well as the amount of data built
up in the local host's interface queue.
With fewer packets queued up,
interactive connections, especially over slow modems, will also be able
to operate with lower round trip times.
However, note that this feature
only affects data transmission (uploading / server-side).
It does not
affect data reception (downloading).
.Pp
Adjusting
.Va net.inet.tcp.inflight_stab
is not recommended.
This parameter defaults to 20, representing 2 maximal packets added
to the bandwidth delay product window calculation.
The additional
window is required to stabilize the algorithm and improve responsiveness
to changing conditions, but it can also result in higher ping times
over slow links (though still much lower than you would get without
the inflight algorithm).
In such cases you may
wish to try reducing this parameter to 15, 10, or 5, and you may also
have to reduce
.Va net.inet.tcp.inflight_min
(for example, to 3500) to get the desired effect.
Reducing these parameters
should be done as a last resort only.
.Pp
The
.Va net.inet.ip.portrange.*
sysctls control the port number ranges automatically bound to TCP and UDP
sockets.
There are three ranges: a low range, a default range, and a
high range, selectable via an IP_PORTRANGE setsockopt() call.
Most
network programs use the default range which is controlled by
.Va net.inet.ip.portrange.first
and
.Va net.inet.ip.portrange.last ,
which default to 1024 and 5000 respectively.
Bound port ranges are
used for outgoing connections and it is possible to run the system out
of ports under certain circumstances.
This most commonly occurs when you are
running a heavily loaded web proxy.
The port range is not an issue
when running servers which handle mainly incoming connections such as a
normal web server, or which have a limited number of outgoing connections such
as a mail relay.
For situations where you may run yourself out of
ports we recommend increasing
.Va net.inet.ip.portrange.last
modestly.
A value of 10000 or 20000 or 30000 may be reasonable.
You should
also consider firewall effects when changing the port range.
Some firewalls
may block large ranges of ports (usually low-numbered ports) and expect systems
to use higher ranges of ports for outgoing connections.
For this reason
we do not recommend that
.Va net.inet.ip.portrange.first
be lowered.
.Pp
The
.Va kern.ipc.somaxconn
sysctl limits the size of the listen queue for accepting new TCP connections.
The default value of 128 is typically too low for robust handling of new
connections in a heavily loaded web server environment.
For such environments,
we recommend increasing this value to 1024 or higher.
The service daemon
may itself limit the listen queue size (e.g.\&
.Xr sendmail 8 ,
apache) but will
often have a directive in its configuration file to adjust the queue size up.
Larger listen queues also do a better job of fending off denial of service
attacks.
.Pp
The
.Va kern.maxfiles
sysctl determines how many open files the system supports.
The default is
typically a few thousand but you may need to bump this up to ten or twenty
thousand if you are running databases or large descriptor-heavy daemons.
The read-only
.Va kern.openfiles
sysctl may be interrogated to determine the current number of open files
on the system.
.Pp
The
.Va vm.swap_idle_enabled
sysctl is useful in large multi-user systems where you have lots of users
entering and leaving the system and lots of idle processes.
Such systems
tend to generate a great deal of continuous pressure on free memory reserves.
Turning this feature on and adjusting the swapout hysteresis (in idle
seconds) via
.Va vm.swap_idle_threshold1
and
.Va vm.swap_idle_threshold2
allows you to depress the priority of pages associated with idle processes
more quickly than the normal pageout algorithm.
This gives a helping hand
to the pageout daemon.
Do not turn this option on unless you need it,
because the tradeoff you are making is to essentially pre-page memory sooner
rather than later, eating more swap and disk bandwidth.
In a small system
this option will have a detrimental effect but in a large system that is
already doing moderate paging this option allows the VM system to stage
whole processes into and out of memory more easily.
.Sh LOADER TUNABLES
Some aspects of the system behavior may not be tunable at runtime because
memory allocations they perform must occur early in the boot process.
To change loader tunables, you must set their values in
.Xr loader.conf 5
and reboot the system.
.Pp
.Va kern.maxusers
controls the scaling of a number of static system tables, including defaults
for the maximum number of open files, sizing of network memory resources, etc.
On
.Dx ,
.Va kern.maxusers
is automatically sized at boot based on the amount of memory available in
the system, and may be determined at run-time by inspecting the value of the
read-only
.Va kern.maxusers
sysctl.
Some sites will require larger or smaller values of
.Va kern.maxusers
and may set it as a loader tunable; values of 64, 128, and 256 are not
uncommon.
We do not recommend going above 256 unless you need a huge number
of file descriptors; many of the tunable values set to their defaults by
.Va kern.maxusers
may be individually overridden at boot-time or run-time as described
elsewhere in this document.
.Pp
.Va kern.ipc.nmbclusters
may be adjusted to increase the number of network mbufs the system is
willing to allocate.
Each cluster represents approximately 2K of memory,
so a value of 1024 represents 2M of kernel memory reserved for network
buffers.
You can do a simple calculation to figure out how many you need.
If you have a web server which maxes out at 1000 simultaneous connections,
and each connection eats a 16K receive and 16K send buffer, you need
approximately 32MB worth of network buffers to deal with it.
A good rule of
thumb is to multiply by 2, so 32MB x 2 = 64MB, and 64MB / 2K = 32768.
So for this case
you would want to set
.Va kern.ipc.nmbclusters
to 32768.
We recommend values between
1024 and 4096 for machines with moderate amounts of memory, and between 4096
and 32768 for machines with greater amounts of memory.
Under no circumstances
should you specify an arbitrarily high value for this parameter; it could
lead to a boot-time crash.
The
.Fl m
option to
.Xr netstat 1
may be used to observe network cluster use.
.Pp
More and more programs are using the
.Xr sendfile 2
system call to transmit files over the network.
The
.Va kern.ipc.nsfbufs
sysctl controls the number of filesystem buffers
.Xr sendfile 2
is allowed to use to perform its work.
This parameter nominally scales
with
.Va kern.maxusers
so you should not need to modify this parameter except under extreme
circumstances.
.Sh KERNEL CONFIG TUNING
There are a number of kernel options that you may have to fiddle with in
a large-scale system.
In order to change these options you need to be
able to compile a new kernel from source.
The
.Xr config 8
manual page and the handbook are good starting points for learning how to
do this.
Generally the first thing you do when creating your own custom
kernel is to strip out all the drivers and services you do not use.
Removing things like
.Dv INET6
and drivers you do not have will reduce the size of your kernel, sometimes
by a megabyte or more, leaving more memory available for applications.
.Pp
.Dv SCSI_DELAY
may be used to reduce system boot times.
The default is fairly high and
can be responsible for 15+ seconds of delay in the boot process.
Reducing
.Dv SCSI_DELAY
to 5 seconds usually works (especially with modern drives).
.Pp
There are a number of
.Dv *_CPU
options that can be commented out.
If you only want the kernel to run
on a Pentium class CPU, you can easily remove
.Dv I386_CPU
and
.Dv I486_CPU ,
but only remove
.Dv I586_CPU
if you are sure your CPU is being recognized as a Pentium II or better.
Some clones may be recognized as a Pentium or even a 486 and not be able
to boot without those options.
If it works, great!
The operating system
will be able to make better use of higher-end CPU features for MMU, task
switching, timebase, and even device operations.
Additionally, higher-end CPUs support
4MB MMU pages, which the kernel uses to map the kernel itself into memory,
increasing its efficiency under heavy syscall loads.
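.Pp
The
kern.ipc.nmbclusters
rule of thumb given under LOADER TUNABLES (total per-connection buffer
space, doubled, divided by the roughly 2K cluster size) can be checked
with a few lines of arithmetic.
The sketch below is an illustration only; the function name
nmbclusters_estimate
and its parameters are assumptions made for the example.
Note that the text rounds 1000 connections at 32K each up to 32MB, which
is why it arrives at 32768 where exact arithmetic gives 32000.

```python
# Sketch of the nmbclusters rule of thumb: buffer space for all
# connections, times a safety factor of 2, divided by the cluster size.
# The function name and parameters are illustrative assumptions.

CLUSTER_KB = 2   # each mbuf cluster is approximately 2K, per the text


def nmbclusters_estimate(connections, recv_kb, send_kb, safety=2):
    """Return an estimated cluster count for the given load."""
    if connections < 0 or recv_kb < 0 or send_kb < 0 or safety < 1:
        raise ValueError("arguments must be non-negative, safety >= 1")
    buffer_kb = connections * (recv_kb + send_kb)
    return buffer_kb * safety // CLUSTER_KB


if __name__ == "__main__":
    # 1000 connections, 16K receive and 16K send buffers each.
    print(nmbclusters_estimate(1000, 16, 16))
    # 1024 connections reproduce the rounded figure used in the text.
    print(nmbclusters_estimate(1024, 16, 16))
```

Whatever the estimate, stay within the ranges recommended above rather
than setting an arbitrarily high value.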
.Sh IDE WRITE CACHING
.Fx 4.3
flirted with turning off IDE write caching.
This reduced write bandwidth
to IDE disks but was considered necessary due to serious data consistency
issues introduced by hard drive vendors.
Basically the problem is that
IDE drives lie about when a write completes.
With IDE write caching turned
on, IDE hard drives will not only write data to disk out of order, they
will sometimes delay some of the blocks indefinitely under heavy disk
load.
A crash or power failure can result in serious filesystem
corruption.
So our default was changed to be safe.
Unfortunately, the
result was such a huge loss in performance that we caved in and changed the
default back to on after the release.
You should check the default on
your system by observing the
.Va hw.ata.wc
sysctl variable.
If IDE write caching is turned off, you can turn it back
on by setting the
.Va hw.ata.wc
loader tunable to 1.
More information on tuning the ATA driver system may be found in the
.Xr ata 4
man page.
.Pp
There is a new experimental feature for IDE hard drives called
.Va hw.ata.tags
(you also set this in the boot loader) which allows write caching to be safely
turned on.
This brings SCSI tagging features to IDE drives.
As of this
writing only IBM DPTA and DTLA drives support the feature.
Warning!
These
drives apparently have quality control problems and I do not recommend
purchasing them at this time.
If you need performance, go with SCSI.
.Sh CPU, MEMORY, DISK, NETWORK
The type of tuning you do depends heavily on where your system begins to
bottleneck as load increases.
If your system runs out of CPU (idle times
are perpetually 0%) then you need to consider upgrading the CPU or moving to
an SMP motherboard (multiple CPUs), or perhaps you need to revisit the
programs that are causing the load and try to optimize them.
If your system
is paging to swap a lot you need to consider adding more memory.
If your
system is saturating the disk you typically see high CPU idle times and
total disk saturation.
.Xr systat 1
can be used to monitor this.
There are many solutions to saturated disks:
increasing memory for caching, mirroring disks, distributing operations across
several machines, and so forth.
If disk performance is an issue and you
are using IDE drives, switching to SCSI can help a great deal.
While modern
IDE drives compare with SCSI in raw sequential bandwidth, the moment you
start seeking around the disk SCSI drives usually win.
.Pp
Finally, you might run out of network suds.
The first line of defense for
improving network performance is to make sure you are using switches instead
of hubs, especially these days where switches are almost as cheap.
Hubs
have severe problems under heavy loads due to collision backoff and one bad
host can severely degrade the entire LAN.
Second, optimize the network path
as much as possible.
For example, in
.Xr firewall 7
we describe a firewall protecting internal hosts with a topology where
the externally visible hosts are not routed through it.
Use 100BaseT rather
than 10BaseT, or use 1000BaseT rather than 100BaseT, depending on your needs.
Most bottlenecks occur at the WAN link (e.g.\&
modem, T1, DSL, whatever).
If expanding the link is not an option it may be possible to use the
.Xr dummynet 4
feature to implement peak shaving or other forms of traffic shaping to
prevent the overloaded service (such as web services) from affecting other
services (such as email), or vice versa.
In home installations this could
be used to give interactive traffic (your browser,
.Xr ssh 1
logins) priority
over services you export from your box (web services, email).
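.Pp
When shaping traffic on a WAN link, the first step is deciding how much
of the link a service may use.
The sketch below works through the example given under SYSCTL TUNING,
where web traffic is held to 70% of a T1.
It is an illustration only: the function name
shaped_limit_kbits
is an assumption made for the example, and the T1 rate is taken as its
nominal 1.544 Mbit/s.

```python
# Sketch: split a link between a capped service and everything else.
# The function name is an illustrative assumption; 1544 Kbit/s is the
# nominal T1 line rate (1.544 Mbit/s).

T1_KBITS = 1544


def shaped_limit_kbits(link_kbits, percent):
    """Return (cap_kbits, remainder_kbits) for a percentage cap."""
    if link_kbits <= 0 or not 0 < percent <= 100:
        raise ValueError("link_kbits must be > 0 and percent in 1..100")
    # Integer arithmetic rounds the cap down, leaving any fraction
    # to the other services.
    cap = link_kbits * percent // 100
    return cap, link_kbits - cap


if __name__ == "__main__":
    cap, rest = shaped_limit_kbits(T1_KBITS, 70)
    print("web traffic cap: %d Kbit/s" % cap)
    print("left for mail and interactive use: %d Kbit/s" % rest)
```

The resulting cap is the figure you would hand to whatever shaping
mechanism you use; see
ipfw(8) and dummynet(4)
for how limits are actually configured.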
.Sh SEE ALSO
.Xr netstat 1 ,
.Xr systat 1 ,
.Xr ata 4 ,
.Xr dummynet 4 ,
.Xr login.conf 5 ,
.Xr rc.conf 5 ,
.Xr sysctl.conf 5 ,
.Xr firewall 7 ,
.Xr hier 7 ,
.Xr boot 8 ,
.Xr ccdconfig 8 ,
.Xr config 8 ,
.Xr disklabel 8 ,
.Xr fsck 8 ,
.Xr ifconfig 8 ,
.Xr ipfw 8 ,
.Xr loader 8 ,
.Xr mount 8 ,
.Xr newfs 8 ,
.Xr route 8 ,
.Xr sysctl 8 ,
.Xr tunefs 8 ,
.Xr vinum 8
.Sh HISTORY
The
.Nm
manual page was originally written by
.An Matthew Dillon
and first appeared
in
.Fx 4.3 ,
May 2001.