.\" Copyright (c) 2001 Matthew Dillon. Terms and conditions are those of
.\" the BSD Copyright as specified in the file "/usr/src/COPYRIGHT" in
.\" the source tree.
.\"
.\" $FreeBSD: src/share/man/man7/tuning.7,v 1.1.2.30 2002/12/17 19:32:08 dillon Exp $
.\" $DragonFly: src/share/man/man7/tuning.7,v 1.15 2007/09/14 23:47:53 swildner Exp $
.\"
.Dd March 4, 2007
.Dt TUNING 7
.Os
.Sh NAME
.Nm tuning
.Nd performance tuning under
.Dx
.Sh SYSTEM SETUP - DISKLABEL, NEWFS, TUNEFS, SWAP
When using
.Xr disklabel 8
or the
.Dx
installer
to lay out your filesystems on a hard disk it is important to remember
that hard drives can transfer data much more quickly from outer tracks
than they can from inner tracks.
To take advantage of this you should
try to pack your smaller filesystems and swap closer to the outer tracks,
follow with the larger filesystems, and end with the largest filesystems.
It is also important to size system standard filesystems such that you
will not be forced to resize them later as you scale the machine up.
I usually create, in order, a 128M root, 1G swap, 128M
.Pa /var ,
128M
.Pa /var/tmp ,
3G
.Pa /usr ,
and use any remaining space for
.Pa /home .
.Pp
You should typically size your swap space to approximately 2x main memory.
If you do not have a lot of RAM, though, you will generally want a lot
more swap.
It is not recommended that you configure any less than
256M of swap on a system and you should keep in mind future memory
expansion when sizing the swap partition.
The kernel's VM paging algorithms are tuned to perform best when there is
at least 2x swap versus main memory.
Configuring too little swap can lead
to inefficiencies in the VM page scanning code as well as create issues
later on if you add more memory to your machine.
Finally, on larger systems
with multiple SCSI disks (or multiple IDE disks operating on different
controllers), we strongly recommend that you configure swap on each drive
(up to four drives).
The swap partitions on the drives should be approximately the same size.
The kernel can handle arbitrary sizes but
internal data structures scale to 4 times the largest swap partition.
Keeping
the swap partitions near the same size will allow the kernel to optimally
stripe swap space across the N disks.
Do not worry about overdoing it a
little; swap space is the saving grace of
.Ux
and even if you do not normally use much swap, it can give you more time to
recover from a runaway program before being forced to reboot.
.Pp
How you size your
.Pa /var
partition depends heavily on what you intend to use the machine for.
This
partition is primarily used to hold mailboxes, the print spool, and log
files.
Some people even make
.Pa /var/log
its own partition (but except for extreme cases it is not worth the waste
of a partition ID).
If your machine is intended to act as a mail
or print server,
or you are running a heavily visited web server, you should consider
creating a much larger partition \(en perhaps a gig or more.
It is very easy
to underestimate log file storage requirements.
.Pp
Sizing
.Pa /var/tmp
depends on the kind of temporary file usage you think you will need.
128M is
the minimum we recommend.
Also note that the
.Dx
installer will create a
.Pa /tmp
directory.
Dedicating a partition for temporary file storage is important for
two reasons: first, it reduces the possibility of filesystem corruption
in a crash, and second it reduces the chance of a runaway process that
fills up
.Oo Pa /var Oc Ns Pa /tmp
from blowing up more critical subsystems (mail,
logging, etc).
Filling up
.Oo Pa /var Oc Ns Pa /tmp
is a very common problem to have.
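.Pp
The swap sizing guidance given earlier in this section (about 2x main
memory, no less than 256M in total, spread evenly across up to four drives)
can be sketched as a small calculation.
This is only an illustration of those rules of thumb, not a tool shipped
with the system: the function name
.Li plan_swap_mb
and all of its parameters are hypothetical.
.Bd -literal -offset indent
```python
def plan_swap_mb(ram_mb, drives=1, factor=2, floor_mb=256, max_drives=4):
    """Return a list of per-drive swap sizes in megabytes.

    Assumptions (taken from the rules of thumb in this page):
      - total swap is about `factor` times main memory,
      - never less than `floor_mb` in total,
      - swap is split evenly over at most `max_drives` drives.
    """
    total_mb = max(ram_mb * factor, floor_mb)
    used_drives = max(1, min(drives, max_drives))
    per_drive_mb = -(-total_mb // used_drives)  # ceiling division
    return [per_drive_mb] * used_drives


if __name__ == "__main__":
    # 512M of RAM, one drive: a single 1024M swap partition.
    print(plan_swap_mb(512))
    # 2048M of RAM, six drives: only four are used, 1024M each.
    print(plan_swap_mb(2048, drives=6))
```
.Ed
Equal sizes are returned on purpose, since the kernel stripes swap best
when the partitions are near the same size.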
.Pp
In the old days there were differences between
.Pa /tmp
and
.Pa /var/tmp ,
but the introduction of
.Pa /var
(and
.Pa /var/tmp )
led to massive confusion
by program writers so today programs haphazardly use one or the
other and thus no real distinction can be made between the two.
So it makes sense to have just one temporary directory and
softlink to it from the other tmp directory locations.
However you handle
.Pa /tmp ,
the one thing you do not want to do is leave it sitting
on the root partition where it might cause root to fill up or possibly
corrupt root in a crash/reboot situation.
.Pp
The
.Pa /usr
partition holds the bulk of the files required to support the system and
a subdirectory within it called
.Pa /usr/pkg
holds the bulk of the files installed from the
.Xr pkgsrc 7
collection.
If you do not use
.Xr pkgsrc 7
all that much and do not intend to keep system source
.Pq Pa /usr/src
on the machine, you can get away with
a 1 gigabyte
.Pa /usr
partition.
However, if you install a lot of packages
(especially window managers and Linux-emulated binaries), we recommend
at least a 2 gigabyte
.Pa /usr
and if you also intend to keep system source
on the machine, we recommend a 3 gigabyte
.Pa /usr .
Do not underestimate the
amount of space you will need in this partition, it can creep up and
surprise you!
.Pp
The
.Pa /home
partition is typically used to hold user-specific data.
I usually size it to the remainder of the disk.
.Pp
Why partition at all?
Why not create one big
.Pa /
partition and be done with it?
Then I do not have to worry about undersizing things!
Well, there are several reasons this is not a good idea.
First,
each partition has different operational characteristics and separating them
allows the filesystem to tune itself to those characteristics.
For example,
the root and
.Pa /usr
partitions are read-mostly, with very little writing, while
a lot of reading and writing could occur in
.Pa /var
and
.Pa /var/tmp .
By properly
partitioning your system, fragmentation introduced in the smaller more
heavily write-loaded partitions will not bleed over into the mostly-read
partitions.
Additionally, keeping the write-loaded partitions closer to
the edge of the disk (i.e. before the really big partitions instead of after
in the partition table) will increase I/O performance in the partitions
where you need it the most.
Now it is true that you might also need I/O
performance in the larger partitions, but they are so large that shifting
them more towards the edge of the disk will not lead to a significant
performance improvement whereas moving
.Pa /var
to the edge can have a huge impact.
Finally, there are safety concerns.
Having a small neat root partition that
is essentially read-only gives it a greater chance of surviving a bad crash
intact.
.Pp
Properly partitioning your system also allows you to tune
.Xr newfs 8 ,
and
.Xr tunefs 8
parameters.
Tuning
.Xr newfs 8
requires more experience but can lead to significant improvements in
performance.
There are three parameters that are relatively safe to tune:
.Em blocksize , bytes/i-node ,
and
.Em cylinders/group .
.Pp
.Dx
performs best when using 8K or 16K filesystem block sizes.
The default filesystem block size is 16K,
which provides best performance for most applications,
with the exception of those that perform random access on large files
(such as database server software).
Such applications tend to perform better with a smaller block size,
although modern disk characteristics are such that the performance
gain from using a smaller block size may not be worth consideration.
Using a block size larger than 16K
can cause fragmentation of the buffer cache and
lead to lower performance.
.Pp
The defaults may be unsuitable
for a filesystem that requires a very large number of i-nodes
or is intended to hold a large number of very small files.
Such a filesystem should be created with an 8K or 4K block size.
This also requires you to specify a smaller
fragment size.
We recommend always using a fragment size that is \(18
the block size (less testing has been done on other fragment size factors).
The
.Xr newfs 8
options for this would be
.Dq Li "newfs -f 1024 -b 8192 ..." .
.Pp
If a large partition is intended to be used to hold fewer, larger files, such
as database files, you can increase the
.Em bytes/i-node
ratio which reduces the number of i-nodes (maximum number of files and
directories that can be created) for that partition.
Decreasing the number
of i-nodes in a filesystem can greatly reduce
.Xr fsck 8
recovery times after a crash.
Do not use this option
unless you are actually storing large files on the partition, because if you
overcompensate you can wind up with a filesystem that has lots of free
space remaining but cannot accommodate any more files.
Using 32768, 65536, or 262144 bytes/i-node is recommended.
You can go higher but
it will have only incremental effects on
.Xr fsck 8
recovery times.
For example,
.Dq Li "newfs -i 32768 ..." .
.Pp
.Xr tunefs 8
may be used to further tune a filesystem.
This command can be run in
single-user mode without having to reformat the filesystem.
However, this is possibly the most abused program in the system.
Many people attempt to
increase available filesystem space by setting the min-free percentage to 0.
This can lead to severe filesystem fragmentation and we do not recommend
that you do this.
Really the only
.Xr tunefs 8
option worthwhile here is turning on
.Em softupdates
with
.Dq Li "tunefs -n enable /filesystem" .
(Note: in
.Dx ,
softupdates can be turned on using the
.Fl U
option to
.Xr newfs 8 ,
and the
.Dx
installer will typically enable softupdates automatically for
non-root filesystems).
Softupdates drastically improves meta-data performance, mainly file
creation and deletion.
We recommend enabling softupdates on most filesystems; however, there
are two limitations to softupdates that you should be aware of when
determining whether to use it on a filesystem.
First, softupdates guarantees filesystem consistency in the
case of a crash but could very easily be several seconds (even a minute!)
behind on pending writes to the physical disk.
If you crash you may lose more work
than otherwise.
Secondly, softupdates delays the freeing of filesystem
blocks.
If you have a filesystem (such as the root filesystem) which is
close to full, doing a major update of it, e.g.\&
.Dq Li "make installworld" ,
can run it out of space and cause the update to fail.
For this reason, softupdates will not be enabled on the root filesystem
during a typical install. There is no loss of performance since the root
filesystem is rarely written to.
.Pp
A number of run-time
.Xr mount 8
options exist that can help you tune the system.
The most obvious and most dangerous one is
.Cm async .
Do not ever use it; it is far too dangerous.
A less dangerous and more
useful
.Xr mount 8
option is called
.Cm noatime .
.Ux
filesystems normally update the last-accessed time of a file or
directory whenever it is accessed.
This operation is handled in
.Dx
with a delayed write and normally does not create a burden on the system.
However, if your system is accessing a huge number of files on a continuing
basis the buffer cache can wind up getting polluted with atime updates,
creating a burden on the system.
For example, if you are running a heavily
loaded web site, or a news server with lots of readers, you might want to
consider turning off atime updates on your larger partitions with this
.Xr mount 8
option.
However, you should not gratuitously turn off atime
updates everywhere.
For example, the
.Pa /var
filesystem customarily
holds mailboxes, and atime (in combination with mtime) is used to
determine whether a mailbox has new mail.
You might as well leave
atime turned on for mostly read-only partitions such as
.Pa /
and
.Pa /usr
as well.
This is especially useful for
.Pa /
since some system utilities
use the atime field for reporting.
.Sh STRIPING DISKS
In larger systems you can stripe partitions from several drives together
to create a much larger overall partition.
Striping can also improve
the performance of a filesystem by splitting I/O operations across two
or more disks.
The
.Xr vinum 8
and
.Xr ccdconfig 8
utilities may be used to create simple striped filesystems.
Generally
speaking, striping smaller partitions such as the root and
.Pa /var/tmp ,
or essentially read-only partitions such as
.Pa /usr
is a complete waste of time.
You should only stripe partitions that require serious I/O performance,
typically
.Pa /var , /home ,
or custom partitions used to hold databases and web pages.
Choosing the proper stripe size is also
important.
Filesystems tend to store meta-data on power-of-2 boundaries
and you usually want to reduce seeking rather than increase seeking.
This
means you want to use a large off-center stripe size such as 1152 sectors
so sequential I/O does not seek both disks and so meta-data is distributed
across both disks rather than concentrated on a single disk.
If
you really need to get sophisticated, we recommend using a real hardware
RAID controller from the list of
.Dx
supported controllers.
.Sh SYSCTL TUNING
.Xr sysctl 8
variables permit system behavior to be monitored and controlled at
run-time.
Some sysctls simply report on the behavior of the system; others allow
the system behavior to be modified;
some may be set at boot time using
.Xr rc.conf 5 ,
but most will be set via
.Xr sysctl.conf 5 .
There are several hundred sysctls in the system, including many that appear
to be candidates for tuning but actually are not.
In this document we will only cover the ones that have the greatest effect
on the system.
.Pp
The
.Va kern.ipc.shm_use_phys
sysctl defaults to 0 (off) and may be set to 0 (off) or 1 (on).
Setting
this parameter to 1 will cause all System V shared memory segments to be
mapped to unpageable physical RAM.
This feature only has an effect if you
are either (A) mapping small amounts of shared memory across many (hundreds)
of processes, or (B) mapping large amounts of shared memory across any
number of processes.
This feature allows the kernel to remove a great deal
of internal memory management page-tracking overhead at the cost of wiring
the shared memory into core, making it unswappable.
.Pp
The
.Va vfs.write_behind
sysctl defaults to 1 (on). This tells the filesystem to issue media
writes as full clusters are collected, which typically occurs when writing
large sequential files. The idea is to avoid saturating the buffer
cache with dirty buffers when it would not benefit I/O performance.
However,
this may stall processes and under certain circumstances you may wish to turn
it off.
.Pp
The
.Va vfs.hirunningspace
sysctl determines how much outstanding write I/O may be queued to
disk controllers system wide at any given instant. The default is
usually sufficient but on machines with lots of disks you may want to bump
it up to four or five megabytes. Note that setting too high a value
(exceeding the buffer cache's write threshold) can lead to extremely
bad clustering performance. Do not set this value arbitrarily high! Also,
higher write queueing values may add latency to reads occurring at the same
time.
.Pp
There are various other buffer-cache and VM page cache related sysctls.
We do not recommend modifying these values.
As of
.Fx 4.3 ,
the VM system does an extremely good job tuning itself.
.Pp
The
.Va net.inet.tcp.sendspace
and
.Va net.inet.tcp.recvspace
sysctls are of particular interest if you are running network intensive
applications.
They control the amount of send and receive buffer space
allowed for any given TCP connection.
The default sending buffer is 32K; the default receiving buffer
is 64K.
You can often
improve bandwidth utilization by increasing the default at the cost of
eating up more kernel memory for each connection.
We do not recommend
increasing the defaults if you are serving hundreds or thousands of
simultaneous connections because it is possible to quickly run the system
out of memory due to stalled connections building up.
But if you need
high bandwidth over fewer connections, especially if you have
gigabit Ethernet, increasing these defaults can make a huge difference.
You can adjust the buffer size for incoming and outgoing data separately.
For example, if your machine is primarily doing web serving you may want
to decrease the recvspace in order to be able to increase the
sendspace without eating too much kernel memory.
Note that the routing table (see
.Xr route 8 )
can be used to introduce route-specific send and receive buffer size
defaults.
.Pp
As an additional management tool you can use pipes in your
firewall rules (see
.Xr ipfw 8 )
to limit the bandwidth going to or from particular IP blocks or ports.
For example, if you have a T1 you might want to limit your web traffic
to 70% of the T1's bandwidth in order to leave the remainder available
for mail and interactive use.
Normally a heavily loaded web server
will not introduce significant latencies into other services even if
the network link is maxed out, but enforcing a limit can smooth things
out and lead to longer term stability.
Many people also enforce artificial
bandwidth limitations in order to ensure that they are not charged for
using too much bandwidth.
.Pp
Setting the send or receive TCP buffer to values larger than 65535 will result
in only a marginal performance improvement unless both hosts support the window
scaling extension of the TCP protocol, which is controlled by the
.Va net.inet.tcp.rfc1323
sysctl.
These extensions should be enabled and the TCP buffer size should be set
to a value larger than 65536 in order to obtain good performance from
certain types of network links; specifically, gigabit WAN links and
high-latency satellite links.
RFC1323 support is enabled by default.
.Pp
The
.Va net.inet.tcp.always_keepalive
sysctl determines whether or not the TCP implementation should attempt
to detect dead TCP connections by intermittently delivering
.Dq keepalives
on the connection.
By default, this is disabled for all applications; only applications
that specifically request keepalives will use them.
In most environments, TCP keepalives will improve the management of
system state by expiring dead TCP connections, particularly for
systems serving dialup users who may not always terminate individual
TCP connections before disconnecting from the network.
However, in some environments, temporary network outages may be
incorrectly identified as dead sessions, resulting in unexpectedly
terminated TCP connections.
In such environments, setting the sysctl to 0 may reduce the occurrence of
TCP session disconnections.
.Pp
The
.Va net.inet.tcp.delayed_ack
TCP feature is largely misunderstood. Historically speaking this feature
was designed to allow the acknowledgement to transmitted data to be returned
along with the response. For example, when you type over a remote shell
the acknowledgement to the character you send can be returned along with the
data representing the echo of the character. With delayed acks turned off
the acknowledgement may be sent in its own packet before the remote service
has a chance to echo the data it just received. This same concept also
applies to any interactive protocol (e.g. SMTP, WWW, POP3) and can cut the
number of tiny packets flowing across the network in half. The
.Dx
delayed-ack implementation also follows the TCP protocol rule that
at least every other packet be acknowledged even if the standard 100ms
timeout has not yet passed. Normally the worst a delayed ack can do is
slightly delay the teardown of a connection, or slightly delay the ramp-up
of a slow-start TCP connection. While we are not sure, we believe that
the several FAQs related to packages such as SAMBA and SQUID which advise
turning off delayed acks may be referring to the slow-start issue.
.Pp
The
.Va net.inet.tcp.inflight_enable
sysctl turns on bandwidth delay product limiting for all TCP connections.
The system will attempt to calculate the bandwidth delay product for each
connection and limit the amount of data queued to the network to just the
amount required to maintain optimum throughput. This feature is useful
if you are serving data over modems, GigE, or high speed WAN links (or
any other link with a high bandwidth*delay product), especially if you are
also using window scaling or have configured a large send window. If
you enable this option you should also be sure to set
.Va net.inet.tcp.inflight_debug
to 0 (disable debugging), and for production use setting
.Va net.inet.tcp.inflight_min
to at least 6144 may be beneficial. Note, however, that setting high
minimums may effectively disable bandwidth limiting depending on the link.
The limiting feature reduces the amount of data built up in intermediate
router and switch packet queues as well as reduces the amount of data built
up in the local host's interface queue. With fewer packets queued up,
interactive connections, especially over slow modems, will also be able
to operate with lower round trip times. However, note that this feature
only affects data transmission (uploading / server-side). It does not
affect data reception (downloading).
.Pp
Adjusting
.Va net.inet.tcp.inflight_stab
is not recommended.
This parameter defaults to 20, representing 2 maximal packets added
to the bandwidth delay product window calculation. The additional
window is required to stabilize the algorithm and improve responsiveness
to changing conditions, but it can also result in higher ping times
over slow links (though still much lower than you would get without
the inflight algorithm). In such cases you may
wish to try reducing this parameter to 15, 10, or 5, and you may also
have to reduce
.Va net.inet.tcp.inflight_min
(for example, to 3500) to get the desired effect.
Reducing these parameters
should be done as a last resort only.
.Pp
The
.Va net.inet.ip.portrange.*
sysctls control the port number ranges automatically bound to TCP and UDP
sockets. There are three ranges: A low range, a default range, and a
high range, selectable via an IP_PORTRANGE setsockopt() call. Most
network programs use the default range which is controlled by
.Va net.inet.ip.portrange.first
and
.Va net.inet.ip.portrange.last ,
which default to 1024 and 5000 respectively. Bound port ranges are
used for outgoing connections and it is possible to run the system out
of ports under certain circumstances. This most commonly occurs when you are
running a heavily loaded web proxy. The port range is not an issue
when running servers which handle mainly incoming connections such as a
normal web server, or have a limited number of outgoing connections such
as a mail relay. For situations where you may run yourself out of
ports we recommend increasing
.Va net.inet.ip.portrange.last
modestly. A value of 10000 or 20000 or 30000 may be reasonable. You should
also consider firewall effects when changing the port range. Some firewalls
may block large ranges of ports (usually low-numbered ports) and expect systems
to use higher ranges of ports for outgoing connections. For this reason
we do not recommend that
.Va net.inet.ip.portrange.first
be lowered.
.Pp
The
.Va kern.ipc.somaxconn
sysctl limits the size of the listen queue for accepting new TCP connections.
The default value of 128 is typically too low for robust handling of new
connections in a heavily loaded web server environment.
For such environments,
we recommend increasing this value to 1024 or higher.
The service daemon
may itself limit the listen queue size (e.g.\&
.Xr sendmail 8 ,
apache) but will
often have a directive in its configuration file to adjust the queue size up.
Larger listen queues also do a better job of fending off denial of service
attacks.
.Pp
The
.Va kern.maxfiles
sysctl determines how many open files the system supports.
The default is
typically a few thousand but you may need to bump this up to ten or twenty
thousand if you are running databases or large descriptor-heavy daemons.
The read-only
.Va kern.openfiles
sysctl may be interrogated to determine the current number of open files
on the system.
.Pp
The
.Va vm.swap_idle_enabled
sysctl is useful in large multi-user systems where you have lots of users
entering and leaving the system and lots of idle processes.
Such systems
tend to generate a great deal of continuous pressure on free memory reserves.
Turning this feature on and adjusting the swapout hysteresis (in idle
seconds) via
.Va vm.swap_idle_threshold1
and
.Va vm.swap_idle_threshold2
allows you to depress the priority of pages associated with idle processes
more quickly than the normal pageout algorithm.
This gives a helping hand
to the pageout daemon.
Do not turn this option on unless you need it,
because the tradeoff you are making is to essentially pre-page memory sooner
rather than later, eating more swap and disk bandwidth.
In a small system
this option will have a detrimental effect but in a large system that is
already doing moderate paging this option allows the VM system to stage
whole processes into and out of memory more easily.
.Sh LOADER TUNABLES
Some aspects of the system behavior may not be tunable at runtime because
memory allocations they perform must occur early in the boot process.
To change loader tunables, you must set their values in
.Xr loader.conf 5
and reboot the system.
.Pp
.Va kern.maxusers
controls the scaling of a number of static system tables, including defaults
for the maximum number of open files, sizing of network memory resources, etc.
On
.Dx ,
.Va kern.maxusers
is automatically sized at boot based on the amount of memory available in
the system, and may be determined at run-time by inspecting the value of the
read-only
.Va kern.maxusers
sysctl.
Some sites will require larger or smaller values of
.Va kern.maxusers
and may set it as a loader tunable; values of 64, 128, and 256 are not
uncommon.
We do not recommend going above 256 unless you need a huge number
of file descriptors; many of the tunable values set to their defaults by
.Va kern.maxusers
may be individually overridden at boot-time or run-time as described
elsewhere in this document.
.Pp
The
.Va kern.dfldsiz
and
.Va kern.dflssiz
tunables set the default soft limits for process data and stack size
respectively.
Processes may increase these up to the hard limits by calling
.Xr setrlimit 2 .
The
.Va kern.maxdsiz ,
.Va kern.maxssiz ,
and
.Va kern.maxtsiz
tunables set the hard limits for process data, stack, and text size
respectively; processes may not exceed these limits.
The
.Va kern.sgrowsiz
tunable controls how much the stack segment will grow when a process
needs to allocate more stack.
.Pp
.Va kern.ipc.nmbclusters
may be adjusted to increase the number of network mbufs the system is
willing to allocate.
Each cluster represents approximately 2K of memory,
so a value of 1024 represents 2M of kernel memory reserved for network
buffers.
You can do a simple calculation to figure out how many you need.
If you have a web server which maxes out at 1000 simultaneous connections,
and each connection eats a 16K receive and 16K send buffer, you need
approximately 32MB worth of network buffers to deal with it.
A good rule of
thumb is to multiply by 2, so 32MBx2 = 64MB/2K = 32768.
So for this case
you would want to set
.Va kern.ipc.nmbclusters
to 32768.
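.Pp
The sizing calculation above can be written out as a short sketch.
It assumes the figures used in the text (2K per cluster and a safety
factor of 2); the function names
.Li nmbclusters_estimate
and
.Li round_up_pow2
are hypothetical helpers for illustration only, and rounding up to a
power of two is an assumption made here to reproduce the rounded 32768
figure, not a kernel requirement.
.Bd -literal -offset indent
```python
def nmbclusters_estimate(connections, recv_kb=16, send_kb=16,
                         headroom=2, cluster_kb=2):
    """Estimate the number of mbuf clusters for a connection load.

    connections: peak simultaneous connections
    recv_kb, send_kb: buffer space eaten per connection, in KB
    headroom: rule-of-thumb multiplier (2 in the text)
    cluster_kb: approximate size of one cluster (2K in the text)
    """
    total_kb = connections * (recv_kb + send_kb) * headroom
    return -(-total_kb // cluster_kb)  # ceiling division


def round_up_pow2(n):
    """Round a positive integer up to the next power of two."""
    return 1 << (n - 1).bit_length()


if __name__ == "__main__":
    exact = nmbclusters_estimate(1000)
    # Prints the exact estimate, then the rounded-up value.
    print(exact, round_up_pow2(exact))
```
.Ed
With 1000 connections the exact estimate is 32000 clusters; treating 32000K
as roughly 32MB, as the text does, is what yields 32768.
.Pp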
We recommend values between
1024 and 4096 for machines with moderate amounts of memory, and between 4096
and 32768 for machines with greater amounts of memory.
Under no circumstances
should you specify an arbitrarily high value for this parameter; it could
lead to a boot-time crash.
The
.Fl m
option to
.Xr netstat 1
may be used to observe network cluster use.
.Pp
More and more programs are using the
.Xr sendfile 2
system call to transmit files over the network.
The
.Va kern.ipc.nsfbufs
sysctl controls the number of filesystem buffers
.Xr sendfile 2
is allowed to use to perform its work.
This parameter nominally scales
with
.Va kern.maxusers
so you should not need to modify this parameter except under extreme
circumstances.
.Sh KERNEL CONFIG TUNING
There are a number of kernel options that you may have to fiddle with in
a large-scale system.
In order to change these options you need to be
able to compile a new kernel from source.
The
.Xr config 8
manual page and the handbook are good starting points for learning how to
do this.
Generally the first thing you do when creating your own custom
kernel is to strip out all the drivers and services you do not use.
Removing things like
.Dv INET6
and drivers you do not have will reduce the size of your kernel, sometimes
by a megabyte or more, leaving more memory available for applications.
.Pp
.Dv SCSI_DELAY
may be used to reduce system boot times.
The default is fairly high and
can be responsible for 15+ seconds of delay in the boot process.
Reducing
.Dv SCSI_DELAY
to 5 seconds usually works (especially with modern drives).
.Pp
There are a number of
.Dv *_CPU
options that can be commented out.
If you only want the kernel to run
on a Pentium class CPU, you can easily remove
.Dv I386_CPU
and
.Dv I486_CPU ,
but only remove
.Dv I586_CPU
if you are sure your CPU is being recognized as a Pentium II or better.
Some clones may be recognized as a Pentium or even a 486 and not be able
to boot without those options.
If it works, great!
The operating system
will be able to better use higher-end CPU features for MMU, task switching,
timebase, and even device operations.
Additionally, higher-end CPUs support
4MB MMU pages, which the kernel uses to map the kernel itself into memory,
increasing its efficiency under heavy syscall loads.
.Sh IDE WRITE CACHING
.Fx 4.3
flirted with turning off IDE write caching.
This reduced write bandwidth
to IDE disks but was considered necessary due to serious data consistency
issues introduced by hard drive vendors.
Basically the problem is that
IDE drives lie about when a write completes.
With IDE write caching turned
on, IDE hard drives will not only write data to disk out of order, they
will sometimes delay some of the blocks indefinitely under heavy disk
load.
A crash or power failure can result in serious filesystem
corruption.
So our default was changed to be safe.
Unfortunately, the
result was such a huge loss in performance that we caved in and changed the
default back to on after the release.
You should check the default on
your system by observing the
.Va hw.ata.wc
sysctl variable.
If IDE write caching is turned off, you can turn it back
on by setting the
.Va hw.ata.wc
loader tunable to 1.
More information on tuning the ATA driver system may be found in the
.Xr ata 4
man page.
.Pp
There is a new experimental feature for IDE hard drives called
.Va hw.ata.tags
(you also set this in the boot loader) which allows write caching to be safely
turned on.
This brings SCSI tagging features to IDE drives.
As of this
writing only IBM DPTA and DTLA drives support the feature.
Warning!
These
drives apparently have quality control problems and I do not recommend
purchasing them at this time.
If you need performance, go with SCSI.
.Sh CPU, MEMORY, DISK, NETWORK
The type of tuning you do depends heavily on where your system begins to
bottleneck as load increases.
If your system runs out of CPU (idle times
are perpetually 0%) then you need to consider upgrading the CPU or moving to
an SMP motherboard (multiple CPUs), or perhaps you need to revisit the
programs that are causing the load and try to optimize them.
If your system
is paging to swap a lot you need to consider adding more memory.
If your
system is saturating the disk you typically see high CPU idle times and
total disk saturation.
.Xr systat 1
can be used to monitor this.
There are many solutions to saturated disks:
increasing memory for caching, mirroring disks, distributing operations across
several machines, and so forth.
If disk performance is an issue and you
are using IDE drives, switching to SCSI can help a great deal.
While modern
IDE drives compare with SCSI in raw sequential bandwidth, the moment you
start seeking around the disk SCSI drives usually win.
.Pp
Finally, you might run out of network suds.
The first line of defense for
improving network performance is to make sure you are using switches instead
of hubs, especially these days when switches are almost as cheap.
Hubs
have severe problems under heavy loads due to collision backoff and one bad
host can severely degrade the entire LAN.
Second, optimize the network path
as much as possible.
For example, in
.Xr firewall 7
we describe a firewall protecting internal hosts with a topology where
the externally visible hosts are not routed through it.
Use 100BaseT rather
than 10BaseT, or use 1000BaseT rather than 100BaseT, depending on your needs.
Most bottlenecks occur at the WAN link (e.g.\&
modem, T1, DSL, whatever).
If expanding the link is not an option it may be possible to use the
.Xr dummynet 4
feature to implement peak shaving or other forms of traffic shaping to
prevent the overloaded service (such as web services) from affecting other
services (such as email), or vice versa.
In home installations this could
be used to give interactive traffic (your browser,
.Xr ssh 1
logins) priority
over services you export from your box (web services, email).
.Sh SEE ALSO
.Xr netstat 1 ,
.Xr systat 1 ,
.Xr ata 4 ,
.Xr dummynet 4 ,
.Xr login.conf 5 ,
.Xr rc.conf 5 ,
.Xr sysctl.conf 5 ,
.Xr firewall 7 ,
.Xr hier 7 ,
.Xr boot 8 ,
.Xr ccdconfig 8 ,
.Xr config 8 ,
.Xr disklabel 8 ,
.Xr fsck 8 ,
.Xr ifconfig 8 ,
.Xr ipfw 8 ,
.Xr loader 8 ,
.Xr mount 8 ,
.Xr newfs 8 ,
.Xr route 8 ,
.Xr sysctl 8 ,
.Xr tunefs 8 ,
.Xr vinum 8
.Sh HISTORY
The
.Nm
manual page was originally written by
.An Matthew Dillon
and first appeared
in
.Fx 4.3 ,
May 2001.