.\"
.\" swapcache - Cache clean filesystem data & meta-data on SSD-based swap
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.Dd February 7, 2010
.Dt SWAPCACHE 8
.Os
.Sh NAME
.Nm swapcache
.Nd a mechanism to use fast swap to cache filesystem data and meta-data
.Sh SYNOPSIS
.Cd sysctl vm.swapcache.accrate=100000
.Cd sysctl vm.swapcache.maxfilesize=0
.Cd sysctl vm.swapcache.maxburst=2000000000
.Cd sysctl vm.swapcache.curburst=4000000000
.Cd sysctl vm.swapcache.minburst=10000000
.Cd sysctl vm.swapcache.read_enable=0
.Cd sysctl vm.swapcache.meta_enable=0
.Cd sysctl vm.swapcache.data_enable=0
.Cd sysctl vm.swapcache.use_chflags=1
.Cd sysctl vm.swapcache.maxlaunder=256
.Cd sysctl vm.swapcache.hysteresis=(vm.stats.vm.v_inactive_target/2)
.Sh DESCRIPTION
.Nm
is a system capability which allows a solid state disk (SSD) in a swap
space configuration to be used to cache clean filesystem data and meta-data
in addition to its normal function of backing anonymous memory.
.Pp
Sysctls are used to manage operational parameters and can be adjusted at
any time.
Typically a large initial burst is desired after system boot,
controlled by the initial
.Va vm.swapcache.curburst
parameter.
This parameter is reduced as data is written to swap by the swapcache
and increased at a rate specified by
.Va vm.swapcache.accrate .
Once this parameter reaches zero write activity ceases until it has
recovered sufficiently for write activity to resume.
.Pp
.Va vm.swapcache.meta_enable
enables the writing of filesystem meta-data to the swapcache.
Filesystem meta-data is any data which the filesystem accesses via the
disk device using the buffer cache.
Meta-data is cached globally regardless of file or directory flags.
.Pp
.Va vm.swapcache.data_enable
enables the writing of clean filesystem file data to the swapcache.
Filesystem file data is any data which the filesystem accesses via a
regular file.
In technical terms, it is data accessed through the buffer cache via a
regular file's vnode.
Please do not blindly turn on this option, see the
.Sx PERFORMANCE TUNING
section for more information.
.Pp
.Va vm.swapcache.use_chflags
enables the use of the
.Va cache
and
.Va noscache
.Xr chflags 1
flags to control which files will be data-cached.
If this sysctl is disabled and
.Va data_enable
is enabled, the system will ignore file flags and attempt to
swapcache all regular files.
.Pp
.Va vm.swapcache.read_enable
enables reading from the swapcache and should be set to 1 for normal
operation.
.Pp
.Va vm.swapcache.maxfilesize
controls which files are to be cached based on their size.
If set to non-zero only files smaller than the specified size
will be cached.
Larger files will not be cached.
.Pp
.Va vm.swapcache.maxlaunder
controls the maximum number of clean VM pages which will be added to
the swap cache and written out to swap on each poll.
Swapcache polls ten times a second.
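.Pp
As an illustration, the following commands are a minimal sketch of how
these sysctls are typically combined to enable meta-data caching plus
chflags-governed data caching (adjust the settings to your own SSD and
workload, see the
.Sx PERFORMANCE TUNING
section below):
.Bd -literal -offset indent
sysctl vm.swapcache.meta_enable=1
sysctl vm.swapcache.data_enable=1
sysctl vm.swapcache.use_chflags=1
sysctl vm.swapcache.read_enable=1
.Ed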
.Pp
.Va vm.swapcache.hysteresis
controls how many pages swapcache waits to be added to the inactive page
queue before continuing its scan.
Once it decides to scan it continues subject to the above limitations
until it reaches the end of the inactive page queue.
This parameter is designed to make swapcache generate more bulky bursts
to swap which helps SSDs reduce write amplification effects.
.Sh PERFORMANCE TUNING
Best operation is achieved when the active data set fits within the
swapcache.
.Pp
.Bl -tag -width 4n -compact
.It Va vm.swapcache.accrate
This specifies the burst accumulation rate in bytes per second and
ultimately controls the write bandwidth to swap averaged over a long
period of time.
This parameter must be carefully chosen to manage the write endurance of
the SSD in order to avoid wearing it out too quickly.
Even though SSDs have limited write endurance, there is a massive
cost/performance benefit to using one in a swapcache configuration.
.Pp
Let's use the Intel X25V 40GB MLC SATA SSD as an example.
This device has approximately a 40TB (40 terabyte) write endurance,
but see the later notes on this; it is more of a minimum value.
Limiting the long term average bandwidth to 100KB/sec leads to no more
than ~9GB/day of writing, which works out to approximately a 12 year
endurance.
Endurance scales linearly with size.
The 80GB version of this SSD
will have a write endurance of approximately 80TB.
.Pp
MLC SSDs have a 1000-10000x write endurance, while the lower density
higher-cost SLC SSDs have a 10000-100000x write endurance, approximately.
MLC SSDs can be used for the swapcache (and swap) as long as the system
manager is cognizant of their limitations.
.Pp
.It Va vm.swapcache.meta_enable
Turning on just
.Va meta_enable
causes only filesystem meta-data to be cached and will result
in very fast directory operations even over millions of inodes
and even in the face of other invasive operations being run
by other processes.
.Pp
For
.Nm HAMMER
filesystems meta-data includes the B-Tree, directory entries,
and data related to tiny files.
Approximately 6GB of swapcache is needed
for every 14 million or so inodes cached, effectively giving one the
ability to cache all the meta-data in a multi-terabyte filesystem using
a fairly small SSD.
.Pp
.It Va vm.swapcache.data_enable
Turning on
.Va data_enable
(with or without other features) allows bulk file data to be cached.
This feature is very useful for web server operation when the
operational data set fits in swap.
However, care must be taken to avoid thrashing the swapcache.
In almost all cases you will want to leave chflags mode enabled
and use
.Ql chflags cache
on governing directories to control which directory subtrees should
have their file data cached.
.Pp
Vnode recycling can also cause problems.
32-bit systems are typically limited to 100,000 cached vnodes and
64-bit systems are typically limited to around 400,000 cached vnodes.
When operating on a filesystem containing a large number of files
vnode recycling by the kernel will cause related swapcache data
to be lost and also cause potential thrashing of the swapcache.
Cache thrashing due to vnode recyclement can occur whether chflags
mode is used or not.
.Pp
To solve the thrashing problem you can turn on HAMMER's
double buffering feature via
.Va vfs.hammer.double_buffer .
This causes HAMMER to cache file data via its block device.
HAMMER cannot avoid also caching file data via individual vnodes
but will try to expire the second copy more quickly (hence
why it is called double buffer mode), but the key point here is
that
.Nm
will only cache the data blocks via the block device when
double_buffer mode is used and, since the block device is associated
with the mount, it will not get recycled.
This allows the data for any number (potentially millions) of files to
be cached.
You should still use chflags mode to control the size of the dataset
being cached so it remains under 75% of configured swap space.
.Pp
Data caching is definitely more wasteful of the SSD's write durability
than meta-data caching.
If not carefully managed the swapcache may exhaust its burst and smack
against the long term average bandwidth limit, causing the SSD to wear
out at the maximum rate you programmed.
Data caching is far less wasteful and more efficient
if (on a 64-bit system only) you provide a sufficiently large SSD.
.Pp
When caching large data sets you may want to use a medium-sized SSD
with good write performance instead of a small SSD to accommodate
the higher burst write rate data caching incurs and to reduce
interference between reading and writing.
Write durability also tends to scale with larger SSDs, but keep in mind
that newer flash technologies use smaller feature sizes on-chip
which reduce the write durability of the chips, so pay careful attention
to the type of flash employed by the SSD when making durability
assumptions.
For example, an Intel X25-V only has about 40MB/s of write performance
and burst writing by swapcache will seriously interfere with
concurrent read operation on the SSD.
The 80GB X25-M on the other hand has double the write performance.
But the Intel 310 series SSDs use flash chips with a smaller feature
size, so an 80G 310 series SSD will wind up with a durability roughly
comparable to the older 40G X25-V.
.Pp
When data caching is turned on you generally want swapcache's
chflags mode enabled and use
.Xr chflags 1
with the
.Va cache
flag to enable data caching on a directory.
This flag is tracked by the namecache and does not need to be
recursively set in the directory tree.
Simply setting the flag in a top level directory or mount point
is usually sufficient.
However, the flag does not track across mount points.
A typical setup is something like this:
.Pp
.Dl chflags cache /etc /sbin /bin /usr /home
.Dl chflags noscache /usr/obj
.Pp
It is possible to tell
.Nm
to ignore the cache flag by setting
.Va vm.swapcache.use_chflags
to zero, but this is not recommended.
.Pp
Filesystems such as NFS which do not support flags generally
have a
.Va cache
mount option which enables swapcache operation on the mount.
.Pp
.It Va vm.swapcache.maxfilesize
This may be used to reduce cache thrashing when a focus on a small
potentially fragmented filespace is desired, leaving the
larger (more linearly accessed) files alone.
.Pp
.It Va vm.swapcache.minburst
This controls hysteresis and prevents nickel-and-dime write bursting.
Once
.Va curburst
drops to zero, writing to the swapcache ceases until it has recovered past
.Va minburst .
The idea here is to avoid creating a heavily fragmented swapcache where
reading data from a file must alternate between the cache and the primary
filesystem.
Doing so does not save disk seeks on the primary filesystem
so we want to avoid doing small bursts.
This parameter allows us to do larger bursts.
The larger bursts also tend to improve SSD performance as the SSD itself
can do a better job write-combining and erasing blocks.
.Pp
.It Va vm.swapcache.maxswappct
This controls the maximum amount of swap space
.Nm
may use, in percentage terms.
The default is 75%, leaving the remaining 25% of swap available for normal
paging operations.
.El
.Pp
It is important to note that you should always use
.Xr disklabel64 8
to label your SSD.
Disklabel64 will properly align the base of the
partition space relative to the physical drive regardless of how badly
aligned the fdisk slice is.
This will significantly reduce write amplification and write combining
inefficiencies on the SSD.
.Pp
Finally, interleaved swap (multiple SSDs) may be used to increase
performance even further.
A single SATA-II SSD is typically capable of reading 120-220MB/sec.
Configuring two SSDs for your swap will
improve aggregate swapcache read performance by 1.5x to 1.8x.
In tests with two Intel 40GB SSDs 300MB/sec was easily achieved.
With two SATA-III SSDs it is possible to achieve 600MB/sec or better
and well over 400MB/sec random-read performance (versus the ~3MB/sec
random read performance a hard drive gives you).
.Pp
At this point you will be configuring more swap space than a 32-bit
.Dx
kernel can handle (due to KVM limitations).
By default, 32-bit
.Dx
systems only support 32GB of configured swap and while this limit
can be increased somewhat by using
.Va kern.maxswzone
in
.Pa /boot/loader.conf
(a setting of 96m == a maximum of 96GB of swap),
you will quickly run out of KVM.
Running a 64-bit system with its 512G maximum swap space default
is preferable at that point.
.Pp
In addition there will be periods of time where the system is in
a steady state and not writing to the swapcache.
During these periods
.Va curburst
will inch back up but will not exceed
.Va maxburst .
Thus the
.Va maxburst
value controls how large a repeated burst can be.
Remember that
.Va curburst
dynamically tracks the burst and will go up and down depending on
write activity and the accumulation rate.
.Pp
A second bursting parameter called
.Va vm.swapcache.minburst
controls bursting when the maximum write bandwidth has been reached.
When
.Va curburst
drops to zero, write activity ceases and
.Va curburst
is allowed to recover up to
.Va minburst
before write activity resumes.
The recommended range for the
.Va minburst
parameter is 1MB to 50MB.
This parameter has a relationship to
how fragmented the swapcache gets when not in a steady state.
Large bursts reduce fragmentation and reduce incidences of
excessive seeking on the hard drive.
If set too low the
swapcache will become fragmented within a single regular file
and the constant back-and-forth between the swapcache and the
hard drive will result in excessive seeking on the hard drive.
.Sh SWAPCACHE SIZE & MANAGEMENT
The swapcache feature will use up to 75% of configured swap space
by default.
The remaining 25% is reserved for normal paging operation.
The system operator should configure at least 4 times as much swap space
as main memory and no less than 8GB of swap space.
If a 40GB SSD is used the recommendation is to configure 16GB to 32GB of
swap (note: 32-bit systems are limited to 32GB of swap by default, for
64-bit it is 512GB of swap), and to leave the remainder unwritten and
unused.
.Pp
The
.Va vm.swapcache.maxswappct
sysctl may be used to change the default.
You may have to change this default if you also use
.Xr tmpfs 5 ,
.Xr vn 4 ,
or if you have not allocated enough swap for reasonable normal paging
activity to occur (in which case you probably shouldn't be using
.Nm
anyway).
.Pp
If swapcache reaches the 75% limit it will begin tearing down swap
in linear bursts by iterating through available VM objects, until
swap space use drops to 70%.
The tear-down is limited by the rate at
which new data is written and this rate in turn is often limited by
.Va vm.swapcache.accrate ,
resulting in an orderly replacement of cached data and meta-data.
The limit is typically only reached when doing full data+meta-data
caching with no file size limitations and serving primarily large
files, or (on a 64-bit system) bumping
.Va kern.maxvnodes
up to very high values.
.Sh NORMAL SWAP PAGING ACTIVITY WITH SSD SWAP
This is not a function of
.Nm
per se but instead a normal function of the system.
Most systems have
sufficient memory that they do not need to page memory to swap.
These types of systems are the ones best suited for MLC SSD
configured swap running with a
.Nm
configuration.
Systems which modestly page to swap, in the range of a few hundred
megabytes a day worth of writing, are also well suited for MLC SSD
configured swap.
Desktops usually fall into this category even if they
page out a bit more because swap activity is governed by the actions of
a single person.
.Pp
Systems which page anonymous memory heavily when
.Nm
would otherwise be turned off are not usually well suited for MLC SSD
configured swap.
Heavy paging activity is not governed by
.Nm
bandwidth control parameters and can lead to excessive uncontrolled
writing to the MLC SSD, causing premature wearout.
You would have to use the lower density, more expensive SLC SSD
technology (which has 10x the durability).
This isn't to say that
.Nm
would be ineffective, just that the aggregate write bandwidth required
to support the system would be too large for MLC flash technologies.
.Pp
With this caveat in mind, SSD-based paging on systems with insufficient
RAM can be extremely effective in extending the useful life of the system.
For example, a system with a measly 192MB of RAM and SSD swap can run
a -j 8 parallel build world in a little less than twice the time it
would take if the system had 2GB of RAM, whereas it would take 5x to 10x
as long with normal HD-based swap.
.Sh USING SWAPCACHE WITH NORMAL HARD DRIVES
Although
.Nm
is designed to work with SSD-based storage it can also be used with
HD-based storage as an aid for offloading the primary storage system.
Here we need to make a distinction between using RAID for fanning out
storage versus using RAID for redundancy.
There are numerous situations where RAID-based redundancy does not make
sense.
410.Pp 411A good example would be in an environment where the servers themselves 412are redundant and can suffer a total failure without effecting 413ongoing operations. When the primary storage requirements easily fit onto 414a single large-capacity drive it doesn't make a whole lot of sense to 415use RAID if your only desire is to improve performance. If you had a farm 416of, say, 20 servers supporting the same facility adding RAID to each one 417would not accomplish anything other than to bloat your deployment and 418maintenance costs. 419.Pp 420In these sorts of situations it may be desirable and convenient to have 421the primary filesystem for each machine on a single large drive and then 422use the 423.Nm 424facility to offload the drive and make the machine more effective without 425actually distributing the filesystem itself across multiple drives. 426For the purposes of offloading while a SSD would be the most effective 427from a performance standpoint, a second medium sized HD with its much lower 428cost and higher capacity might actually be more cost effective. 429.Pp 430In cases where you might desire to use 431.Nm 432with a normal hard drive you should probably consider running a 64-bit 433.Dx 434instead of a 32-bit system. 435The 64-bit build is capable of supporting much larger swap configurations 436(upwards of 512G) and would be a more suitable match against a medium-sized 437HD. 438.Sh EXPLANATION OF STATIC VS DYNAMIC WEARING LEVELING, AND WRITE-COMBINING 439Modern SSDs keep track of space that has never been written to. 440This would also include space freed up via TRIM, but simply not 441touching a bit of storage in a factory fresh SSD works just as well. 442Once you touch (write to) the storage all bets are off, even if 443you reformat/repartition later. It takes sending the SSD a 444whole-device TRIM command or special format command to take it back 445to its factory-fresh condition (sans wear already present). 446.Pp 447SSDs have wear leveling algorithms which are responsible for trying 448to even out the erase/write cycles across all flash cells in the 449storage. The better a job the SSD can do the longer the SSD will 450remain usable. 451.Pp 452The more unused storage there is from the SSDs point of view the 453easier a time the SSD has running its wear leveling algorithms. 454Basically the wear leveling algorithm in a modern SSD (say Intel or OCZ) 455uses a combination of static and dynamic leveling. Static is the 456best, allowing the SSD to reuse flash cells that have not been 457erased very much by moving static (unchanging) data out of them and 458into other cells that have more wear. Dynamic wear leveling involves 459writing data to available flash cells and then marking the cells containing 460the previous copy of the data as being free/reusable. Dynamic wear leveling 461is the worst kind but the easiest to implement. Modern SSDs use a combination 462of both algorithms plus also do write-combining. 463.Pp 464USB sticks often use only dynamic wear leveling and have short life spans 465because of that. 466.Pp 467In anycase, any unused space in the SSD effectively makes the dynamic 468wear leveling the SSD does more efficient by giving the SSD more 'unused' 469space above and beyond the physical space it reserves beyond its stated 470storage capacity to cycle data throgh, so the SSD lasts longer in theory. 
471.Pp 472Write-combining is a feature whereby the SSD is able to reduced write 473amplification effects by combining OS writes of smaller, discrete, 474non-contiguous logical sectors into a single contiguous 128KB physical 475flash block. 476.Pp 477On the flip side write-combining also results in more complex lookup tables 478which can become fragmented over time and reduce the SSDs read performance. 479Fragmentation can also occur when write-combined blocks are rewritten 480piecemeal. 481Modern SSDs can regain the lost performance by de-combining previously 482write-combined areas as part of their static wear leveling algorithm, but 483at the cost of extra write/erase cycles which slightly increase write 484amplification effects. 485Operating systems can also help maintain the SSDs performance by utilizing 486larger blocks. 487Write-combining results in a net-reduction 488of write-amplification effects but due to having to de-combine later and 489other fragmentary effects it isn't 100%. 490From testing with Intel devices write-amplification can be well controlled 491in the 2x-4x range with the OS doing 16K writes, versus a worst-case 4928x write-amplification with 16K blocks, 32x with 4K blocks, and a truly 493horrid worst-case with 512 byte blocks. 494.Pp 495The 496.Dx 497.Nm 498feature utilizes 64K-128K writes and is specifically designed to minimize 499write amplification and write-combining stresses. 500In terms of placing an actual filesystem on the SSD, the 501.Dx 502.Xr hammer 8 503filesystem utilizes 16K blocks and is well behaved as long as you limit 504reblocking operations. 505For UFS you should create the filesystem with at least a 4K fragment 506size, versus the default 2K. 507Modern Windows filesystems use 4K clusters but it is unclear how SSD-friendly 508NTFS is. 509.Sh EXPLANATION OF FLASH CHIP FEATURE SIZE VS ERASE/REWRITE CYCLE DURABILITY 510Manufacturers continue to produce flash chips with smaller feature sizes. 511Smaller flash cells means reduced erase/rewrite cycle durability which in 512turn reduces the durability of the SSD. 513.Pp 514The older 34nm flash typically had a 10,000 cell durability while the newer 51525nm flash is closer to 1000. The newer flash uses larger ECCs and more 516sensitive voltage comparators on-chip to increase the durability closer to 5173000 cycles. Generally speaking you should assume a durability of around 5181/3 for the same storage capacity using the new chips versus the older 519chips. If you can squeeze out a 400TB durability from an older 40GB X25-V 520using 34nm technology then you should assume around a 400TB durability from 521a newer 120GB 310 series SSD using 25nm technology. 522.Sh WARNINGS 523I am going to repeat and expand a bit on SSD wear. 524Wear on SSDs is a function of the write durability of the cells, 525whether the SSD implements static or dynamic wear leveling (or both), 526write amplification effects when the OS does not issue write-aligned 128KB 527ops or when the SSD is unable to write-combine adjacent logical sectors, 528or if the SSD has a poor write-combining algorithm for non-adjacent sectors. 529In addition some additional erase/rewrite activity occurs from cleanup 530operations the SSD performs as part of its static wear leveling algorithms 531and its write-decombining algorithms (necessary to maintain performance over 532time). MLC flash uses 128KB physical write/erase blocks while SLC flash 533typically uses 64KB physical write/erase blocks. 
534.Pp 535The algorithms the SSD implements in its firmware are probably the most 536important part of the device and a major differentiator between e.g. SATA 537and USB-based SSDs. SATA form factor drives will universally be far superior 538to USB storage sticks. 539SSDs can also have wildly different wearout rates and wildly different 540performance curves over time. 541For example the performance of a SSD which does not implement 542write-decombining can seriously degrade over time as its lookup 543tables become severely fragmented. 544For the purposes of this manual page we are primarily using Intel and OCZ 545drives when describing performance and wear issues. 546.Pp 547.Nm 548parameters should be carefully chosen to avoid early wearout. 549For example, the Intel X25V 40GB SSD has a minimum write durability 550of 40TB and an actual durability that can be quite a bit higher. 551Generally speaking, you want to select parameters that will give you 552at least 10 years of service life. 553The most important parameter to control this is 554.Va vm.swapcache.accrate . 555.Nm 556uses a very conservative 100KB/sec default but even a small X25V 557can probably handle 300KB/sec of continuous writing and still last 10 years. 558.Pp 559Depending on the wear leveling algorithm the drive uses, durability 560and performance can sometimes be improved by configuring less 561space (in a manufacturer-fresh drive) than the drive's probed capacity. 562For example, by only using 32GB of a 40GB SSD. 563SSDs typically implement 10% more storage than advertised and 564use this storage to improve wear leveling. 565As cells begin to fail 566this overallotment slowly becomes part of the primary storage 567until it has been exhausted. 568After that the SSD has basically failed. 569Keep in mind that if you use a larger portion of the SSD's advertised 570storage the SSD will not know if/when you decide to use less unless 571appropriate TRIM commands are sent (if supported), or a low level 572factory erase is issued. 573.Pp 574.Nm smartctl 575(from pkgsrc's sysutils/smartmontools) may be used to retrieve 576the wear indicator from the drive. 577One usually runs something like 578.Ql smartctl -d sat -a /dev/daXX 579(for AHCI/SILI/SCSI), or 580.Ql smartctl -a /dev/adXX 581for NATA. 582Some SSDs 583(particularly the Intels) will brick the SATA port when smart operations 584are done while the drive is busy with normal activity, so the tool should 585only be run when the SSD is idle. 586.Pp 587ID 232 (0xe8) in the SMART data dump indicates available reserved 588space and ID 233 (0xe9) is the wear-out meter. 589Reserved space 590typically starts at 100 and decrements to 10, after which the SSD 591is considered to operate in a degraded mode. 592The wear-out meter typically starts at 99 and decrements to 0, 593after which the SSD has failed. 594.Pp 595.Nm 596tends to use large 64KB writes and tends to cluster multiple writes 597linearly. 598The SSD is able to take significant advantage of this 599and write amplification effects are greatly reduced. 600If we take a 40GB Intel X25V as an example the vendor specifies a write 601durability of approximately 40TB, but 602.Nm 603should be able to squeeze out upwards of 200TB due the fairly optimal 604write clustering it does. 605The theoretical limit for the Intel X25V is 400TB (10,000 erase cycles 606per MLC cell, 40GB drive, with 34nm technology), but the firmware doesn't 607do perfect static wear leveling so the actual durability is less. 
In tests over several hundred days we have validated a write endurance
greater than 200TB on the 40G Intel X25V using
.Nm .
.Pp
In contrast, filesystems directly stored on an SSD could have
fairly severe write amplification effects and will have durabilities
ranging closer to the vendor-specified limit.
.Pp
Power-on hours, power cycles, and read operations do not really affect
wear.
There is something called read-disturb but it is unclear what sort of
ratio would be needed.
Since the data is cached in RAM and thus not re-read at a high rate
there is no expectation of a practical effect.
For all intents and purposes only write operations affect wear.
.Pp
SSDs with MLC-based flash technology are high-density, low-cost solutions
with limited write durability.
SLC-based flash technology is a low-density,
higher-cost solution with 10x the write durability of MLC.
The durability also scales with the amount of flash storage.
SLC-based flash is typically twice as expensive per gigabyte.
From a cost perspective, SLC-based flash
is at least 5x more cost effective in situations where high write
bandwidths are required (because it lasts 10x longer).
MLC is at least 2x more cost effective in situations where high
write bandwidth is not required.
When wear calculations are in years, these differences become huge, but
often the quantity of storage needed trumps the wear life so we expect
most people will be using MLC.
.Nm
is usable with both technologies.
.Sh SEE ALSO
.Xr chflags 1 ,
.Xr fstab 5 ,
.Xr disklabel64 8 ,
.Xr hammer 8 ,
.Xr swapon 8
.Sh HISTORY
.Nm
first appeared in
.Dx 2.5 .
.Sh AUTHORS
.An Matthew Dillon