.\"
.\" swapcache - Cache clean filesystem data & meta-data on SSD-based swap
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.Dd February 7, 2010
.Dt SWAPCACHE 8
.Os
.Sh NAME
.Nm swapcache
.Nd a mechanism to use fast swap to cache filesystem data and meta-data
.Sh SYNOPSIS
.Cd sysctl vm.swapcache.accrate=100000
.Cd sysctl vm.swapcache.maxfilesize=0
.Cd sysctl vm.swapcache.maxburst=2000000000
.Cd sysctl vm.swapcache.curburst=4000000000
.Cd sysctl vm.swapcache.minburst=10000000
.Cd sysctl vm.swapcache.read_enable=0
.Cd sysctl vm.swapcache.meta_enable=0
.Cd sysctl vm.swapcache.data_enable=0
.Cd sysctl vm.swapcache.use_chflags=1
.Cd sysctl vm.swapcache.maxlaunder=256
.Cd sysctl vm.swapcache.hysteresis=(vm.stats.vm.v_inactive_target/2)
.Sh DESCRIPTION
.Nm
is a system capability which allows a solid state disk (SSD) in a swap
space configuration to be used to cache clean filesystem data and meta-data
in addition to its normal function of backing anonymous memory.
.Pp
Sysctls are used to manage operational parameters and can be adjusted at
any time.
Typically a large initial burst is desired after system boot,
controlled by the initial
.Va vm.swapcache.curburst
parameter.
This parameter is reduced as data is written to swap by the swapcache
and increased at a rate specified by
.Va vm.swapcache.accrate .
Once this parameter reaches zero, write activity ceases until it has
recovered sufficiently for writing to resume.
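.Pp
As an illustrative sketch (not a recommendation; values should be tuned
to your SSD and workload), a basic configuration enabling meta-data
caching and reads could be placed in
.Pa /etc/sysctl.conf :
.Bd -literal -offset indent
vm.swapcache.read_enable=1
vm.swapcache.meta_enable=1
vm.swapcache.accrate=100000
.Ed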
.Pp
.Va vm.swapcache.meta_enable
enables the writing of filesystem meta-data to the swapcache.
Filesystem meta-data is any data which the filesystem accesses via the
disk device using the buffer cache.
Meta-data is cached globally regardless of file or directory flags.
.Pp
.Va vm.swapcache.data_enable
enables the writing of clean filesystem file-data to the swapcache.
Filesystem file-data is any data which the filesystem accesses via a
regular file; in technical terms, data accessed when the buffer cache
is used to reach a regular file through its vnode.
Please do not blindly turn on this option; see the
.Sx PERFORMANCE TUNING
section for more information.
.Pp
.Va vm.swapcache.use_chflags
enables the use of the
.Va cache
and
.Va noscache
.Xr chflags 1
flags to control which files will be data-cached.
If this sysctl is disabled and
.Va data_enable
is enabled, the system will ignore file flags and attempt to
swapcache all regular files.
.Pp
.Va vm.swapcache.read_enable
enables reading from the swapcache and should be set to 1 for normal
operation.
.Pp
.Va vm.swapcache.maxfilesize
controls which files are to be cached based on their size.
If set to non-zero, only files smaller than the specified size
will be cached.
Larger files will not be cached.
.Pp
.Va vm.swapcache.maxlaunder
controls the maximum number of clean VM pages which will be added to
the swap cache and written out to swap on each poll.
Swapcache polls ten times a second.
.Pp
.Va vm.swapcache.hysteresis
controls how many pages must accumulate on the inactive page queue
before swapcache continues its scan.
Once it decides to scan it continues subject to the above limitations
until it reaches the end of the inactive page queue.
This parameter is designed to make swapcache generate bulkier bursts
to swap, which helps SSDs reduce write amplification effects.
.Sh PERFORMANCE TUNING
Best operation is achieved when the active data set fits within the
swapcache.
.Pp
.Bl -tag -width 4n -compact
.It Va vm.swapcache.accrate
This specifies the burst accumulation rate in bytes per second and
ultimately controls the write bandwidth to swap averaged over a long
period of time.
This parameter must be carefully chosen to manage the write endurance of
the SSD in order to avoid wearing it out too quickly.
Even though SSDs have limited write endurance, there is a massive
cost/performance benefit to using one in a swapcache configuration.
.Pp
Let's use the Intel X25-V 40GB MLC SATA SSD as an example.
This device has approximately a
40TB (40 terabyte) write endurance, but see the later
notes on this; it is more of a minimum value.
Limiting the long term average bandwidth to 100KB/sec leads to no more
than ~9GB/day of writing, which works out to approximately a 12 year
endurance.
Endurance scales linearly with size.
The 80GB version of this SSD
will have a write endurance of approximately 80TB.
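.Pp
The arithmetic behind this estimate is straightforward (all figures
are approximate):
.Bd -literal -offset indent
100KB/sec x 86400 sec/day =~ 8.6GB/day written
40TB / 8.6GB/day          =~ 4650 days =~ 12.7 years
.Ed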
.Pp
MLC SSDs have a 1000-10000x write endurance, while the lower density,
higher-cost SLC SSDs have a 10000-100000x write endurance, approximately.
MLC SSDs can be used for the swapcache (and swap) as long as the system
manager is cognizant of their limitations.
.Pp
.It Va vm.swapcache.meta_enable
Turning on just
.Va meta_enable
causes only filesystem meta-data to be cached and will result
in very fast directory operations even over millions of inodes
and even in the face of other invasive operations being run
by other processes.
.Pp
For
.Nm HAMMER
filesystems meta-data includes the B-Tree, directory entries,
and data related to tiny files.
Approximately 6GB of swapcache is needed
for every 14 million or so inodes cached, effectively giving one the
ability to cache all the meta-data in a multi-terabyte filesystem using
a fairly small SSD.
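.Pp
Put another way, derived from the (approximate) figures above:
.Bd -literal -offset indent
6GB / 14,000,000 inodes =~ 430 bytes of swapcache per cached inode
.Ed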
.Pp
.It Va vm.swapcache.data_enable
Turning on
.Va data_enable
(with or without other features) allows bulk file data to be cached.
This feature is very useful for web server operation when the
operational data set fits in swap.
However, care must be taken to avoid thrashing the swapcache.
In almost all cases you will want to leave chflags mode enabled
and use 'chflags cache' on governing directories to control which
directory subtrees file data should be cached for.
.Pp
Vnode recycling can also cause problems.
32-bit systems are typically limited to 100,000 cached vnodes and
64-bit systems are typically limited to around 400,000 cached vnodes.
When operating on a filesystem containing a large number of files
vnode recycling by the kernel will cause related swapcache data
to be lost and also cause potential thrashing of the swapcache.
Cache thrashing due to vnode recycling can occur whether chflags
mode is used or not.
.Pp
To solve the thrashing problem you can turn on HAMMER's
double buffering feature via
.Va vfs.hammer.double_buffer .
This causes HAMMER to cache file data via its block device.
HAMMER cannot avoid also caching file data via individual vnodes
but will try to expire the second copy more quickly (hence
the name double buffer mode).
The key point here is that
.Nm
will only cache the data blocks via the block device when
double_buffer mode is used, and since the block device is associated
with the mount it will not get recycled.
This allows the data for any number (potentially millions) of files to
be cached.
You should still use chflags mode to control the size of the data set
being cached so it remains under 75% of configured swap space.
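.Pp
A sketch of such a data-caching setup, using only the sysctls and flags
described in this page (the directory chosen is merely an example),
might look like:
.Bd -literal -offset indent
sysctl vm.swapcache.data_enable=1
sysctl vfs.hammer.double_buffer=1
chflags cache /home
.Ed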
.Pp
Data caching is definitely more wasteful of the SSD's write durability
than meta-data caching.
If not carefully managed the swapcache may exhaust its burst and smack
against the long term average bandwidth limit, causing the SSD to wear
out at the maximum rate you programmed.
Data caching is far less wasteful and more efficient
if (on a 64-bit system only) you provide a sufficiently large SSD.
.Pp
When caching large data sets you may want to use a medium-sized SSD
with good write performance instead of a small SSD to accommodate
the higher burst write rate data caching incurs and to reduce
interference between reading and writing.
Write durability also tends to scale with larger SSDs, but keep in mind
that newer flash technologies use smaller feature sizes on-chip
which reduce the write durability of the chips, so pay careful attention
to the type of flash employed by the SSD when making durability
assumptions.
For example, an Intel X25-V only has 40MB/s in write performance
and burst writing by swapcache will seriously interfere with
concurrent read operation on the SSD.
The 80GB X25-M on the other hand has double the write performance.
But the Intel 310 series SSDs use flash chips with a smaller feature
size, so an 80GB 310 series SSD will wind up with a durability
relatively close to the older 40GB X25-V.
.Pp
When data caching is turned on you generally want swapcache's
chflags mode enabled and use
.Xr chflags 1
with the
.Va cache
flag to enable data caching on a directory.
This flag is tracked by the namecache and does not need to be
recursively set in the directory tree.
Simply setting the flag in a top level directory or mount point
is usually sufficient.
However, the flag does not track across mount points.
A typical setup is something like this:
.Pp
.Dl chflags cache /etc /sbin /bin /usr /home
.Dl chflags noscache /usr/obj
.Pp
It is possible to tell
.Nm
to ignore the cache flag by setting
.Va vm.swapcache.use_chflags
to zero, but it is not recommended.
.Pp
Filesystems such as NFS which do not support flags generally
have a
.Va cache
mount option which enables swapcache operation on the mount.
.Pp
.It Va vm.swapcache.maxfilesize
This may be used to reduce cache thrashing when a focus on a small
potentially fragmented filespace is desired, leaving the
larger (more linearly accessed) files alone.
.Pp
.It Va vm.swapcache.minburst
This controls hysteresis and prevents nickel-and-dime write bursting.
Once
.Va curburst
drops to zero, writing to the swapcache ceases until it has recovered past
.Va minburst .
The idea here is to avoid creating a heavily fragmented swapcache where
reading data from a file must alternate between the cache and the primary
filesystem.
Doing so does not save disk seeks on the primary filesystem
so we want to avoid doing small bursts.
This parameter allows us to do larger bursts.
The larger bursts also tend to improve SSD performance as the SSD itself
can do a better job write-combining and erasing blocks.
.Pp
.It Va vm.swapcache.maxswappct
This controls the maximum amount of swap space
.Nm
may use, in percentage terms.
The default is 75%, leaving the remaining 25% of swap available for normal
paging operations.
.El
.Pp
It is important to note that you should always use
.Xr disklabel64 8
to label your SSD.
Disklabel64 will properly align the base of the
partition space relative to the physical drive regardless of how badly
aligned the fdisk slice is.
This will significantly reduce write amplification and write combining
inefficiencies on the SSD.
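.Pp
For example, assuming the SSD probes as
.Pa da1
and a single fdisk slice has been created on it, a possible labeling and
activation sequence (device names and partition letters will vary) is:
.Bd -literal -offset indent
disklabel64 -r -w da1s1 auto
disklabel64 -e da1s1    # add a swap partition, e.g. 'b'
swapon /dev/da1s1b
.Ed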
.Pp
Finally, interleaved swap (multiple SSDs) may be used to increase
performance even further.
A single SATA-II SSD is typically capable of reading 120-220MB/sec.
Configuring two SSDs for your swap will
improve aggregate swapcache read performance by 1.5x to 1.8x.
In tests with two Intel 40GB SSDs 300MB/sec was easily achieved.
With two SATA-III SSDs it is possible to achieve 600MB/sec or better
and well over 400MB/sec random-read performance (versus the ~3MB/sec
random read performance a hard drive gives you).
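.Pp
Swap interleaving requires nothing beyond listing each SSD swap
partition, for example in
.Pa /etc/fstab
(device names are illustrative):
.Bd -literal -offset indent
/dev/da1s1b	none	swap	sw	0	0
/dev/da2s1b	none	swap	sw	0	0
.Ed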
.Pp
At this point you will be configuring more swap space than a 32-bit
.Dx
kernel can handle (due to KVM limitations).
By default, 32-bit
.Dx
systems only support 32GB of configured swap and while this limit
can be increased somewhat by using
.Va kern.maxswzone
in
.Pa /boot/loader.conf
(a setting of 96m == a maximum of 96GB of swap),
you will quickly run out of KVM.
Running a 64-bit system with its 512GB maximum swap space default
is preferable at that point.
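.Pp
For example, the following
.Pa /boot/loader.conf
entry raises the 32-bit limit as described above:
.Bd -literal -offset indent
kern.maxswzone="96m"
.Ed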
.Pp
In addition there will be periods of time where the system is in
steady state and not writing to the swapcache.
During these periods
.Va curburst
will inch back up but will not exceed
.Va maxburst .
Thus the
.Va maxburst
value controls how large a repeated burst can be.
Remember that
.Va curburst
dynamically tracks the burst and will go up and down depending on the
load.
.Pp
A second bursting parameter called
.Va vm.swapcache.minburst
controls bursting when the maximum write bandwidth has been reached.
When
.Va curburst
reaches zero write activity ceases and
.Va curburst
is allowed to recover up to
.Va minburst
before write activity resumes.
The recommended range for the
.Va minburst
parameter is 1MB to 50MB.
This parameter has a relationship to
how fragmented the swapcache gets when not in a steady state.
Large bursts reduce fragmentation and reduce incidences of
excessive seeking on the hard drive.
If set too low the
swapcache will become fragmented within a single regular file
and the constant back-and-forth between the swapcache and the
hard drive will result in excessive seeking on the hard drive.
.Sh SWAPCACHE SIZE & MANAGEMENT
The swapcache feature will use up to 75% of configured swap space
by default.
The remaining 25% is reserved for normal paging operation.
The system operator should configure at least 4 times as much swap
space as main memory and no less than 8GB of swap space.
If a 40GB SSD is used the recommendation is to configure 16GB to 32GB of
swap (note: 32-bit systems are limited to 32GB of swap by default; for
64-bit it is 512GB of swap), and to leave the remainder unwritten and
unused.
.Pp
The
.Va vm.swapcache.maxswappct
sysctl may be used to change the default.
You may have to change this default if you also use
.Xr tmpfs 5 ,
.Xr vn 4 ,
or if you have not allocated enough swap for reasonable normal paging
activity to occur (in which case you probably shouldn't be using
.Nm
anyway).
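.Pp
For example, to leave additional swap available for
.Xr tmpfs 5
one might lower the limit (the value shown is arbitrary):
.Bd -literal -offset indent
sysctl vm.swapcache.maxswappct=50
.Ed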
.Pp
If swapcache reaches the 75% limit it will begin tearing down swap
in linear bursts by iterating through available VM objects, until
swap space use drops to 70%.
The tear-down is limited by the rate at
which new data is written and this rate in turn is often limited by
.Va vm.swapcache.accrate ,
resulting in an orderly replacement of cached data and meta-data.
The limit is typically only reached when doing full data+meta-data
caching with no file size limitations and serving primarily large
files, or (on a 64-bit system) bumping
.Va kern.maxvnodes
up to very high values.
.Sh NORMAL SWAP PAGING ACTIVITY WITH SSD SWAP
This is not a function of
.Nm
per se but instead a normal function of the system.
Most systems have
sufficient memory that they do not need to page memory to swap.
These types of systems are the ones best suited for MLC SSD
configured swap running with a
.Nm
configuration.
Systems which modestly page to swap, in the range of a few hundred
megabytes a day worth of writing, are also well suited for MLC SSD
configured swap.
Desktops usually fall into this category even if they
page out a bit more because swap activity is governed by the actions of
a single person.
.Pp
Systems which page anonymous memory heavily when
.Nm
would otherwise be turned off are not usually well suited for MLC SSD
configured swap.
Heavy paging activity is not governed by
.Nm
bandwidth control parameters and can lead to excessive uncontrolled
writing to the MLC SSD, causing premature wearout.
You would have to use the lower density, more expensive SLC SSD
technology (which has 10x the durability).
This isn't to say that
.Nm
would be ineffective, just that the aggregate write bandwidth required
to support the system would be too large for MLC flash technologies.
.Pp
With this caveat in mind, SSD based paging on systems with insufficient
RAM can be extremely effective in extending the useful life of the system.
For example, a system with a measly 192MB of RAM and SSD swap can run
a -j 8 parallel build world in a little less than twice the time it
would take if the system had 2GB of RAM, whereas it would take 5x to 10x
as long with normal HD based swap.
.Sh USING SWAPCACHE WITH NORMAL HARD DRIVES
Although
.Nm
is designed to work with SSD-based storage it can also be used with
HD-based storage as an aid for offloading the primary storage system.
Here we need to make a distinction between using RAID for fanning out
storage versus using RAID for redundancy.
There are numerous situations where RAID-based redundancy does not
make sense.
.Pp
A good example would be in an environment where the servers themselves
are redundant and can suffer a total failure without affecting
ongoing operations.
When the primary storage requirements easily fit onto
a single large-capacity drive it doesn't make a whole lot of sense to
use RAID if your only desire is to improve performance.
If you had a farm of, say, 20 servers supporting the same facility,
adding RAID to each one would not accomplish anything other than to
bloat your deployment and maintenance costs.
.Pp
In these sorts of situations it may be desirable and convenient to have
the primary filesystem for each machine on a single large drive and then
use the
.Nm
facility to offload the drive and make the machine more effective without
actually distributing the filesystem itself across multiple drives.
For the purposes of offloading, while an SSD would be the most effective
from a performance standpoint, a second medium-sized HD with its much
lower cost and higher capacity might actually be more cost effective.
.Pp
In cases where you might desire to use
.Nm
with a normal hard drive you should probably consider running a 64-bit
.Dx
system instead of a 32-bit system.
The 64-bit build is capable of supporting much larger swap configurations
(upwards of 512GB) and would be a more suitable match against a
medium-sized HD.
.Sh EXPLANATION OF STATIC VS DYNAMIC WEAR LEVELING, AND WRITE-COMBINING
Modern SSDs keep track of space that has never been written to.
This would also include space freed up via TRIM, but simply not
touching a bit of storage in a factory fresh SSD works just as well.
Once you touch (write to) the storage all bets are off, even if
you reformat/repartition later.
It takes sending the SSD a whole-device TRIM command or special format
command to take it back to its factory-fresh condition (sans wear
already present).
.Pp
SSDs have wear leveling algorithms which are responsible for trying
to even out the erase/write cycles across all flash cells in the
storage.
The better a job the SSD can do the longer the SSD will remain usable.
.Pp
The more unused storage there is from the SSD's point of view the
easier a time the SSD has running its wear leveling algorithms.
Basically the wear leveling algorithm in a modern SSD (say Intel or OCZ)
uses a combination of static and dynamic leveling.
Static is the best, allowing the SSD to reuse flash cells that have not
been erased very much by moving static (unchanging) data out of them and
into other cells that have more wear.
Dynamic wear leveling involves writing data to available flash cells
and then marking the cells containing the previous copy of the data as
being free/reusable.
Dynamic wear leveling is the worst kind but the easiest to implement.
Modern SSDs use a combination of both algorithms plus also do
write-combining.
.Pp
USB sticks often use only dynamic wear leveling and have short life spans
because of that.
.Pp
In any case, any unused space in the SSD effectively makes its dynamic
wear leveling more efficient by giving the SSD more 'unused' space,
above and beyond the physical space it reserves beyond its stated
storage capacity, to cycle data through, so the SSD lasts longer in
theory.
.Pp
Write-combining is a feature whereby the SSD is able to reduce write
amplification effects by combining OS writes of smaller, discrete,
non-contiguous logical sectors into a single contiguous 128KB physical
flash block.
.Pp
On the flip side write-combining also results in more complex lookup tables
which can become fragmented over time and reduce the SSD's read performance.
Fragmentation can also occur when write-combined blocks are rewritten
piecemeal.
Modern SSDs can regain the lost performance by de-combining previously
write-combined areas as part of their static wear leveling algorithm, but
at the cost of extra write/erase cycles which slightly increase write
amplification effects.
Operating systems can also help maintain the SSD's performance by utilizing
larger blocks.
Write-combining results in a net-reduction
of write-amplification effects but due to having to de-combine later and
other fragmentary effects it isn't 100%.
From testing with Intel devices write-amplification can be well controlled
in the 2x-4x range with the OS doing 16K writes, versus a worst-case
8x write-amplification with 16K blocks, 32x with 4K blocks, and a truly
horrid worst-case with 512 byte blocks.
.Pp
The
.Dx
.Nm
feature utilizes 64K-128K writes and is specifically designed to minimize
write amplification and write-combining stresses.
In terms of placing an actual filesystem on the SSD, the
.Dx
.Xr hammer 8
filesystem utilizes 16K blocks and is well behaved as long as you limit
reblocking operations.
For UFS you should create the filesystem with at least a 4K fragment
size, versus the default 2K.
Modern Windows filesystems use 4K clusters but it is unclear how SSD-friendly
NTFS is.
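.Pp
For UFS this means creating the filesystem with an explicit fragment
size, for example (the device name is illustrative):
.Bd -literal -offset indent
newfs -b 16384 -f 4096 /dev/da1s1d
.Ed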
.Sh EXPLANATION OF FLASH CHIP FEATURE SIZE VS ERASE/REWRITE CYCLE DURABILITY
Manufacturers continue to produce flash chips with smaller feature sizes.
Smaller flash cells mean reduced erase/rewrite cycle durability which in
turn reduces the durability of the SSD.
.Pp
The older 34nm flash typically had a 10,000 cell durability while the newer
25nm flash is closer to 1000.
The newer flash uses larger ECCs and more sensitive voltage comparators
on-chip to increase the durability closer to 3000 cycles.
Generally speaking you should assume a durability of around 1/3 for the
same storage capacity using the new chips versus the older chips.
If you can squeeze out a 400TB durability from an older 40GB X25-V
using 34nm technology then you should assume around a 400TB durability
from a newer 120GB 310 series SSD using 25nm technology.
.Sh WARNINGS
I am going to repeat and expand a bit on SSD wear.
Wear on SSDs is a function of the write durability of the cells,
whether the SSD implements static or dynamic wear leveling (or both),
write amplification effects when the OS does not issue write-aligned 128KB
ops or when the SSD is unable to write-combine adjacent logical sectors,
or if the SSD has a poor write-combining algorithm for non-adjacent sectors.
In addition, some erase/rewrite activity occurs from cleanup
operations the SSD performs as part of its static wear leveling algorithms
and its write-decombining algorithms (necessary to maintain performance
over time).
MLC flash uses 128KB physical write/erase blocks while SLC flash
typically uses 64KB physical write/erase blocks.
.Pp
The algorithms the SSD implements in its firmware are probably the most
important part of the device and a major differentiator between e.g. SATA
and USB-based SSDs.
SATA form factor drives will universally be far superior to USB storage
sticks.
SSDs can also have wildly different wearout rates and wildly different
performance curves over time.
For example the performance of an SSD which does not implement
write-decombining can seriously degrade over time as its lookup
tables become severely fragmented.
For the purposes of this manual page we are primarily using Intel and OCZ
drives when describing performance and wear issues.
.Pp
.Nm
parameters should be carefully chosen to avoid early wearout.
For example, the Intel X25-V 40GB SSD has a minimum write durability
of 40TB and an actual durability that can be quite a bit higher.
Generally speaking, you want to select parameters that will give you
at least 10 years of service life.
The most important parameter to control this is
.Va vm.swapcache.accrate .
.Nm
uses a very conservative 100KB/sec default but even a small X25-V
can probably handle 300KB/sec of continuous writing and still last 10 years.
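.Pp
As a rough sanity check (approximate, and assuming the drive exceeds its
40TB minimum specification as discussed above):
.Bd -literal -offset indent
300KB/sec x 86400 sec/day x 3650 days =~ 95TB over 10 years
.Ed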
.Pp
Depending on the wear leveling algorithm the drive uses, durability
and performance can sometimes be improved by configuring less
space (in a manufacturer-fresh drive) than the drive's probed capacity.
For example, by only using 32GB of a 40GB SSD.
SSDs typically implement 10% more storage than advertised and
use this storage to improve wear leveling.
As cells begin to fail
this overallotment slowly becomes part of the primary storage
until it has been exhausted.
After that the SSD has basically failed.
Keep in mind that if you use a larger portion of the SSD's advertised
storage the SSD will not know if/when you decide to use less unless
appropriate TRIM commands are sent (if supported), or a low level
factory erase is issued.
.Pp
.Nm smartctl
(from
.Xr dports Ap
.Pa sysutils/smartmontools )
may be used to retrieve the wear indicator from the drive.
One usually runs something like
.Ql smartctl -d sat -a /dev/daXX
(for AHCI/SILI/SCSI), or
.Ql smartctl -a /dev/adXX
for NATA.
Some SSDs
(particularly the Intels) will brick the SATA port when smart operations
are done while the drive is busy with normal activity, so the tool should
only be run when the SSD is idle.
.Pp
ID 232 (0xe8) in the SMART data dump indicates available reserved
space and ID 233 (0xe9) is the wear-out meter.
Reserved space
typically starts at 100 and decrements to 10, after which the SSD
is considered to operate in a degraded mode.
The wear-out meter typically starts at 99 and decrements to 0,
after which the SSD has failed.
.Pp
.Nm
tends to use large 64KB writes and tends to cluster multiple writes
linearly.
The SSD is able to take significant advantage of this
and write amplification effects are greatly reduced.
If we take a 40GB Intel X25-V as an example the vendor specifies a write
durability of approximately 40TB, but
.Nm
should be able to squeeze out upwards of 200TB due to the fairly optimal
write clustering it does.
The theoretical limit for the Intel X25-V is 400TB (10,000 erase cycles
per MLC cell, 40GB drive, with 34nm technology), but the firmware doesn't
do perfect static wear leveling so the actual durability is less.
In tests over several hundred days we have validated a write endurance
greater than 200TB on the 40GB Intel X25-V using
.Nm .
.Pp
In contrast, filesystems directly stored on an SSD could have
fairly severe write amplification effects and will have durabilities
ranging closer to the vendor-specified limit.
.Pp
Power-on hours, power cycles, and read operations do not really affect wear.
There is something called read-disturb but it is unclear what sort of
ratio would be needed.
Since the data is cached in RAM and thus not re-read at a high rate
there is no expectation of a practical effect.
For all intents and purposes only write operations affect wear.
.Pp
SSDs with MLC-based flash technology are high-density, low-cost solutions
with limited write durability.
SLC-based flash technology is a low-density,
higher-cost solution with 10x the write durability of MLC.
The durability also scales with the amount of flash storage.
SLC based flash is typically
twice as expensive per gigabyte.
From a cost perspective, SLC based flash
is at least 5x more cost effective in situations where high write
bandwidths are required (because it lasts 10x longer).
MLC is at least 2x more cost effective in situations where high
write bandwidth is not required.
When wear calculations are in years, these differences become huge, but
often the quantity of storage needed trumps the wear life so we expect most
people will be using MLC.
.Nm
is usable with both technologies.
.Sh SEE ALSO
.Xr chflags 1 ,
.Xr fstab 5 ,
.Xr disklabel64 8 ,
.Xr hammer 8 ,
.Xr swapon 8
.Sh HISTORY
.Nm
first appeared in
.Dx 2.5 .
.Sh AUTHORS
.An Matthew Dillon