/*! @m_page{{c,java},file_formats,File formats and compression}

@section file_formats_formats File formats

WiredTiger supports two underlying file formats: row-store and
column-store, both of which are B+tree implementations of key/value
stores.  WiredTiger also supports @ref lsm, implemented as a tree of
B+trees.

In a row-store, both keys and values are variable-length byte strings.  In
a column-store, keys are 64-bit record numbers (key_format type 'r'),
and values are either variable- or fixed-length byte strings.
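As a rough illustration of the distinction above, the following C sketch
picks a WT_SESSION::create configuration string for each of the three
layouts.  The \c table_layout enum and the \c layout_config helper are
hypothetical names invented for this example.  The format strings assume
string ('S') keys and values for the row-store and an 8-bit 't' value for
the fixed-length column-store.  The call shown in the final comment
assumes an already open session.

```c
#include <stddef.h>

// Hypothetical names for the three layouts described above.
enum table_layout {
    LAYOUT_ROW,          // variable-length keys and values
    LAYOUT_COLUMN_VAR,   // record-number keys, variable-length values
    LAYOUT_COLUMN_FIXED  // record-number keys, 8-bit values
};

// Return a configuration string suitable for WT_SESSION::create, or NULL
// if the layout is not one of the three above.
const char *
layout_config(enum table_layout layout)
{
    switch (layout) {
    case LAYOUT_ROW:
        return "key_format=S,value_format=S";
    case LAYOUT_COLUMN_VAR:
        return "key_format=r,value_format=S";
    case LAYOUT_COLUMN_FIXED:
        return "key_format=r,value_format=8t";
    }
    return NULL;
}

// Possible use, assuming an open WT_SESSION handle named session:
//   session->create(session, "table:example",
//       layout_config(LAYOUT_COLUMN_VAR));
```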

Generally, row-stores are faster for queries where every lookup
requires all of the columns (because there's only a single set of
metadata pages to read into the cache and search).  Column-stores are
faster when most queries require only a subset of the columns (because
columns can be separated into multiple files and only the columns being
returned need be present in the cache).

Row-store keys and values, and variable-length column-store values, can
be up to (4GB - 512B) in length.  Keys and values too large to fit on a
normal page are stored as overflow items in the file, and are likely to
require additional file I/O to access.

Fixed-length column-store values (value_format type 't') are limited
to 8 bits, so only values between 0 and 255 may be stored.
Additionally, there is no out-of-band fixed-length "deleted" value, so
deleting a value is the same as storing a value of 0.  For the same
reason, storing a value of 0 will cause cursor scans to skip the record.
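The consequences of having no out-of-band "deleted" value can be seen in
a small model.  The following C sketch is not WiredTiger code: it is a toy
array-backed store, with hypothetical names (\c fixed_store, \c fixed_put,
\c fixed_remove, \c fixed_next) and an arbitrary capacity, that only
mimics the behavior described above, assuming record numbers start at 1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NRECORDS 8  // arbitrary capacity for the model

// Toy model of a fixed-length column-store: one byte per record number.
// Record number N lives in slot N - 1.
struct fixed_store {
    uint8_t slot[NRECORDS];
};

// Store a value; returns false if the record number is out of range.
bool
fixed_put(struct fixed_store *fs, uint64_t recno, uint8_t value)
{
    if (recno == 0 || recno > NRECORDS)
        return false;
    fs->slot[recno - 1] = value;
    return true;
}

// Removing a record is indistinguishable from storing a value of 0.
bool
fixed_remove(struct fixed_store *fs, uint64_t recno)
{
    return fixed_put(fs, recno, 0);
}

// Return the first record number after recno holding a nonzero value,
// or 0 if there is none.  Zero-valued records are skipped, whether they
// were removed or explicitly stored as 0.
uint64_t
fixed_next(const struct fixed_store *fs, uint64_t recno)
{
    if (recno >= NRECORDS)
        return 0;
    for (uint64_t r = recno + 1; r <= NRECORDS; r++)
        if (fs->slot[r - 1] != 0)
            return r;
    return 0;
}
```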

WiredTiger does not support duplicate data items: there can be only a
single value associated with any given key, and applications are
responsible for creating unique key/value pairs.

WiredTiger allocates space from the underlying files in block units.
The minimum file allocation unit WiredTiger supports is 512B and the
maximum is 512MB.  File offsets are signed 8B values, so the maximum
file size is 2^63 - 1 bytes (roughly 8 exbibytes).
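The arithmetic behind block-unit allocation can be sketched as follows.
This is a minimal illustration, not WiredTiger's allocator: the helper
names (\c alloc_unit_valid, \c alloc_round_up) are hypothetical, and the
only limits checked are the 512B and 512MB bounds and the signed 8B offset
range stated above.  WiredTiger may impose further constraints on the
allocation unit that are ignored here.

```c
#include <stdbool.h>
#include <stdint.h>

#define ALLOC_MIN ((int64_t)512)                // 512B
#define ALLOC_MAX ((int64_t)512 * 1024 * 1024)  // 512MB

// Check the allocation unit against the documented bounds.
bool
alloc_unit_valid(int64_t unit)
{
    return unit >= ALLOC_MIN && unit <= ALLOC_MAX;
}

// Round size up to a whole number of allocation units.  Returns -1 if
// the unit is out of bounds, the size is negative, or the result would
// not fit in a signed 8B file offset.
int64_t
alloc_round_up(int64_t size, int64_t unit)
{
    if (!alloc_unit_valid(unit) || size < 0)
        return -1;
    int64_t units = size / unit + (size % unit != 0);
    if (units > INT64_MAX / unit)
        return -1;
    return units * unit;
}
```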

@section file_formats_choice Choosing a file format

The row-store format is the default choice for most applications.  The
column-store format may be a better choice when the primary key is a
record number, when there are advantages to storing columns in separate
files, or when the underlying data is a set of bits.

Both row- and column-store formats can maintain high volumes of writes,
but for data sets requiring sustained, extreme write throughput,
@ref lsm are usually a better choice.  For applications that do not
require extreme write throughput, row- or column-store is likely to be
a better choice because the read throughput is better than with LSM trees
(an effect that becomes more pronounced as additional read threads are
added).

Applications with complex schemas may also benefit from using multiple
storage formats, that is, using a combination of different formats in
the database, and even in individual tables (for example, a sparse, wide
table configured with a column-store primary, where indexes are stored
in an LSM tree).

Finally, because WiredTiger makes it easy to switch back and forth between
storage configurations, it's usually worthwhile to benchmark the possible
configurations when there is any question.

@section file_formats_compression File formats and compression

Row-stores support four types of compression: key prefix compression,
dictionary compression, Huffman encoding and block compression.

- Key prefix compression reduces the size requirement of both in-memory
and on-disk objects by storing any identical key prefix only once per
page.

  The cost is additional CPU and memory when operating on the in-memory tree.
Specifically, sequential cursor movement through a prefix-compressed page in
reverse (but not forward) order, or the random lookup of a key/value pair, will
allocate sufficient memory to hold some number of uncompressed keys.  So, for
example, if key prefix compression only saves a small number of bytes per key,
the additional memory cost of instantiating the uncompressed key may mean
prefix compression is not worthwhile.  Further, in cases where the
on-disk cost is the primary concern, block compression may mean prefix
compression is less useful.

  Applications may limit the use of prefix compression with the
WT_SESSION::create method's \c prefix_compression_min configuration
string, which sets the minimum number of bytes that must be gained
before prefix compression is used.

  Key prefix compression is disabled by default.

- Dictionary compression reduces the size requirement of both the
in-memory and on-disk objects by storing any identical value only once
per page.  The cost is minor additional CPU and memory use when writing
pages to disk.

  Dictionary compression is disabled by default.

- Huffman encoding reduces the size requirement of both the in-memory
and on-disk objects by compressing individual key/value items, and can
be configured separately for keys, values, or both.  The cost is
additional CPU and memory use when searching the in-memory tree (if keys
are encoded), and additional CPU and memory use when returning values
from the in-memory tree and when writing pages to disk.  Note the
additional CPU cost of Huffman encoding can be high, and should be
considered.  (See @subpage_single huffman for details.)

  Huffman encoding is disabled by default.

- Block compression reduces the size requirement of on-disk objects by
compressing blocks of the backing object's file.  The cost is additional
CPU and memory use when reading and writing pages to disk.  Note the
additional CPU cost of block compression can be high, and should be
considered.  (See @x_ref compression_considerations for details.)

  Block compression is disabled by default.
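To make the trade-off behind a minimum-gain threshold concrete, the
following C sketch estimates how many key bytes would be stored when a key
shares a prefix with its predecessor.  It is a simplification and not
WiredTiger's on-page format: it compares each key only with the key before
it, ignores any bookkeeping overhead, and uses hypothetical helper names
(\c shared_prefix, \c stored_len).  The \c min_gain parameter plays the
role described for \c prefix_compression_min.

```c
#include <stddef.h>
#include <string.h>

// Number of leading bytes that key shares with prev.
size_t
shared_prefix(const char *prev, const char *key)
{
    size_t n = 0;
    while (prev[n] != '\0' && prev[n] == key[n])
        n++;
    return n;
}

// Bytes stored for key when it follows prev: only the suffix if the
// shared prefix is at least min_gain bytes long, otherwise the full key.
size_t
stored_len(const char *prev, const char *key, size_t min_gain)
{
    size_t prefix = shared_prefix(prev, key);
    size_t len = strlen(key);
    return prefix >= min_gain ? len - prefix : len;
}
```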

Column-stores with variable-length byte string values support four
types of compression: run-length encoding, dictionary compression,
Huffman encoding and block compression.

- Run-length encoding reduces the size requirement of both the in-memory
and on-disk objects by storing sequential, duplicate values in the store
only a single time (with an associated count).  The cost is minor
additional CPU and memory use when returning values from the in-memory
tree and when writing pages to disk.

  Run-length encoding is always enabled and cannot be turned off.

- Dictionary compression reduces the size requirement of both the
in-memory and on-disk objects by storing any identical value only once
per page.  The cost is minor additional CPU and memory use when
returning values from the in-memory tree and when writing pages to disk.

  Dictionary compression is disabled by default.

- Huffman encoding reduces the size requirement of both the in-memory
and on-disk objects by compressing individual value items.  The cost is
additional CPU and memory use when returning values from the in-memory
tree and when writing pages to disk.  Note the additional CPU cost of
Huffman encoding can be high, and should be considered.
(See @ref_single huffman for details.)

  Huffman encoding is disabled by default.

- Block compression reduces the size requirement of on-disk objects by
compressing blocks of the backing object's file.  The cost is additional
CPU and memory use when reading and writing pages to disk.  Note the
additional CPU cost of block compression can be high, and should be
considered.  (See @x_ref compression_considerations for details.)

  Block compression is disabled by default.
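Run-length encoding as described above, where sequential duplicate values
are stored once with an associated count, can be sketched in a few lines
of C.  This is a generic illustration of the technique rather than
WiredTiger's implementation: the \c run structure and \c rle_encode
function are hypothetical, and values are modeled as NUL-terminated
strings for simplicity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// One run: a value and the number of consecutive records holding it.
struct run {
    const char *value;
    size_t count;
};

// Collapse sequential duplicate values into runs.  Returns the number of
// runs written to out, or SIZE_MAX if more than max_runs are needed.
size_t
rle_encode(const char *const *values, size_t n, struct run *out,
    size_t max_runs)
{
    size_t nruns = 0;
    for (size_t i = 0; i < n; i++) {
        // Extend the current run if this value repeats the previous one.
        if (nruns > 0 && strcmp(out[nruns - 1].value, values[i]) == 0) {
            out[nruns - 1].count++;
            continue;
        }
        if (nruns == max_runs)
            return SIZE_MAX;
        out[nruns].value = values[i];
        out[nruns].count = 1;
        nruns++;
    }
    return nruns;
}
```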

Column-stores with fixed-length byte values support a single type of
compression: block compression.

- Block compression reduces the size requirement of on-disk objects by
compressing blocks of the backing object's file.  The cost is additional
CPU and memory use when reading and writing pages to disk.  Note the
additional CPU cost of block compression can be high, and should be
considered.  (See @x_ref compression_considerations for details.)

  Block compression is disabled by default.

*/