1This file documents the Microsoft MDB file format for Jet3 and Jet4 databases.
2
3[TOC]
4
5General Notes
6-------------
7
8Access (Jet) does not in general initialize pages to zero before writing them,
9so the file will contains a lot of unititialized data.  This makes the task of
10figuring out the format a bit more difficult than it otherwise would be.
11
12This document will, generally speaking, provide all offsets and constants in
13hex format.
14
15Most multibyte pointer and integers are stored in little endian (LSB-MSB) order.
16There is an exception in the case of indexes, see the section on index pages for
17details.
18
19Terminology
20-----------
21
22This section contains a mix of information about data structures used in the MDB
23file format along with general database terminology needed to explain these
24structures.
25```
26Page          - A fixed size region within the file on a 2 or 4K boundry. All
27                data in the file exists inside pages.
28
29System Table  - Tables in Access generally starting with "MSys".  The 'Flags'
30                field in the table's Catalog Entry will contain a flag in one
31                of two positions (0x80000000 or 0x00000002).  See also the TDEF
32		        (table definition) pages for "System Table" field.
33
34Catalog Entry - A row from the MSysObjects table describing another database
35                object.  The MSysObjects table definition page is always at
36                page 2 of the database, and a phony tdef structure is
37                bootstrapped to initially read the database.
38
39Page Split    - A process in which a row is added to a page with no space left.
40                A second page is allocated and rows on the original page are
41		        split between the two pages and then indexes are updated. Pages
42                can use a variety of algorithms for splitting the rows, the
43                most popular being a 50/50 split in which rows are divided
44                evenly between pages.
45
46Overflow Page - Instead of doing a full page split with associated index writes,
47                a pointer to an "overflow" page can be stored at the original
48                row's location. Compacting a database would normally rewrite
49                overflow pages back into regular pages.
50
51Leaf Page     - The lowest page on an index tree.  In Access, leaf pages are of
52                a different type than other index pages.
53
54UCS-2         - a two byte unicode encoding used in Jet4 files.
55
56Covered Query - a query that can be satisfied by reading only index pages.  For
57                instance if the query
58		        "SELECT count(*) from Table1 where Column3 = 4" were run and
59                Column3 was indexed, the query could be satisfied by reading
60                only indexes.  Because of the way Access hashes text columns
61                in indexes, covered queries on text columns are not possible.
62```
63
64Pages
65-----
66
67At its topmost level, a MDB file is organized into a series of fixed-size
68pages.  These are 2K in size for Jet3 (Access 97) and 4K for Jet4 (Access
692000/2002).  All data in MDB files exists within pages, of which there are
70a number of types.
71
72The first byte of each page identifies the page type as follows.
73
74```
750x00 Database definition page.  (Always page 0)
760x01 Data page
770x02 Table definition
780x03 Intermediate Index pages
790x04 Leaf Index pages
800x05 Page Usage Bitmaps (extended page usage)
810x08 ??
82```
83
84Database Definition Page
85------------------------
86
87Each MDB database has a single definition page located at beginning of the
88file.  Not a lot is known about this page, and it is one of the least
89documented page types.  However, it contains things like Jet version,
90encryption keys, and name of the creating program.  Note, this page is
91"encrypted" with a simple rc4 key starting at offset 0x18 and extending for
92126 (Jet3) or 128 (Jet4) bytes.
93
94Offset 0x14 contains the Jet version of this database:
95
96- 0x00 for 3
97- 0x01 for 4
98- 0x02 for 5
99- 0x03 for Access 2010
100- 0x04 for Access 2013
101- 0x05 for Access 2016
102- 0x06 for Access 2019
103
104This is used by the `mdb-ver` utility to determine the Jet version.
105
106The 20 bytes (Jet3) or 40 bytes (Jet4) starting at 0x42 are the database
107password.  In Jet4, there is an additional mask applied to this password
108derived from the database creation date (also stored on this page as 8 bytes
109starting at offset 0x72).
110
111The 4 bytes at 0x3e on the Database Definition Page are the database key.
112
113The 2 bytes at 0x3C are the default database code page (useless in Jet4?).
114
115The 2 bytes at 0x3A (Jet3) or 4 bytes at 0x6E (Jet4) are the default text
116collating sort order.
117
118Data Pages
119----------
120
121Data rows are all stored in data pages.
122
123The header of a Jet3 data page looks like this:
124```
125+--------------------------------------------------------------------------+
126| Jet3 Data Page Definition                                                |
127+------+---------+---------------------------------------------------------+
128| data | length  | name       | description                                |
129+------+---------+---------------------------------------------------------+
130| 0x01 | 1 byte  | page_type  | 0x01 indicates a data page.                |
131| 0x01 | 1 byte  | unknown    |                                            |
132| ???? | 2 bytes | free_space | Free space in this page                    |
133| ???? | 4 bytes | tdef_pg    | Page pointer to table definition           |
134| ???? | 2 bytes | num_rows   | number of records on this page             |
135+--------------------------------------------------------------------------+
136| Iterate for the number of records                                        |
137+--------------------------------------------------------------------------+
138| ???? | 2 bytes | offset_row | The record's location on this page         |
139+--------------------------------------------------------------------------+
140```
141
142Notes:
143
144- In Jet4, an additional four-byte field was added after tdef_pg.  Its purpose
145  is currently unknown.
146- Offsets that have 0x40 in the high order byte point to a location within the
147  page where a Data Pointer (4 bytes) to another data page (also known as an
148  overflow page) is stored.  Called 'lookupflag' in source code.
149- Offsets that have 0x80 in the high order byte are deleted rows.  Called
150  'delflag' in source code.
151
152
153Rows are stored from the end of the page to the top of the page.  So, the first
154row stored runs from the row's offset to page_size - 1.  The next row runs from
155its offset to the previous row's offset - 1, and so on.
156
157Decoding a row requires knowing the number and types of columns from its TDEF
158page. Decoding is handled by the routine mdb_crack_row().
159
160```
161+--------------------------------------------------------------------------+
162| Jet3 Row Definition                                                      |
163+------+---------+---------------------------------------------------------+
164| data | length  | name       | description                                |
165+------+---------+---------------------------------------------------------+
166| ???? | 1 byte  | num_cols   | Number of columns stored on this row.      |
167| ???? | n bytes | fixed_cols | Fixed length columns                       |
168| ???? | n bytes | var_cols   | Variable length columns                    |
169| ???? | 1 byte  | eod        | length of data from begining of record     |
170| ???? | n bytes | var_table[]| offset from start of row for each var_col  |
171| ???? | n bytes | jump_table | Jump table (see description below)         |
172| ???? | 1 byte  | var_len    | number of variable length columns          |
173| ???? | n bytes | null_mask  | Null indicator.  See notes.                |
174+--------------------------------------------------------------------------+
175```
176
177```
178+--------------------------------------------------------------------------+
179| Jet4 Row Definition                                                      |
180+------+---------+---------------------------------------------------------+
181| data | length  | name       | description                                |
182+------+---------+---------------------------------------------------------+
183| ???? | 2 bytes | num_cols   | Number of columns stored on this row.      |
184| ???? | n bytes | fixed_cols | Fixed length columns                       |
185| ???? | n bytes | var_cols   | Variable length columns                    |
186| ???? | 2 bytes | eod        | length of data from begining of record     |
187| ???? | n bytes | var_table[]| offset from start of row for each var_col  |
188| ???? | 2 bytes | var_len    | number of variable length columns          |
189| ???? | n bytes | null_mask  | Null indicator.  See notes.                |
190+--------------------------------------------------------------------------+
191```
192
193Notes:
194
195- A row will always have the number of fixed columns as specified in the table
196  definition, but may have fewer variable columns, as rows are not updated when
197  columns are added.
198- All fixed-length columns are stored first to last, followed by non-null
199  variable-length columns stored first to last.
200- If the number of variable columns, as given in the TDEF, is 0, then the
201  only items in the row are num_cols, fixed_cols, and null_mask.
202- The var_len field indicates the number of entries in the var_table[].
203- The var_table[] and jump_table[] are stored in reverse order.
204- The eod field points at the first byte after the var_cols field.  It is used
205  to determine where the last var_col ends.
206- The size of the null mask is computed by (num_cols + 7)/8.
207- Fixed columns can be null (unlike some other databases).
208- The null mask stores one bit for each column, starting with the
209  least-significant bit of the first byte.
210- In the null mask, 0 represents null, and 1 represents not null.
211- Values for boolean fixed columns are in the null mask: 0 - false, 1 - true.
212
213In Jet3, offsets are stored as 1-byte fields yielding a maximum of 256 bytes.
214To get around this, offsets are computed using a jump table.  The jump table
215stores the number of the first column in each jump segment.  If the size of the
216row is less than 256 then the jump table will not be present.  Also, eod is
217treated as an additional entry of the var_table[].
218
219For example, if the row contains 45 columns and the 15th column is the first
220with an offset of 256 or greater, then the first entry in the jump table will be
2210xe (14).  If the 24th column is the first one at offset >= 512, the second
222entry of the jump table would be 0x17 (23).  If eod is the first entry >= 768,
223the last entry in this case will be 0x2d (45).
224
225The number of jump table entries is calculated based on the size of the row,
226rather than the location of eod.  As a result, there may be a dummy entry that
227contains 0xff.  In this case, and using the example above, the values in the
228jump table would be 0x2d 0x17 0x0e 0xff.
229
230In Jet4 all offsets are stored as 2 byte fields, including the var_table
231entries.  Thus, the jump table was (thankfully) ditched in Jet4.
232
233
234Each memo column (or other long binary data) in a row
235
236```
237+-------------------------------------------------------------------------+
238| Memo Field Definition (12 bytes)                                        |
239+------+---------+-------------+------------------------------------------+
240| data | length  | name        | description                              |
241+------+---------+-------------+------------------------------------------+
242| ???? | 3 bytes | memo_len    | Total length of the memo                 |
243| ???? | 1 bytes | bitmask     | See values                               |
244| ???? | 4 bytes | lval_dp     | Data pointer to LVAL page (if needed)    |
245| 0x00 | 4 bytes | unknown     |                                          |
246+------+---------+-------------+------------------------------------------+
247```
248
249Values for the bitmask:
250
251- 0x80 = the memo is in a string at the end of this header (memo_len bytes)
252- 0x40 = the memo is in a unique LVAL page in a record type 1
253- 0x00 = the memo is in n LVAL pages in a record type 2
254
255If the memo is in a LVAL page, we use row_id of lval_dp to find the row.
256
257```c
258offset_start of memo = (int16*) LVAL_page[offset_num_rows + (row_id * 2) + 2]
259if (row_id = 0)
260     offset_stop of memo = 2048(jet3) or 4096(jet4)
261else
262     offset_stop of memo = (int16*) LVAL_page[offset_num_row + (row_id * 2)]
263```
264
265The length (partial if type 2) for the memo is:
266memo_page_len = offset_stop - offset_start
267
268Update: The bitmask can't be an entire byte long.
269OLE fields can hold up to 1gig. That requires at least 30 bits, leaving only 2
270bits for flags. Maybe sometimes 0xC0000000 is ignored?
271See http://office.microsoft.com/en-us/access-help/access-2007-specifications-HA010030739.aspx
272Number of characters in a Memo field: 65,535 when entering data through the
273user interface; 2 gigabytes of character storage when entering data
274programmatically. That would mean 31 bits for length.
275
276Note: if a memo field is marked for compression, only at value which is at
277most 1024 characters when uncompressed can be compressed.  fields longer than
278that _must_ be stored uncompressed.
279
280
281LVAL (Long Value) Pages
282-----------------------
283
284The header of a LVAL page is just like that of a regular data page,
285except that in place of the tdef_pg is the word 'LVAL'.
286
287Each memo record type 1 looks like this:
288
289```
290+------+---------+-------------+------------------------------------------+
291| data | length  | name        | description                              |
292+------+---------+-------------+------------------------------------------+
293| ???? | n bytes | memo_value  | A string which is the memo               |
294+-------------------------------------------------------------------------+
295```
296
297Each memo record type 2 looks like this:
298
299```
300+------+---------+-------------+------------------------------------------+
301| data | length  | name        | description                              |
302+------+---------+-------------+------------------------------------------+
303| ???? | 4 bytes | lval_dp     | Next page LVAL type 2 if memo is too long|
304| ???? | n bytes | memo_value  | A string which is the memo (partial)     |
305+-------------------------------------------------------------------------+
306```
307
308In a LVAL type 2 data page, you have
309- 10 or 14 bytes for the header of the data page,
310- 2 bytes for an offset,
311- 4 bytes for the next lval_pg
312
313So there is a block of 2048 - (10+2+4) = 2032(jet3)
314or 4096 - (14+2+4) = 4076(jet4) bytes max in a page.
315
316
317TDEF (Table Definition) Pages
318-----------------------------
319
320Every table in the database has a TDEF page.  It contains a definition of
321the columns, types, sizes, indexes, and similar information.
322
323```
324+-------------------------------------------------------------------------+
325| Jet3/Jet4 TDEF Header
326+------+---------+-------------+------------------------------------------+
327| data | length  | name        | description                              |
328+------+---------+-------------+------------------------------------------+
329| 0x02 | 1 bytes | page_type   | 0x02 indicate a tabledef page            |
330| 0x01 | 1 bytes | unknown     |                                          |
331| ???? | 2 bytes | tdef_id     | (jet3) The word 'VC'                     |
332|      |         |             | (jet4) Free space in this page minus 8   |
333| 0x00 | 4 bytes | next_pg     | Next tdef page pointer (0 if none)       |
334+------+---------+-------------+------------------------------------------+
335```
336
337TDEFs can span multiple pages for large tables, this is accomplished using the
338next_pg field.
339
340```
341+-------------------------------------------------------------------------+
342| Jet3 Table Definition Block (35 bytes)                                  |
343+------+---------+-------------+------------------------------------------+
344| data | length  | name        | description                              |
345+------+---------+-------------+------------------------------------------+
346| ???? | 4 bytes | tdef_len    | Length of the data for this page         |
347| ???? | 4 bytes | num_rows    | Number of records in this table          |
348| 0x00 | 4 bytes | autonumber  | value for the next value of the          |
349|      |         |             | autonumber column, if any. 0 otherwise   |
350| 0x4e | 1 byte  | table_type  | 0x4e: user table, 0x53: system table     |
351| ???? | 2 bytes | max_cols    | Max columns a row will have (deletions)  |
352| ???? | 2 bytes | num_var_cols| Number of variable columns in table      |
353| ???? | 2 bytes | num_cols    | Number of columns in table (repeat)      |
354| ???? | 4 bytes | num_idx     | Number of logical indexes in table       |
355| ???? | 4 bytes | num_real_idx| Number of index entries                  |
356| ???? | 4 bytes | used_pages  | Points to a record containing the        |
357|      |         |             | usage bitmask for this table.            |
358| ???? | 4 bytes | free_pages  | Points to a similar record as above,     |
359|      |         |             | listing pages which contain free space.  |
360+-------------------------------------------------------------------------+
361| Iterate for the number of num_real_idx (8 bytes per idxs)               |
362+-------------------------------------------------------------------------+
363| 0x00 | 4 bytes | ???         |                                          |
364| ???? | 4 bytes | num_idx_rows| (not sure)                               |
365+-------------------------------------------------------------------------+
366| Iterate for the number of num_cols (18 bytes per column)                |
367+-------------------------------------------------------------------------+
368| ???? | 1 byte  | col_type    | Column Type (see table below)            |
369| ???? | 2 bytes | col_num     | Column Number (includes deleted columns) |
370| ???? | 2 bytes | offset_V    | Offset for variable length columns       |
371| ???? | 2 bytes | col_num     | Column Number                            |
372| ???? | 2 bytes | sort_order  | textual column sort order(0x409=General) |
373| ???? | 2 bytes | misc        | prec/scale (1 byte each), or code page   |
374|      |         |             | for textual columns (0x4E4=cp1252)       |
375| ???? | 2 bytes | ???         |                                          |
376| ???? | 1 byte  | bitmask     | See Column flags bellow                  |
377| ???? | 2 bytes | offset_F    | Offset for fixed length columns          |
378| ???? | 2 bytes | col_len     | Length of the column (0 if memo)         |
379+-------------------------------------------------------------------------+
380| Iterate for the number of num_cols (n bytes per column)                 |
381+-------------------------------------------------------------------------+
382| ???? | 1 byte  | col_name_len| len of the name of the column            |
383| ???? | n bytes | col_name    | Name of the column                       |
384+-------------------------------------------------------------------------+
385| Iterate for the number of num_real_idx (30+9 = 39 bytes)                |
386+-------------------------------------------------------------------------+
387|     Iterate 10 times for 10 possible columns (10*3 = 30 bytes)          |
388+-------------------------------------------------------------------------+
389| ???? | 2 bytes | col_num     | number of a column (0xFFFF= none)        |
390| ???? | 1 byte  | col_order   | 0x01 =  ascendency order                 |
391+-------------------------------------------------------------------------+
392| ???? | 4 bytes | used_pages  | Points to usage bitmap for index         |
393| ???? | 4 bytes | first_dp    | Data pointer of the index page           |
394| ???? | 1 byte  | flags       | See flags table for indexes              |
395+-------------------------------------------------------------------------+
396| Iterate for the number of num_idx (20 bytes)                            |
397+-------------------------------------------------------------------------+
398| ???? | 4 bytes | index_num   | Number of the index                      |
399|      |         |             |(warn: not always in the sequential order)|
400| ???? | 4 bytes | index_num2  | Index into index cols list               |
401| 0x00 | 1 byte  | rel_tbl_type| type of the other table in this fk       |
402|      |         |             | (same values as index_type)              |
403| 0xFF | 4 bytes | rel_idx_num | index number of other index in fk        |
404|      |         |             | (or -1 if this index is not a fk)        |
405| 0x00 | 4 bytes | rel_tbl_page| page number of other table in fk         |
406| 0x01 | 1 byte  | cascade_ups | flag indicating if updates are cascaded  |
407| 0x01 | 1 byte  | cascade_dels| flag indicating if deletes are cascaded  |
408| ???? | 1 byte  | index_type  | 0x01 if index is primary, 0x02 if foreign|
409+-------------------------------------------------------------------------+
410| Iterate for the number of num_idx                                       |
411+-------------------------------------------------------------------------+
412| ???? | 1 byte  | idx_name_len| len of the name of the index             |
413| ???? | n bytes | idx_name    | Name of the index                        |
414+-------------------------------------------------------------------------+
415| Iterate while col_num != 0xffff                                         |
416+-------------------------------------------------------------------------+
417| ???? | 2 bytes | col_num     | Column number with variable length       |
418| ???? | 4 bytes | used_pages  | Points to a record containing the        |
419|      |         |             | usage bitmask for this column.           |
420| ???? | 4 bytes | free_pages  | Points to a similar record as above,     |
421|      |         |             | listing pages which contain free space.  |
422+-------------------------------------------------------------------------+
423
424+-------------------------------------------------------------------------+
425| Jet4 Table Definition Block (55 bytes)                                  |
426+------+---------+-------------+------------------------------------------+
427| data | length  | name        | description                              |
428+------+---------+-------------+------------------------------------------+
429| ???? | 4 bytes | tdef_len    | Length of the data for this page         |
430| ???? | 4 bytes | unknown     | unknown                                  |
431| ???? | 4 bytes | num_rows    | Number of records in this table          |
432| 0x00 | 4 bytes | autonumber  | value for the next value of the          |
433|      |         |             | autonumber column, if any. 0 otherwise   |
434| 0x01 | 1 byte  | autonum_flag| 0x01 makes autonumbers work in access    |
435| ???? | 3 bytes | unknown     | unknown                                  |
436| 0x00 | 4 bytes | ct_autonum  | autonumber value for complex type column(s) |
437|      |         |             | (shared across all columns in the table) |
438| ???? | 8 bytes | unknown     | unknown                                  |
439| 0x4e | 1 byte  | table_type  | 0x4e: user table, 0x53: system table     |
440| ???? | 2 bytes | max_cols    | Max columns a row will have (deletions)  |
441| ???? | 2 bytes | num_var_cols| Number of variable columns in table      |
442| ???? | 2 bytes | num_cols    | Number of columns in table (repeat)      |
443| ???? | 4 bytes | num_idx     | Number of logical indexes in table       |
444| ???? | 4 bytes | num_real_idx| Number of index entries                  |
445| ???? | 4 bytes | used_pages  | Points to a record containing the        |
446|      |         |             | usage bitmask for this table.            |
447| ???? | 4 bytes | free_pages  | Points to a similar record as above,     |
448|      |         |             | listing pages which contain free space.  |
449+-------------------------------------------------------------------------+
450| Iterate for the number of num_real_idx (12 bytes per idxs)              |
451+-------------------------------------------------------------------------+
452| 0x00 | 4 bytes | ???         |                                          |
453| ???? | 4 bytes | num_idx_rows| (not sure)                               |
454| 0x00 | 4 bytes | ???         |                                          |
455+-------------------------------------------------------------------------+
456| Iterate for the number of num_cols (25 bytes per column)                |
457+-------------------------------------------------------------------------+
458| ???? | 1 byte  | col_type    | Column Type (see table below)            |
459| ???? | 4 bytes | unknown     | matches first unknown definition block   |
460| ???? | 2 bytes | col_num     | Column Number (includes deleted columns) |
461| ???? | 2 bytes | offset_V    | Offset for variable length columns       |
462| ???? | 2 bytes | col_num     | Column Number                            |
463| ???? | 2 bytes | misc        | prec/scale (1 byte each), or sort order  |
464|      |         |             | for textual columns(0x409=General)       |
465|      |         |             | or "complexid" for complex columns (4bytes)|
466| ???? | 2 bytes | misc_ext    | text sort order version num is 2nd byte  |
467| ???? | 1 byte  | bitmask     | See column flags below                   |
468| ???? | 1 byte  | misc_flags  | 0x01 for compressed unicode              |
469| 0000 | 4 bytes | ???         |                                          |
470| ???? | 2 bytes | offset_F    | Offset for fixed length columns          |
471| ???? | 2 bytes | col_len     | Length of the column (0 if memo/ole)     |
472+-------------------------------------------------------------------------+
473| Iterate for the number of num_cols (n*2 bytes per column)               |
474+-------------------------------------------------------------------------+
475| ???? | 2 bytes | col_name_len| len of the name of the column            |
476| ???? | n bytes | col_name    | Name of the column (UCS-2 format)        |
477+-------------------------------------------------------------------------+
478| Iterate for the number of num_real_idx (30+22 = 52 bytes)               |
479+-------------------------------------------------------------------------+
480| ???? | 4 bytes | ???         |                                          |
481+-------------------------------------------------------------------------+
482| Iterate 10 times for 10 possible columns (10*3 = 30 bytes)              |
483+-------------------------------------------------------------------------+
484| ???? | 2 bytes | col_num     | number of a column (0xFFFF= none)        |
485| ???? | 1 byte  | col_order   | 0x01 =  ascendency order                 |
486+-------------------------------------------------------------------------+
487| ???? | 4 bytes | used_pages  | Points to usage bitmap for index         |
488| ???? | 4 bytes | first_dp    | Data pointer of the index page           |
489| ???? | 1 byte  | flags       | See flags table for indexes              |
490| ???? | 9 bytes | unknown     |                                          |
491+-------------------------------------------------------------------------+
492| Iterate for the number of num_idx (28 bytes)                            |
493+-------------------------------------------------------------------------+
494| ???? | 4 bytes | unknown     | matches first unknown definition block   |
495| ???? | 4 bytes | index_num   | Number of the index                      |
496|      |         |             |(warn: not always in the sequential order)|
497| ???? | 4 bytes | index_num2  | Index into index cols list               |
498| 0x00 | 1 byte  | rel_tbl_type| type of the other table in this fk       |
499|      |         |             | (same values as index_type)              |
500| 0xFF | 4 bytes | rel_idx_num | index number of other index in fk        |
501|      |         |             | (or -1 if this index is not a fk)        |
502| 0x00 | 4 bytes | rel_tbl_page| page number of other table in fk         |
503| 0x01 | 1 byte  | cascade_ups | flag indicating if updates are cascaded  |
504| 0x01 | 1 byte  | cascade_dels| flag indicating if deletes are cascaded  |
505| ???? | 1 byte  | index_type  | 0x01 if index is primary, 0x02 if foreign|
506+-------------------------------------------------------------------------+
507| Iterate for the number of num_idx                                       |
508+-------------------------------------------------------------------------+
509| ???? | 2 bytes | idx_name_len| len of the name of the index             |
510| ???? | n bytes | idx_name    | Name of the index (UCS-2)                |
511+-------------------------------------------------------------------------+
512| Iterate while col_num != 0xffff                                         |
513+-------------------------------------------------------------------------+
514| ???? | 2 bytes | col_num     | Column number with variable length       |
515| ???? | 4 bytes | used_pages  | Points to a record containing the        |
516|      |         |             | usage bitmask for this column.           |
517| ???? | 4 bytes | free_pages  | Points to a similar record as above,     |
518|      |         |             | listing pages which contain free space.  |
519+-------------------------------------------------------------------------+
520```
521
522Columns flags (not complete):
523
524- 0x01: fixed length column
525- 0x02: can be null (possibly related to joins?)
526- 0x04: is auto long
527- 0x10: replication related field (or hidden?). These columns start with "s_" or
528      "Gen_" (the "Gen_" fields are for memo fields)
529- 0x40: is auto guid
530- 0x80: hyperlink. Syntax is "Link Title#http://example.com/somepage.html#" or
531      "#PAGE.HTM#"
532
533In Access 2007 and Access 2010, "Complex Columns" (multivalued fields, version
534history, attachments) always have the flag byte set to exactly 0x07.
535
536Index flags (not complete):
537- 0x01 Unique
538- 0x02 IgnoreNuls
539- 0x08 Required
540
541Column Type may be one of the following (not complete):
542```
543    BOOL            = 0x01 /* Boolean         ( 1 bit ) */
544    BYTE            = 0x02 /* Byte            ( 8 bits) */
545    INT             = 0x03 /* Integer         (16 bits) */
546    LONGINT         = 0x04 /* Long Integer    (32 bits) */
547    MONEY           = 0x05 /* Currency        (64 bits) */
548    FLOAT           = 0x06 /* Single          (32 bits) */
549    DOUBLE          = 0x07 /* Double          (64 bits) */
550    DATETIME        = 0x08 /* Date/Time       (64 bits) */
551    BINARY          = 0x09 /* Binary        (255 bytes) */
552    TEXT            = 0x0A /* Text          (255 bytes) */
553    OLE             = 0x0B /* OLE = Long binary */
554    MEMO            = 0x0C /* Memo = Long text*/
555    UNKNOWN_0D      = 0x0D
556    UNKNOWN_0E      = 0x0E
557    REPID           = 0x0F /* GUID */
558    NUMERIC         = 0x10 /* Scaled decimal  (17 bytes) */
559
560```
561
562Notes on reading index metadata:
563
564There are 2 types of index metadata, "physical" index info (denoted by
565num_real_idx) and "logical" index info (denoted by num_idx).  Normally, there
566is a 1 to 1 relationship between these 2 types of information.  However there
567can be more logical index infos than physical index infos (currently only seen
568for foreign key indexes).  In this situation, one or more of the logical
569indexes actually share the same underlying physical index (the index_num2
570indicates which physical index backs which logical index).
571
572As noted in the previous paragraph, physical index sharing is generally only
573seen when a foreign key index has been created.  When access creates a
574relationship between 2 tables with "enforce referential integrity" enabled,
575each of the tables gets an extra logical index with type 2 (foreign key).
576These logical indexes contain extra information, primarily pointers to the
577related table (rel_tbl_page) and logical index (rel_idx_num).  Also, the
578rel_tbl_type value indicates which table in the relationship is the "primary"
579table (the one one from which cascaded updates/deletes flow).  If the indexed
580columns for the foreign key are already indexed by another logical index in
581the table (e.g. an index which the user has explicitly created), then the
582logical foreign key index will simply share the underlying physical index
583data.
584
585Notes on deleted and added columns: (sort of Jet4 specific)
586
587If a fixed length column is deleted the offset_F field will contain the offsets
588of the original row definition.  Thus if the number of columns on the row does
589not match the number in the tdef, the offset_F field could be used to return
590the proper data. Columns are never really deleted in the row data.  The deleted
591column will forever exist and be set to null for new rows.
592
593A row may have less than max_cols columns but will never have more, as max_cols
594is never decremented.  If you have a table with 6 columns, delete one, and add
595one, then max_cols will be 7.
596
597For variable length columns, offset_V will hold the position in the offset table
598of that column.  Missing columns are set to null for new rows.
599
600
601Page Usage Maps
602---------------
603
604There are three uses for the page usage bitmaps.  There is a global page usage
605stored on page 1 which tracks allocated pages throughout the database.
606
607Tables store two page usage bitmaps.  One is a straight map of which pages are
608owned by the table.  The second is a map of the pages owned by the table which
609have free space on them (used for inserting data).
610
611The table bitmaps appear to be of a fixed size for both Jet 3 and 4 (128 and 64
612bytes respectively).  The first byte of the map is a type field.
613
614```
615+--------------------------------------------------------------------------+
616| Type 0 Page Usage Map                                                    |
617+------+---------+---------------------------------------------------------+
618| data | length  | name       | description                                |
619+------+---------+---------------------------------------------------------+
620| 0x00 | 1 byte  | map_type   | 0x00 indicates map stored within.          |
621| ???? | 4 byte  | page_start | first page for which this map applies      |
622+------+---------+---------------------------------------------------------+
623| Iterate for the length of map                                            |
624+--------------------------------------------------------------------------+
625| ???? | 1 byte  | bitmap     | each bit encodes the allocation status of a|
626|      |         |            | page. 1 indicates allocated to this table. |
627|      |         |            | Pages are stored starting with the low     |
628|      |         |            | order bit of the first byte.               |
629+--------------------------------------------------------------------------+
630```
631
632If you're paying attention then you'll realize that the relatively small size of
633the map (128*8*2048 or 64*8*4096 = 2 Meg) means that this scheme won't work with
634larger database files although the initial start page helps a bit.  To overcome
635this there is a second page usage map scheme with the map_type of 0x01.
636
637```
638+--------------------------------------------------------------------------+
639| Type 1 Page Usage Map                                                    |
640+------+---------+---------------------------------------------------------+
641| data | length  | name       | description                                |
642+------+---------+---------------------------------------------------------+
643| 0x01 | 1 byte  | map_type   | 0x01 indicates this is a indirection list. |
644+------+---------+---------------------------------------------------------+
645| Iterate for the length of map                                            |
646+--------------------------------------------------------------------------+
647| ???? | 4 bytes | map_page   | pointer to page type 0x05 containing map   |
648+--------------------------------------------------------------------------+
649```
650
651Note that the initial start page is gone and is reused for the first page
652indirection.  The 0x05 type page header looks like:
653
654```
655+--------------------------------------------------------------------------+
656| Usage Map Page (type 0x05)                                               |
657+------+---------+---------------------------------------------------------+
658| data | length  | name       | description                                |
659+------+---------+---------------------------------------------------------+
660| 0x05 | 1 byte  | page_type  | allocation map page                        |
661| 0x01 | 1 byte  | unknown    | always 1 as with other page types          |
662| 0x00 | 2 bytes | unknown    |                                            |
663+------+---------+---------------------------------------------------------+
664```
665
666The rest of the page is the allocation bitmap following the same scheme (lsb
667to msb order, 1 bit per page) as a type 0 map.  This yields a maximum of
6682044*8=16352 (jet3) or 4092*8 = 32736 (jet4) pages mapped per type 0x05 page.
669Given 128/4+1 = 33 or 64/4+1 = 17 page pointers per indirection row (remember
670the start page field is reused, thus the +1), this yields 33*16352*2048 = 1053
671Meg (jet3) or 17*32736*4096 = 2173 Meg (jet4) or enough to cover the maximum
672size of each of the database formats comfortably, so there is no reason to
673believe any other page map schemes exist.
674
675
676Indices
677-------
678
679Indices are not completely understood but here is what we know.
680
681```
682+-------------------------------------------------------------------------+
683| Index Page (type 0x03)                                                  |
684+------+---------+-------------+------------------------------------------+
685| data | length  | name        | description                              |
686+------+---------+-------------+------------------------------------------+
687| 0x03 | 1 bytes | page_type   | 0x03 indicate an index page              |
688| 0x01 | 1 bytes | unknown     |                                          |
689| ???? | 2 bytes | free_space  | The free space at the end this page      |
690| ???? | 4 bytes | parent_page | The page number of the TDEF for this idx |
691| ???? | 4 bytes | prev_page   | Previous page at this index level        |
692| ???? | 4 bytes | next_page   | Next page at this index level            |
693| ???? | 4 bytes | tail_page   | Pointer to tail leaf page                |
694| ???? | 2 bytes | pref_len    | Length of the shared entry prefix        |
695+-------------------------------------------------------------------------+
696```
697
698Index pages come in two flavors.
699
7000x04 pages are leaf pages which contain one entry for each row in the table.
701Each entry is composed of a flag, the indexed column values and a page/row
702pointer to the data.
703
7040x03 index pages make up the rest of the index tree and contain a flag, the
705indexed columns, the page/row that contains this entry, and the leaf page or
706intermediate (another 0x03 page) page pointer for which this is the first
707entry on.
708
709Both index types have a bitmask starting at 0x16(jet3) or 0x1b(jet4) which
710identifies the starting location of each index entry on this page.  The first
711entry begins at offset 0xf8(jet3) or 0x1e0(jet4), and is not explicitly
712indicated in the bitmask.  Note that the count in each byte begins with the
713low order bit.  For example take the data:
714
715```
71600 20 00 04 80 00 ...
717```
718
719Convert the bytes to binary starting with the low order bit in each byte.  v's
720mark where each entry begins:
721```
722v                v                 v                v
7230000 0000  0000 0100  0000 0000  0010 0000  0000 0001  0000 0000
724-- 00 ---  -- 20 ---  -- 00 ---  -- 04 ---  -- 80 ---  -- 00 ---
725```
726
727As noted earlier, the first entry is implicit.  The second entry begins at an
728offset of 13 (0xd) bytes from the first.  The third entry 26 (0x1a) bytes from
729the first.  The final entry starts at an offset of 39 (0x27) bytes from the
730first.  In this example the rest of the mask (up to offset 0xf8/0x1e0) would be
731zero-filled and thus this last entry isn't an actual entry, but the stopping
732point of the data.
733
734For Jet3, (0xf8 - 0x16) * 8 = 0x710 and 0x800 - 0xf8 = 0x708.
735For Jet4, (0x1e0 - 0x1b) * 8 = 0xe28 and 0x1000 - 0x1e0 = 0xe20.
736So the mask just covers the page, including space to indicate if the last entry
737goes to the end of the page.  One wonders why MS didn't use a row offset table
738like they did on data pages.  It seems like it would have been easier and more
739flexible.
740
741So now we come to the index entries for type 0x03 pages which look like this:
742
743```
744+-------------------------------------------------------------------------+
745| Index Record                                                            |
746+------+---------+-------------+------------------------------------------+
747| data | length  | name        | description                              |
748+------+---------+-------------+------------------------------------------+
749| 0x7f | 1 byte  | flags       | 0x80 LSB, 0x7f MSB, 0x00 null?           |
750| ???? | variable| indexed cols| indexed column data                      |
751| ???? | 3 bytes | data page   | page containing row referred to by this  |
752|      |         |             | index entry                              |
753| ???? | 1 byte  | data row    | row number on that page of this entry    |
754| ???? | 4 bytes | child page  | next level index page containing this    |
755|      |         |             | entry as last entry.  Could be a leaf    |
756|      |         |             | node.                                    |
757+-------------------------------------------------------------------------+
758```
759
760The flag field is generally either 0x00, 0x7f, 0x80, or 0xFF.  0x80 is the
761one's complement of 0x7f and all text data in the index would then need to be
762negated.  The reason for this negation is descending order.  The 0x00 flag
763indicates that the key column is null (or 0xFF for descending order), and no
764data will follow, only the page pointer.  In multicolumn indexes the flag
765field plus data is repeated for the number of columns participating in the
766key.  Index entries are always sorted based on the lexicographical order of
767the entry bytes of the entire index entry (thus descending order is achieved
768by negating the bytes).  The flag field ensures that null values are always
769sorted at the beginning (for ascending) or end (for descending) of the index.
770
771Note, there is a compression scheme utilizing a shared entry prefix.  If an
772index page has a shared entry prefix (idicated by a pref_len > 0), then the
773first pref_len bytes from the first entry need to be pre-pended to every
774subsequent entry on the page to get the full entry bytes.  For example,
775normally an index entry with an integer primary key would be 9 bytes (1 for
776the flags field, 4 for the integer, 4 for page/row).  If the pref_len on the
777index page were 4, every entry after the first would then contain only 5
778bytes, where the first byte is the last octet of the encoded primary key field
779(integer) and the last four are the page/row pointer.  Thus if the first key
780value on the page is 1 and it points to page 261 (00 01 05) row 3, it becomes:
781
782```
7837f 00 00 00 01 00 01 05 03
784```
785and the next index entry can be:
786
787```
78802 00 01 05 04
789```
790That is, the shared prefix is [7f 00 00 00], so the actual next entry is:
791
792```
793[7f 00 00 00] 02 00 01 05 04
794```
795
796so the key value is 2 (the last octet changes to 02) page 261 row 4.
797
798Access stores an 'alphabetic sort order' version of the text key columns in the
799index.  Here is the encoding as we know it:
800
801```
8020-9: 0x56-0x5f
803A-Z: 0x60-0x79
804a-z: 0x60-0x79
805```
806
807Once converted into this (non-ascii) character set, the text value can be
808sorted in 'alphabetic' order using the lexicographical order of the entry
809bytes.  A text column will end with a NULL (0x00 or 0xff if negated).
810
811Note, this encoding is the "General" sort order in Access 2000-2007 (1033,
812version 0).  As of Access 2010, this is now called the "General legacy" sort
813order, and the 2010 "General" sort order is a new encoding (1033, vesion 1).
814
815The leaf page entries store the key column and the 3 byte page and 1 byte row
816number.
817
818The value of the index root page in the index definition may be an index page
819(type 0x03), an index leaf page (type 0x04) if there is only one index page,
820or (in the case of tables small enough to fit on one page) a data page
821(type 0x01).
822
823So to search the index, you need to convert your value into the alphabetic
824character set, compare against each index entry, and on successful comparison
825follow the page and row number to the data.  Because text data is managled
826during this conversion there is no 'covered querys' possible on text columns.
827
828To conserve on frequent index updates, Jet also does something special when
829creating new leaf pages at the end of a primary key index (or other index
830where new values are generally added to the end of the index).  The tail leaf
831page pointer of the last leaf node points to the new leaf page but the index
832tree is not otherwise updated.  Since index entries in type 0x03 index pages
833point to the last entry in the page, adding a new entry to the end of a large
834index would cause updates all the way up the index tree.  Instead, the tail
835page can be updated in isolation until it is full, and then moved into the
836index proper.  In src/libmdb/index.c, the last leaf read is stored, once the
837index search has been exhausted by the normal search routine, it enters a
838"clean up mode" and reads the next leaf page pointer until it's null.
839
840Properties
841----------
842
843Design View table definitions are stored in LvProp column of MSysObjects as OLE
844fields. They contain default values, description, format, required ...
845
846They start with a 32 bits header: 'KKD\0' in Jet3 and 'MR2\0' in Jet 4.
847
848Next come chunks. Each chunk starts with:
849
850- 32 bits length value (this includes the length)
851- 16 bits chunk type (0x0080 contains the names, 0x0000 and 0x0001 contain
852	the values.  0x0000 seems to contain information about the "main" object,
853	e.g. the table, and 0x0001 seems to contain information about other
854	objects, e.g. the table columns)
855
856```
857Name chunk blocks (0x0080) simply contain occurences of:
85816 bit name length
859name
860For instance:
8610x0d 0x00 and 'AccessVersion' (AccessVersion is 13 bytes, 0x0d 0x00 intel order)
862```
863
864Value chunk blocks (0x0000 and 0x0001) contain a header:
865- 32 bits length value (this includes the length)
866- 16 bits name length
867- name  (0x0000 chunk blocks are not usually named, 0x0001 chunk blocks have the
868      column name to which the properties belong)
869
870Next comes one of more chunks of data:
871- 16 bit length value    (this includes the length)
872- 8 bit ddl flag
873- 8 bit type
874- 16 bit name (index in the name array of above chunk 0x0080)
875- 16 bit value length field (non-inclusive)
876  value (07.53 for the AccessVersion example above)
877
878See ``props.c``` for an example.
879
880
881Text Data Type
882--------------
883
884In Jet3, the encoding of text depends on the machine on which it was created.
885So for databases created on U.S. English systems, it can be expected that text
886is encoded in CP1252.  This is the default used by mdbtools.  If you know that
887another encoding has been used, you can override the default by setting the
888environment variable MDB_JET3_CHARSET.  To find out what encodings will work on
889your system, run 'iconv -l'.
890
891In Jet4, the encoding can be either little-endian UCS-2, or a special
892compressed form of it.  This compressed format begins with 0xff 0xfe.
893The string then starts in compressed mode, where characters with 0x00 for the
894most-significant byte do not encode it.  In the compressed format, a 0x00 byte
895signals a change from compressed mode to uncompressed mode, or from
896uncompressed mode back to compressed mode.  The string may end in either mode.
897Note that a string containing any character 0x##00 (UCS-2) will not be
898compressed.  Also, the string will only be compressed if it really does make
899the string shorter as compared to uncompressed UCS-2.
900
901Programs that use mdbtools libraries will receive strings encoded in UTF-8 by
902default.  This default can by overridden by setting the environment variable
903MDBICONV to the desired encoding.
904