1This file documents the Microsoft MDB file format for Jet3 and Jet4 databases. 2 3[TOC] 4 5General Notes 6------------- 7 8Access (Jet) does not in general initialize pages to zero before writing them, 9so the file will contains a lot of unititialized data. This makes the task of 10figuring out the format a bit more difficult than it otherwise would be. 11 12This document will, generally speaking, provide all offsets and constants in 13hex format. 14 15Most multibyte pointer and integers are stored in little endian (LSB-MSB) order. 16There is an exception in the case of indexes, see the section on index pages for 17details. 18 19Terminology 20----------- 21 22This section contains a mix of information about data structures used in the MDB 23file format along with general database terminology needed to explain these 24structures. 25``` 26Page - A fixed size region within the file on a 2 or 4K boundry. All 27 data in the file exists inside pages. 28 29System Table - Tables in Access generally starting with "MSys". The 'Flags' 30 field in the table's Catalog Entry will contain a flag in one 31 of two positions (0x80000000 or 0x00000002). See also the TDEF 32 (table definition) pages for "System Table" field. 33 34Catalog Entry - A row from the MSysObjects table describing another database 35 object. The MSysObjects table definition page is always at 36 page 2 of the database, and a phony tdef structure is 37 bootstrapped to initially read the database. 38 39Page Split - A process in which a row is added to a page with no space left. 40 A second page is allocated and rows on the original page are 41 split between the two pages and then indexes are updated. Pages 42 can use a variety of algorithms for splitting the rows, the 43 most popular being a 50/50 split in which rows are divided 44 evenly between pages. 45 46Overflow Page - Instead of doing a full page split with associated index writes, 47 a pointer to an "overflow" page can be stored at the original 48 row's location. Compacting a database would normally rewrite 49 overflow pages back into regular pages. 50 51Leaf Page - The lowest page on an index tree. In Access, leaf pages are of 52 a different type than other index pages. 53 54UCS-2 - a two byte unicode encoding used in Jet4 files. 55 56Covered Query - a query that can be satisfied by reading only index pages. For 57 instance if the query 58 "SELECT count(*) from Table1 where Column3 = 4" were run and 59 Column3 was indexed, the query could be satisfied by reading 60 only indexes. Because of the way Access hashes text columns 61 in indexes, covered queries on text columns are not possible. 62``` 63 64Pages 65----- 66 67At its topmost level, a MDB file is organized into a series of fixed-size 68pages. These are 2K in size for Jet3 (Access 97) and 4K for Jet4 (Access 692000/2002). All data in MDB files exists within pages, of which there are 70a number of types. 71 72The first byte of each page identifies the page type as follows. 73 74``` 750x00 Database definition page. (Always page 0) 760x01 Data page 770x02 Table definition 780x03 Intermediate Index pages 790x04 Leaf Index pages 800x05 Page Usage Bitmaps (extended page usage) 810x08 ?? 82``` 83 84Database Definition Page 85------------------------ 86 87Each MDB database has a single definition page located at beginning of the 88file. Not a lot is known about this page, and it is one of the least 89documented page types. However, it contains things like Jet version, 90encryption keys, and name of the creating program. Note, this page is 91"encrypted" with a simple rc4 key starting at offset 0x18 and extending for 92126 (Jet3) or 128 (Jet4) bytes. 93 94Offset 0x14 contains the Jet version of this database: 95 96- 0x00 for 3 97- 0x01 for 4 98- 0x02 for 5 99- 0x03 for Access 2010 100- 0x04 for Access 2013 101- 0x05 for Access 2016 102- 0x06 for Access 2019 103 104This is used by the `mdb-ver` utility to determine the Jet version. 105 106The 20 bytes (Jet3) or 40 bytes (Jet4) starting at 0x42 are the database 107password. In Jet4, there is an additional mask applied to this password 108derived from the database creation date (also stored on this page as 8 bytes 109starting at offset 0x72). 110 111The 4 bytes at 0x3e on the Database Definition Page are the database key. 112 113The 2 bytes at 0x3C are the default database code page (useless in Jet4?). 114 115The 2 bytes at 0x3A (Jet3) or 4 bytes at 0x6E (Jet4) are the default text 116collating sort order. 117 118Data Pages 119---------- 120 121Data rows are all stored in data pages. 122 123The header of a Jet3 data page looks like this: 124``` 125+--------------------------------------------------------------------------+ 126| Jet3 Data Page Definition | 127+------+---------+---------------------------------------------------------+ 128| data | length | name | description | 129+------+---------+---------------------------------------------------------+ 130| 0x01 | 1 byte | page_type | 0x01 indicates a data page. | 131| 0x01 | 1 byte | unknown | | 132| ???? | 2 bytes | free_space | Free space in this page | 133| ???? | 4 bytes | tdef_pg | Page pointer to table definition | 134| ???? | 2 bytes | num_rows | number of records on this page | 135+--------------------------------------------------------------------------+ 136| Iterate for the number of records | 137+--------------------------------------------------------------------------+ 138| ???? | 2 bytes | offset_row | The record's location on this page | 139+--------------------------------------------------------------------------+ 140``` 141 142Notes: 143 144- In Jet4, an additional four-byte field was added after tdef_pg. Its purpose 145 is currently unknown. 146- Offsets that have 0x40 in the high order byte point to a location within the 147 page where a Data Pointer (4 bytes) to another data page (also known as an 148 overflow page) is stored. Called 'lookupflag' in source code. 149- Offsets that have 0x80 in the high order byte are deleted rows. Called 150 'delflag' in source code. 151 152 153Rows are stored from the end of the page to the top of the page. So, the first 154row stored runs from the row's offset to page_size - 1. The next row runs from 155its offset to the previous row's offset - 1, and so on. 156 157Decoding a row requires knowing the number and types of columns from its TDEF 158page. Decoding is handled by the routine mdb_crack_row(). 159 160``` 161+--------------------------------------------------------------------------+ 162| Jet3 Row Definition | 163+------+---------+---------------------------------------------------------+ 164| data | length | name | description | 165+------+---------+---------------------------------------------------------+ 166| ???? | 1 byte | num_cols | Number of columns stored on this row. | 167| ???? | n bytes | fixed_cols | Fixed length columns | 168| ???? | n bytes | var_cols | Variable length columns | 169| ???? | 1 byte | eod | length of data from begining of record | 170| ???? | n bytes | var_table[]| offset from start of row for each var_col | 171| ???? | n bytes | jump_table | Jump table (see description below) | 172| ???? | 1 byte | var_len | number of variable length columns | 173| ???? | n bytes | null_mask | Null indicator. See notes. | 174+--------------------------------------------------------------------------+ 175``` 176 177``` 178+--------------------------------------------------------------------------+ 179| Jet4 Row Definition | 180+------+---------+---------------------------------------------------------+ 181| data | length | name | description | 182+------+---------+---------------------------------------------------------+ 183| ???? | 2 bytes | num_cols | Number of columns stored on this row. | 184| ???? | n bytes | fixed_cols | Fixed length columns | 185| ???? | n bytes | var_cols | Variable length columns | 186| ???? | 2 bytes | eod | length of data from begining of record | 187| ???? | n bytes | var_table[]| offset from start of row for each var_col | 188| ???? | 2 bytes | var_len | number of variable length columns | 189| ???? | n bytes | null_mask | Null indicator. See notes. | 190+--------------------------------------------------------------------------+ 191``` 192 193Notes: 194 195- A row will always have the number of fixed columns as specified in the table 196 definition, but may have fewer variable columns, as rows are not updated when 197 columns are added. 198- All fixed-length columns are stored first to last, followed by non-null 199 variable-length columns stored first to last. 200- If the number of variable columns, as given in the TDEF, is 0, then the 201 only items in the row are num_cols, fixed_cols, and null_mask. 202- The var_len field indicates the number of entries in the var_table[]. 203- The var_table[] and jump_table[] are stored in reverse order. 204- The eod field points at the first byte after the var_cols field. It is used 205 to determine where the last var_col ends. 206- The size of the null mask is computed by (num_cols + 7)/8. 207- Fixed columns can be null (unlike some other databases). 208- The null mask stores one bit for each column, starting with the 209 least-significant bit of the first byte. 210- In the null mask, 0 represents null, and 1 represents not null. 211- Values for boolean fixed columns are in the null mask: 0 - false, 1 - true. 212 213In Jet3, offsets are stored as 1-byte fields yielding a maximum of 256 bytes. 214To get around this, offsets are computed using a jump table. The jump table 215stores the number of the first column in each jump segment. If the size of the 216row is less than 256 then the jump table will not be present. Also, eod is 217treated as an additional entry of the var_table[]. 218 219For example, if the row contains 45 columns and the 15th column is the first 220with an offset of 256 or greater, then the first entry in the jump table will be 2210xe (14). If the 24th column is the first one at offset >= 512, the second 222entry of the jump table would be 0x17 (23). If eod is the first entry >= 768, 223the last entry in this case will be 0x2d (45). 224 225The number of jump table entries is calculated based on the size of the row, 226rather than the location of eod. As a result, there may be a dummy entry that 227contains 0xff. In this case, and using the example above, the values in the 228jump table would be 0x2d 0x17 0x0e 0xff. 229 230In Jet4 all offsets are stored as 2 byte fields, including the var_table 231entries. Thus, the jump table was (thankfully) ditched in Jet4. 232 233 234Each memo column (or other long binary data) in a row 235 236``` 237+-------------------------------------------------------------------------+ 238| Memo Field Definition (12 bytes) | 239+------+---------+-------------+------------------------------------------+ 240| data | length | name | description | 241+------+---------+-------------+------------------------------------------+ 242| ???? | 3 bytes | memo_len | Total length of the memo | 243| ???? | 1 bytes | bitmask | See values | 244| ???? | 4 bytes | lval_dp | Data pointer to LVAL page (if needed) | 245| 0x00 | 4 bytes | unknown | | 246+------+---------+-------------+------------------------------------------+ 247``` 248 249Values for the bitmask: 250 251- 0x80 = the memo is in a string at the end of this header (memo_len bytes) 252- 0x40 = the memo is in a unique LVAL page in a record type 1 253- 0x00 = the memo is in n LVAL pages in a record type 2 254 255If the memo is in a LVAL page, we use row_id of lval_dp to find the row. 256 257```c 258offset_start of memo = (int16*) LVAL_page[offset_num_rows + (row_id * 2) + 2] 259if (row_id = 0) 260 offset_stop of memo = 2048(jet3) or 4096(jet4) 261else 262 offset_stop of memo = (int16*) LVAL_page[offset_num_row + (row_id * 2)] 263``` 264 265The length (partial if type 2) for the memo is: 266memo_page_len = offset_stop - offset_start 267 268Update: The bitmask can't be an entire byte long. 269OLE fields can hold up to 1gig. That requires at least 30 bits, leaving only 2 270bits for flags. Maybe sometimes 0xC0000000 is ignored? 271See http://office.microsoft.com/en-us/access-help/access-2007-specifications-HA010030739.aspx 272Number of characters in a Memo field: 65,535 when entering data through the 273user interface; 2 gigabytes of character storage when entering data 274programmatically. That would mean 31 bits for length. 275 276Note: if a memo field is marked for compression, only at value which is at 277most 1024 characters when uncompressed can be compressed. fields longer than 278that _must_ be stored uncompressed. 279 280 281LVAL (Long Value) Pages 282----------------------- 283 284The header of a LVAL page is just like that of a regular data page, 285except that in place of the tdef_pg is the word 'LVAL'. 286 287Each memo record type 1 looks like this: 288 289``` 290+------+---------+-------------+------------------------------------------+ 291| data | length | name | description | 292+------+---------+-------------+------------------------------------------+ 293| ???? | n bytes | memo_value | A string which is the memo | 294+-------------------------------------------------------------------------+ 295``` 296 297Each memo record type 2 looks like this: 298 299``` 300+------+---------+-------------+------------------------------------------+ 301| data | length | name | description | 302+------+---------+-------------+------------------------------------------+ 303| ???? | 4 bytes | lval_dp | Next page LVAL type 2 if memo is too long| 304| ???? | n bytes | memo_value | A string which is the memo (partial) | 305+-------------------------------------------------------------------------+ 306``` 307 308In a LVAL type 2 data page, you have 309- 10 or 14 bytes for the header of the data page, 310- 2 bytes for an offset, 311- 4 bytes for the next lval_pg 312 313So there is a block of 2048 - (10+2+4) = 2032(jet3) 314or 4096 - (14+2+4) = 4076(jet4) bytes max in a page. 315 316 317TDEF (Table Definition) Pages 318----------------------------- 319 320Every table in the database has a TDEF page. It contains a definition of 321the columns, types, sizes, indexes, and similar information. 322 323``` 324+-------------------------------------------------------------------------+ 325| Jet3/Jet4 TDEF Header 326+------+---------+-------------+------------------------------------------+ 327| data | length | name | description | 328+------+---------+-------------+------------------------------------------+ 329| 0x02 | 1 bytes | page_type | 0x02 indicate a tabledef page | 330| 0x01 | 1 bytes | unknown | | 331| ???? | 2 bytes | tdef_id | (jet3) The word 'VC' | 332| | | | (jet4) Free space in this page minus 8 | 333| 0x00 | 4 bytes | next_pg | Next tdef page pointer (0 if none) | 334+------+---------+-------------+------------------------------------------+ 335``` 336 337TDEFs can span multiple pages for large tables, this is accomplished using the 338next_pg field. 339 340``` 341+-------------------------------------------------------------------------+ 342| Jet3 Table Definition Block (35 bytes) | 343+------+---------+-------------+------------------------------------------+ 344| data | length | name | description | 345+------+---------+-------------+------------------------------------------+ 346| ???? | 4 bytes | tdef_len | Length of the data for this page | 347| ???? | 4 bytes | num_rows | Number of records in this table | 348| 0x00 | 4 bytes | autonumber | value for the next value of the | 349| | | | autonumber column, if any. 0 otherwise | 350| 0x4e | 1 byte | table_type | 0x4e: user table, 0x53: system table | 351| ???? | 2 bytes | max_cols | Max columns a row will have (deletions) | 352| ???? | 2 bytes | num_var_cols| Number of variable columns in table | 353| ???? | 2 bytes | num_cols | Number of columns in table (repeat) | 354| ???? | 4 bytes | num_idx | Number of logical indexes in table | 355| ???? | 4 bytes | num_real_idx| Number of index entries | 356| ???? | 4 bytes | used_pages | Points to a record containing the | 357| | | | usage bitmask for this table. | 358| ???? | 4 bytes | free_pages | Points to a similar record as above, | 359| | | | listing pages which contain free space. | 360+-------------------------------------------------------------------------+ 361| Iterate for the number of num_real_idx (8 bytes per idxs) | 362+-------------------------------------------------------------------------+ 363| 0x00 | 4 bytes | ??? | | 364| ???? | 4 bytes | num_idx_rows| (not sure) | 365+-------------------------------------------------------------------------+ 366| Iterate for the number of num_cols (18 bytes per column) | 367+-------------------------------------------------------------------------+ 368| ???? | 1 byte | col_type | Column Type (see table below) | 369| ???? | 2 bytes | col_num | Column Number (includes deleted columns) | 370| ???? | 2 bytes | offset_V | Offset for variable length columns | 371| ???? | 2 bytes | col_num | Column Number | 372| ???? | 2 bytes | sort_order | textual column sort order(0x409=General) | 373| ???? | 2 bytes | misc | prec/scale (1 byte each), or code page | 374| | | | for textual columns (0x4E4=cp1252) | 375| ???? | 2 bytes | ??? | | 376| ???? | 1 byte | bitmask | See Column flags bellow | 377| ???? | 2 bytes | offset_F | Offset for fixed length columns | 378| ???? | 2 bytes | col_len | Length of the column (0 if memo) | 379+-------------------------------------------------------------------------+ 380| Iterate for the number of num_cols (n bytes per column) | 381+-------------------------------------------------------------------------+ 382| ???? | 1 byte | col_name_len| len of the name of the column | 383| ???? | n bytes | col_name | Name of the column | 384+-------------------------------------------------------------------------+ 385| Iterate for the number of num_real_idx (30+9 = 39 bytes) | 386+-------------------------------------------------------------------------+ 387| Iterate 10 times for 10 possible columns (10*3 = 30 bytes) | 388+-------------------------------------------------------------------------+ 389| ???? | 2 bytes | col_num | number of a column (0xFFFF= none) | 390| ???? | 1 byte | col_order | 0x01 = ascendency order | 391+-------------------------------------------------------------------------+ 392| ???? | 4 bytes | used_pages | Points to usage bitmap for index | 393| ???? | 4 bytes | first_dp | Data pointer of the index page | 394| ???? | 1 byte | flags | See flags table for indexes | 395+-------------------------------------------------------------------------+ 396| Iterate for the number of num_idx (20 bytes) | 397+-------------------------------------------------------------------------+ 398| ???? | 4 bytes | index_num | Number of the index | 399| | | |(warn: not always in the sequential order)| 400| ???? | 4 bytes | index_num2 | Index into index cols list | 401| 0x00 | 1 byte | rel_tbl_type| type of the other table in this fk | 402| | | | (same values as index_type) | 403| 0xFF | 4 bytes | rel_idx_num | index number of other index in fk | 404| | | | (or -1 if this index is not a fk) | 405| 0x00 | 4 bytes | rel_tbl_page| page number of other table in fk | 406| 0x01 | 1 byte | cascade_ups | flag indicating if updates are cascaded | 407| 0x01 | 1 byte | cascade_dels| flag indicating if deletes are cascaded | 408| ???? | 1 byte | index_type | 0x01 if index is primary, 0x02 if foreign| 409+-------------------------------------------------------------------------+ 410| Iterate for the number of num_idx | 411+-------------------------------------------------------------------------+ 412| ???? | 1 byte | idx_name_len| len of the name of the index | 413| ???? | n bytes | idx_name | Name of the index | 414+-------------------------------------------------------------------------+ 415| Iterate while col_num != 0xffff | 416+-------------------------------------------------------------------------+ 417| ???? | 2 bytes | col_num | Column number with variable length | 418| ???? | 4 bytes | used_pages | Points to a record containing the | 419| | | | usage bitmask for this column. | 420| ???? | 4 bytes | free_pages | Points to a similar record as above, | 421| | | | listing pages which contain free space. | 422+-------------------------------------------------------------------------+ 423 424+-------------------------------------------------------------------------+ 425| Jet4 Table Definition Block (55 bytes) | 426+------+---------+-------------+------------------------------------------+ 427| data | length | name | description | 428+------+---------+-------------+------------------------------------------+ 429| ???? | 4 bytes | tdef_len | Length of the data for this page | 430| ???? | 4 bytes | unknown | unknown | 431| ???? | 4 bytes | num_rows | Number of records in this table | 432| 0x00 | 4 bytes | autonumber | value for the next value of the | 433| | | | autonumber column, if any. 0 otherwise | 434| 0x01 | 1 byte | autonum_flag| 0x01 makes autonumbers work in access | 435| ???? | 3 bytes | unknown | unknown | 436| 0x00 | 4 bytes | ct_autonum | autonumber value for complex type column(s) | 437| | | | (shared across all columns in the table) | 438| ???? | 8 bytes | unknown | unknown | 439| 0x4e | 1 byte | table_type | 0x4e: user table, 0x53: system table | 440| ???? | 2 bytes | max_cols | Max columns a row will have (deletions) | 441| ???? | 2 bytes | num_var_cols| Number of variable columns in table | 442| ???? | 2 bytes | num_cols | Number of columns in table (repeat) | 443| ???? | 4 bytes | num_idx | Number of logical indexes in table | 444| ???? | 4 bytes | num_real_idx| Number of index entries | 445| ???? | 4 bytes | used_pages | Points to a record containing the | 446| | | | usage bitmask for this table. | 447| ???? | 4 bytes | free_pages | Points to a similar record as above, | 448| | | | listing pages which contain free space. | 449+-------------------------------------------------------------------------+ 450| Iterate for the number of num_real_idx (12 bytes per idxs) | 451+-------------------------------------------------------------------------+ 452| 0x00 | 4 bytes | ??? | | 453| ???? | 4 bytes | num_idx_rows| (not sure) | 454| 0x00 | 4 bytes | ??? | | 455+-------------------------------------------------------------------------+ 456| Iterate for the number of num_cols (25 bytes per column) | 457+-------------------------------------------------------------------------+ 458| ???? | 1 byte | col_type | Column Type (see table below) | 459| ???? | 4 bytes | unknown | matches first unknown definition block | 460| ???? | 2 bytes | col_num | Column Number (includes deleted columns) | 461| ???? | 2 bytes | offset_V | Offset for variable length columns | 462| ???? | 2 bytes | col_num | Column Number | 463| ???? | 2 bytes | misc | prec/scale (1 byte each), or sort order | 464| | | | for textual columns(0x409=General) | 465| | | | or "complexid" for complex columns (4bytes)| 466| ???? | 2 bytes | misc_ext | text sort order version num is 2nd byte | 467| ???? | 1 byte | bitmask | See column flags below | 468| ???? | 1 byte | misc_flags | 0x01 for compressed unicode | 469| 0000 | 4 bytes | ??? | | 470| ???? | 2 bytes | offset_F | Offset for fixed length columns | 471| ???? | 2 bytes | col_len | Length of the column (0 if memo/ole) | 472+-------------------------------------------------------------------------+ 473| Iterate for the number of num_cols (n*2 bytes per column) | 474+-------------------------------------------------------------------------+ 475| ???? | 2 bytes | col_name_len| len of the name of the column | 476| ???? | n bytes | col_name | Name of the column (UCS-2 format) | 477+-------------------------------------------------------------------------+ 478| Iterate for the number of num_real_idx (30+22 = 52 bytes) | 479+-------------------------------------------------------------------------+ 480| ???? | 4 bytes | ??? | | 481+-------------------------------------------------------------------------+ 482| Iterate 10 times for 10 possible columns (10*3 = 30 bytes) | 483+-------------------------------------------------------------------------+ 484| ???? | 2 bytes | col_num | number of a column (0xFFFF= none) | 485| ???? | 1 byte | col_order | 0x01 = ascendency order | 486+-------------------------------------------------------------------------+ 487| ???? | 4 bytes | used_pages | Points to usage bitmap for index | 488| ???? | 4 bytes | first_dp | Data pointer of the index page | 489| ???? | 1 byte | flags | See flags table for indexes | 490| ???? | 9 bytes | unknown | | 491+-------------------------------------------------------------------------+ 492| Iterate for the number of num_idx (28 bytes) | 493+-------------------------------------------------------------------------+ 494| ???? | 4 bytes | unknown | matches first unknown definition block | 495| ???? | 4 bytes | index_num | Number of the index | 496| | | |(warn: not always in the sequential order)| 497| ???? | 4 bytes | index_num2 | Index into index cols list | 498| 0x00 | 1 byte | rel_tbl_type| type of the other table in this fk | 499| | | | (same values as index_type) | 500| 0xFF | 4 bytes | rel_idx_num | index number of other index in fk | 501| | | | (or -1 if this index is not a fk) | 502| 0x00 | 4 bytes | rel_tbl_page| page number of other table in fk | 503| 0x01 | 1 byte | cascade_ups | flag indicating if updates are cascaded | 504| 0x01 | 1 byte | cascade_dels| flag indicating if deletes are cascaded | 505| ???? | 1 byte | index_type | 0x01 if index is primary, 0x02 if foreign| 506+-------------------------------------------------------------------------+ 507| Iterate for the number of num_idx | 508+-------------------------------------------------------------------------+ 509| ???? | 2 bytes | idx_name_len| len of the name of the index | 510| ???? | n bytes | idx_name | Name of the index (UCS-2) | 511+-------------------------------------------------------------------------+ 512| Iterate while col_num != 0xffff | 513+-------------------------------------------------------------------------+ 514| ???? | 2 bytes | col_num | Column number with variable length | 515| ???? | 4 bytes | used_pages | Points to a record containing the | 516| | | | usage bitmask for this column. | 517| ???? | 4 bytes | free_pages | Points to a similar record as above, | 518| | | | listing pages which contain free space. | 519+-------------------------------------------------------------------------+ 520``` 521 522Columns flags (not complete): 523 524- 0x01: fixed length column 525- 0x02: can be null (possibly related to joins?) 526- 0x04: is auto long 527- 0x10: replication related field (or hidden?). These columns start with "s_" or 528 "Gen_" (the "Gen_" fields are for memo fields) 529- 0x40: is auto guid 530- 0x80: hyperlink. Syntax is "Link Title#http://example.com/somepage.html#" or 531 "#PAGE.HTM#" 532 533In Access 2007 and Access 2010, "Complex Columns" (multivalued fields, version 534history, attachments) always have the flag byte set to exactly 0x07. 535 536Index flags (not complete): 537- 0x01 Unique 538- 0x02 IgnoreNuls 539- 0x08 Required 540 541Column Type may be one of the following (not complete): 542``` 543 BOOL = 0x01 /* Boolean ( 1 bit ) */ 544 BYTE = 0x02 /* Byte ( 8 bits) */ 545 INT = 0x03 /* Integer (16 bits) */ 546 LONGINT = 0x04 /* Long Integer (32 bits) */ 547 MONEY = 0x05 /* Currency (64 bits) */ 548 FLOAT = 0x06 /* Single (32 bits) */ 549 DOUBLE = 0x07 /* Double (64 bits) */ 550 DATETIME = 0x08 /* Date/Time (64 bits) */ 551 BINARY = 0x09 /* Binary (255 bytes) */ 552 TEXT = 0x0A /* Text (255 bytes) */ 553 OLE = 0x0B /* OLE = Long binary */ 554 MEMO = 0x0C /* Memo = Long text*/ 555 UNKNOWN_0D = 0x0D 556 UNKNOWN_0E = 0x0E 557 REPID = 0x0F /* GUID */ 558 NUMERIC = 0x10 /* Scaled decimal (17 bytes) */ 559 560``` 561 562Notes on reading index metadata: 563 564There are 2 types of index metadata, "physical" index info (denoted by 565num_real_idx) and "logical" index info (denoted by num_idx). Normally, there 566is a 1 to 1 relationship between these 2 types of information. However there 567can be more logical index infos than physical index infos (currently only seen 568for foreign key indexes). In this situation, one or more of the logical 569indexes actually share the same underlying physical index (the index_num2 570indicates which physical index backs which logical index). 571 572As noted in the previous paragraph, physical index sharing is generally only 573seen when a foreign key index has been created. When access creates a 574relationship between 2 tables with "enforce referential integrity" enabled, 575each of the tables gets an extra logical index with type 2 (foreign key). 576These logical indexes contain extra information, primarily pointers to the 577related table (rel_tbl_page) and logical index (rel_idx_num). Also, the 578rel_tbl_type value indicates which table in the relationship is the "primary" 579table (the one one from which cascaded updates/deletes flow). If the indexed 580columns for the foreign key are already indexed by another logical index in 581the table (e.g. an index which the user has explicitly created), then the 582logical foreign key index will simply share the underlying physical index 583data. 584 585Notes on deleted and added columns: (sort of Jet4 specific) 586 587If a fixed length column is deleted the offset_F field will contain the offsets 588of the original row definition. Thus if the number of columns on the row does 589not match the number in the tdef, the offset_F field could be used to return 590the proper data. Columns are never really deleted in the row data. The deleted 591column will forever exist and be set to null for new rows. 592 593A row may have less than max_cols columns but will never have more, as max_cols 594is never decremented. If you have a table with 6 columns, delete one, and add 595one, then max_cols will be 7. 596 597For variable length columns, offset_V will hold the position in the offset table 598of that column. Missing columns are set to null for new rows. 599 600 601Page Usage Maps 602--------------- 603 604There are three uses for the page usage bitmaps. There is a global page usage 605stored on page 1 which tracks allocated pages throughout the database. 606 607Tables store two page usage bitmaps. One is a straight map of which pages are 608owned by the table. The second is a map of the pages owned by the table which 609have free space on them (used for inserting data). 610 611The table bitmaps appear to be of a fixed size for both Jet 3 and 4 (128 and 64 612bytes respectively). The first byte of the map is a type field. 613 614``` 615+--------------------------------------------------------------------------+ 616| Type 0 Page Usage Map | 617+------+---------+---------------------------------------------------------+ 618| data | length | name | description | 619+------+---------+---------------------------------------------------------+ 620| 0x00 | 1 byte | map_type | 0x00 indicates map stored within. | 621| ???? | 4 byte | page_start | first page for which this map applies | 622+------+---------+---------------------------------------------------------+ 623| Iterate for the length of map | 624+--------------------------------------------------------------------------+ 625| ???? | 1 byte | bitmap | each bit encodes the allocation status of a| 626| | | | page. 1 indicates allocated to this table. | 627| | | | Pages are stored starting with the low | 628| | | | order bit of the first byte. | 629+--------------------------------------------------------------------------+ 630``` 631 632If you're paying attention then you'll realize that the relatively small size of 633the map (128*8*2048 or 64*8*4096 = 2 Meg) means that this scheme won't work with 634larger database files although the initial start page helps a bit. To overcome 635this there is a second page usage map scheme with the map_type of 0x01. 636 637``` 638+--------------------------------------------------------------------------+ 639| Type 1 Page Usage Map | 640+------+---------+---------------------------------------------------------+ 641| data | length | name | description | 642+------+---------+---------------------------------------------------------+ 643| 0x01 | 1 byte | map_type | 0x01 indicates this is a indirection list. | 644+------+---------+---------------------------------------------------------+ 645| Iterate for the length of map | 646+--------------------------------------------------------------------------+ 647| ???? | 4 bytes | map_page | pointer to page type 0x05 containing map | 648+--------------------------------------------------------------------------+ 649``` 650 651Note that the initial start page is gone and is reused for the first page 652indirection. The 0x05 type page header looks like: 653 654``` 655+--------------------------------------------------------------------------+ 656| Usage Map Page (type 0x05) | 657+------+---------+---------------------------------------------------------+ 658| data | length | name | description | 659+------+---------+---------------------------------------------------------+ 660| 0x05 | 1 byte | page_type | allocation map page | 661| 0x01 | 1 byte | unknown | always 1 as with other page types | 662| 0x00 | 2 bytes | unknown | | 663+------+---------+---------------------------------------------------------+ 664``` 665 666The rest of the page is the allocation bitmap following the same scheme (lsb 667to msb order, 1 bit per page) as a type 0 map. This yields a maximum of 6682044*8=16352 (jet3) or 4092*8 = 32736 (jet4) pages mapped per type 0x05 page. 669Given 128/4+1 = 33 or 64/4+1 = 17 page pointers per indirection row (remember 670the start page field is reused, thus the +1), this yields 33*16352*2048 = 1053 671Meg (jet3) or 17*32736*4096 = 2173 Meg (jet4) or enough to cover the maximum 672size of each of the database formats comfortably, so there is no reason to 673believe any other page map schemes exist. 674 675 676Indices 677------- 678 679Indices are not completely understood but here is what we know. 680 681``` 682+-------------------------------------------------------------------------+ 683| Index Page (type 0x03) | 684+------+---------+-------------+------------------------------------------+ 685| data | length | name | description | 686+------+---------+-------------+------------------------------------------+ 687| 0x03 | 1 bytes | page_type | 0x03 indicate an index page | 688| 0x01 | 1 bytes | unknown | | 689| ???? | 2 bytes | free_space | The free space at the end this page | 690| ???? | 4 bytes | parent_page | The page number of the TDEF for this idx | 691| ???? | 4 bytes | prev_page | Previous page at this index level | 692| ???? | 4 bytes | next_page | Next page at this index level | 693| ???? | 4 bytes | tail_page | Pointer to tail leaf page | 694| ???? | 2 bytes | pref_len | Length of the shared entry prefix | 695+-------------------------------------------------------------------------+ 696``` 697 698Index pages come in two flavors. 699 7000x04 pages are leaf pages which contain one entry for each row in the table. 701Each entry is composed of a flag, the indexed column values and a page/row 702pointer to the data. 703 7040x03 index pages make up the rest of the index tree and contain a flag, the 705indexed columns, the page/row that contains this entry, and the leaf page or 706intermediate (another 0x03 page) page pointer for which this is the first 707entry on. 708 709Both index types have a bitmask starting at 0x16(jet3) or 0x1b(jet4) which 710identifies the starting location of each index entry on this page. The first 711entry begins at offset 0xf8(jet3) or 0x1e0(jet4), and is not explicitly 712indicated in the bitmask. Note that the count in each byte begins with the 713low order bit. For example take the data: 714 715``` 71600 20 00 04 80 00 ... 717``` 718 719Convert the bytes to binary starting with the low order bit in each byte. v's 720mark where each entry begins: 721``` 722v v v v 7230000 0000 0000 0100 0000 0000 0010 0000 0000 0001 0000 0000 724-- 00 --- -- 20 --- -- 00 --- -- 04 --- -- 80 --- -- 00 --- 725``` 726 727As noted earlier, the first entry is implicit. The second entry begins at an 728offset of 13 (0xd) bytes from the first. The third entry 26 (0x1a) bytes from 729the first. The final entry starts at an offset of 39 (0x27) bytes from the 730first. In this example the rest of the mask (up to offset 0xf8/0x1e0) would be 731zero-filled and thus this last entry isn't an actual entry, but the stopping 732point of the data. 733 734For Jet3, (0xf8 - 0x16) * 8 = 0x710 and 0x800 - 0xf8 = 0x708. 735For Jet4, (0x1e0 - 0x1b) * 8 = 0xe28 and 0x1000 - 0x1e0 = 0xe20. 736So the mask just covers the page, including space to indicate if the last entry 737goes to the end of the page. One wonders why MS didn't use a row offset table 738like they did on data pages. It seems like it would have been easier and more 739flexible. 740 741So now we come to the index entries for type 0x03 pages which look like this: 742 743``` 744+-------------------------------------------------------------------------+ 745| Index Record | 746+------+---------+-------------+------------------------------------------+ 747| data | length | name | description | 748+------+---------+-------------+------------------------------------------+ 749| 0x7f | 1 byte | flags | 0x80 LSB, 0x7f MSB, 0x00 null? | 750| ???? | variable| indexed cols| indexed column data | 751| ???? | 3 bytes | data page | page containing row referred to by this | 752| | | | index entry | 753| ???? | 1 byte | data row | row number on that page of this entry | 754| ???? | 4 bytes | child page | next level index page containing this | 755| | | | entry as last entry. Could be a leaf | 756| | | | node. | 757+-------------------------------------------------------------------------+ 758``` 759 760The flag field is generally either 0x00, 0x7f, 0x80, or 0xFF. 0x80 is the 761one's complement of 0x7f and all text data in the index would then need to be 762negated. The reason for this negation is descending order. The 0x00 flag 763indicates that the key column is null (or 0xFF for descending order), and no 764data will follow, only the page pointer. In multicolumn indexes the flag 765field plus data is repeated for the number of columns participating in the 766key. Index entries are always sorted based on the lexicographical order of 767the entry bytes of the entire index entry (thus descending order is achieved 768by negating the bytes). The flag field ensures that null values are always 769sorted at the beginning (for ascending) or end (for descending) of the index. 770 771Note, there is a compression scheme utilizing a shared entry prefix. If an 772index page has a shared entry prefix (idicated by a pref_len > 0), then the 773first pref_len bytes from the first entry need to be pre-pended to every 774subsequent entry on the page to get the full entry bytes. For example, 775normally an index entry with an integer primary key would be 9 bytes (1 for 776the flags field, 4 for the integer, 4 for page/row). If the pref_len on the 777index page were 4, every entry after the first would then contain only 5 778bytes, where the first byte is the last octet of the encoded primary key field 779(integer) and the last four are the page/row pointer. Thus if the first key 780value on the page is 1 and it points to page 261 (00 01 05) row 3, it becomes: 781 782``` 7837f 00 00 00 01 00 01 05 03 784``` 785and the next index entry can be: 786 787``` 78802 00 01 05 04 789``` 790That is, the shared prefix is [7f 00 00 00], so the actual next entry is: 791 792``` 793[7f 00 00 00] 02 00 01 05 04 794``` 795 796so the key value is 2 (the last octet changes to 02) page 261 row 4. 797 798Access stores an 'alphabetic sort order' version of the text key columns in the 799index. Here is the encoding as we know it: 800 801``` 8020-9: 0x56-0x5f 803A-Z: 0x60-0x79 804a-z: 0x60-0x79 805``` 806 807Once converted into this (non-ascii) character set, the text value can be 808sorted in 'alphabetic' order using the lexicographical order of the entry 809bytes. A text column will end with a NULL (0x00 or 0xff if negated). 810 811Note, this encoding is the "General" sort order in Access 2000-2007 (1033, 812version 0). As of Access 2010, this is now called the "General legacy" sort 813order, and the 2010 "General" sort order is a new encoding (1033, vesion 1). 814 815The leaf page entries store the key column and the 3 byte page and 1 byte row 816number. 817 818The value of the index root page in the index definition may be an index page 819(type 0x03), an index leaf page (type 0x04) if there is only one index page, 820or (in the case of tables small enough to fit on one page) a data page 821(type 0x01). 822 823So to search the index, you need to convert your value into the alphabetic 824character set, compare against each index entry, and on successful comparison 825follow the page and row number to the data. Because text data is managled 826during this conversion there is no 'covered querys' possible on text columns. 827 828To conserve on frequent index updates, Jet also does something special when 829creating new leaf pages at the end of a primary key index (or other index 830where new values are generally added to the end of the index). The tail leaf 831page pointer of the last leaf node points to the new leaf page but the index 832tree is not otherwise updated. Since index entries in type 0x03 index pages 833point to the last entry in the page, adding a new entry to the end of a large 834index would cause updates all the way up the index tree. Instead, the tail 835page can be updated in isolation until it is full, and then moved into the 836index proper. In src/libmdb/index.c, the last leaf read is stored, once the 837index search has been exhausted by the normal search routine, it enters a 838"clean up mode" and reads the next leaf page pointer until it's null. 839 840Properties 841---------- 842 843Design View table definitions are stored in LvProp column of MSysObjects as OLE 844fields. They contain default values, description, format, required ... 845 846They start with a 32 bits header: 'KKD\0' in Jet3 and 'MR2\0' in Jet 4. 847 848Next come chunks. Each chunk starts with: 849 850- 32 bits length value (this includes the length) 851- 16 bits chunk type (0x0080 contains the names, 0x0000 and 0x0001 contain 852 the values. 0x0000 seems to contain information about the "main" object, 853 e.g. the table, and 0x0001 seems to contain information about other 854 objects, e.g. the table columns) 855 856``` 857Name chunk blocks (0x0080) simply contain occurences of: 85816 bit name length 859name 860For instance: 8610x0d 0x00 and 'AccessVersion' (AccessVersion is 13 bytes, 0x0d 0x00 intel order) 862``` 863 864Value chunk blocks (0x0000 and 0x0001) contain a header: 865- 32 bits length value (this includes the length) 866- 16 bits name length 867- name (0x0000 chunk blocks are not usually named, 0x0001 chunk blocks have the 868 column name to which the properties belong) 869 870Next comes one of more chunks of data: 871- 16 bit length value (this includes the length) 872- 8 bit ddl flag 873- 8 bit type 874- 16 bit name (index in the name array of above chunk 0x0080) 875- 16 bit value length field (non-inclusive) 876 value (07.53 for the AccessVersion example above) 877 878See ``props.c``` for an example. 879 880 881Text Data Type 882-------------- 883 884In Jet3, the encoding of text depends on the machine on which it was created. 885So for databases created on U.S. English systems, it can be expected that text 886is encoded in CP1252. This is the default used by mdbtools. If you know that 887another encoding has been used, you can override the default by setting the 888environment variable MDB_JET3_CHARSET. To find out what encodings will work on 889your system, run 'iconv -l'. 890 891In Jet4, the encoding can be either little-endian UCS-2, or a special 892compressed form of it. This compressed format begins with 0xff 0xfe. 893The string then starts in compressed mode, where characters with 0x00 for the 894most-significant byte do not encode it. In the compressed format, a 0x00 byte 895signals a change from compressed mode to uncompressed mode, or from 896uncompressed mode back to compressed mode. The string may end in either mode. 897Note that a string containing any character 0x##00 (UCS-2) will not be 898compressed. Also, the string will only be compressed if it really does make 899the string shorter as compared to uncompressed UCS-2. 900 901Programs that use mdbtools libraries will receive strings encoded in UTF-8 by 902default. This default can by overridden by setting the environment variable 903MDBICONV to the desired encoding. 904