=====================================
The MSF File Format
=====================================

.. contents::
   :local:

.. _msf_layout:

File Layout
===========

The MSF file format consists of the following components:

1. :ref:`msf_superblock`
2. :ref:`msf_freeblockmap` (also known as Free Page Map, or FPM)
3. Data

Each component is stored as an indexed block, the length of which is specified
in ``SuperBlock::BlockSize``. The file consists of 1 or more iterations of the
following pattern (sometimes referred to as an "interval"):

1. 1 block of data
2. Free Block Map 1 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 1)
3. Free Block Map 2 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 2)
4. ``SuperBlock::BlockSize - 3`` blocks of data

In the first interval, the first data block is used to store
:ref:`msf_superblock`.

The following diagram demonstrates the general layout of the file (\| denotes
the end of an interval, and is for visualization purposes only):

+-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+
| Block Index | 0                     | 1                | 2                | 3 - 4095 | \| | 4096 | 4097 | 4098 | 4099 - 8191 | \| | ... |
+=============+=======================+==================+==================+==========+====+======+======+======+=============+====+=====+
| Meaning     | :ref:`msf_superblock` | Free Block Map 1 | Free Block Map 2 | Data     | \| | Data | FPM1 | FPM2 | Data        | \| | ... |
+-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+

The file may end after any block, including immediately after an FPM1.
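
As a rough illustration of the interval pattern, the sketch below classifies a
block by its index. ``BlockKind`` and ``classifyBlock`` are hypothetical names
introduced here, not part of any LLVM API, and the code assumes every interval
is exactly ``BlockSize`` blocks long, as described above.

```cpp
#include <cstdint>

// Role a block plays within the file layout.
enum class BlockKind { SuperBlock, FreeBlockMap1, FreeBlockMap2, Data };

// Hypothetical helper: classify a block by its index. An interval spans
// BlockSize blocks, so the position inside the interval is
// BlockIndex % BlockSize. Position 1 is FPM1, position 2 is FPM2, and
// everything else is data, except block 0, which holds the SuperBlock.
inline BlockKind classifyBlock(uint32_t BlockIndex, uint32_t BlockSize) {
  if (BlockIndex == 0)
    return BlockKind::SuperBlock;
  switch (BlockIndex % BlockSize) {
  case 1:
    return BlockKind::FreeBlockMap1;
  case 2:
    return BlockKind::FreeBlockMap2;
  default:
    return BlockKind::Data;
  }
}
```

With a block size of 4096, this reproduces the diagram: block 4096 is data,
while blocks 4097 and 4098 are FPM1 and FPM2 of the second interval.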

.. note::
  LLVM only supports 4096-byte blocks (sometimes referred to as the "BigMsf"
  variant), so the rest of this document will assume a block size of 4096.

.. _msf_superblock:

The Superblock
==============
At file offset 0 in an MSF file is the MSF *SuperBlock*, which is laid out as
follows:

.. code-block:: c++

  struct SuperBlock {
    char FileMagic[sizeof(Magic)];
    ulittle32_t BlockSize;
    ulittle32_t FreeBlockMapBlock;
    ulittle32_t NumBlocks;
    ulittle32_t NumDirectoryBytes;
    ulittle32_t Unknown;
    ulittle32_t BlockMapAddr;
  };

- **FileMagic** - Must be equal to ``"Microsoft C/C++ MSF 7.00\r\n"``
  followed by the bytes ``1A 44 53 00 00 00``, for a total of 32 bytes.
- **BlockSize** - The block size of the internal file system.  Valid values are
  512, 1024, 2048, and 4096 bytes.  Certain aspects of the MSF file layout vary
  depending on the block size.  For the purposes of LLVM, we handle only block
  sizes of 4KiB, and all further discussion assumes a block size of 4KiB.
- **FreeBlockMapBlock** - The index of a block within the file, at which begins
  a bitfield representing the set of all blocks within the file which are "free"
  (i.e. the data within that block is not used).  See :ref:`msf_freeblockmap`
  for more information.
  **Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``!
- **NumBlocks** - The total number of blocks in the file.  ``NumBlocks *
  BlockSize`` should equal the size of the file on disk.
- **NumDirectoryBytes** - The size of the stream directory, in bytes.  The
  stream directory contains information about each stream's size and the set of
  blocks that it occupies.  It will be described in more detail later.
- **BlockMapAddr** - The index of a block within the MSF file.  At this block is
  an array of ``ulittle32_t`` values listing the blocks that the stream
  directory resides on.  For large MSF files, the stream directory (which
  describes the block layout of each stream) may not fit entirely on a single
  block.  As a result, this extra layer of indirection is introduced, whereby
  this block contains the list of blocks that the stream directory occupies, and
  the stream directory itself can be stitched together accordingly.  The number
  of ``ulittle32_t`` values in this array is given by ``ceil(NumDirectoryBytes /
  BlockSize)``.
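
The constraints listed above can be checked mechanically. The following is a
minimal sketch, not LLVM's actual validation code: ``SuperBlockFields``,
``isValidBlockSize``, ``numDirectoryBlocks``, and ``looksConsistent`` are
hypothetical names, the integer fields are assumed to have already been decoded
from little-endian into host byte order, and the magic check is left out. The
file-size rule is worded as "should" above, so treating it as a hard failure is
a choice made for this sketch.

```cpp
#include <cstdint>

// Host-order copy of the six integer fields that follow FileMagic.
struct SuperBlockFields {
  uint32_t BlockSize;
  uint32_t FreeBlockMapBlock;
  uint32_t NumBlocks;
  uint32_t NumDirectoryBytes;
  uint32_t Unknown;
  uint32_t BlockMapAddr;
};

// Valid block sizes are 512, 1024, 2048, and 4096 bytes.
inline bool isValidBlockSize(uint32_t BlockSize) {
  return BlockSize == 512 || BlockSize == 1024 || BlockSize == 2048 ||
         BlockSize == 4096;
}

// ceil(NumDirectoryBytes / BlockSize): the number of entries in the array
// stored at block BlockMapAddr. Precondition: BlockSize is non-zero.
inline uint32_t numDirectoryBlocks(const SuperBlockFields &SB) {
  const uint64_t Bytes = SB.NumDirectoryBytes;
  return static_cast<uint32_t>((Bytes + SB.BlockSize - 1) / SB.BlockSize);
}

// Applies the rules from the field descriptions to a decoded SuperBlock.
inline bool looksConsistent(const SuperBlockFields &SB, uint64_t FileSize) {
  if (!isValidBlockSize(SB.BlockSize))
    return false;
  // FreeBlockMapBlock can only be 1 or 2.
  if (SB.FreeBlockMapBlock != 1 && SB.FreeBlockMapBlock != 2)
    return false;
  // NumBlocks * BlockSize should equal the size of the file on disk.
  if (static_cast<uint64_t>(SB.NumBlocks) * SB.BlockSize != FileSize)
    return false;
  // BlockMapAddr is a block index, so it has to lie inside the file.
  return SB.BlockMapAddr < SB.NumBlocks;
}
```

For instance, a 16-block file with 4096-byte blocks and a 60-byte stream
directory needs exactly one entry in the array at ``BlockMapAddr``.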

.. _msf_freeblockmap:

The Free Block Map
==================

The Free Block Map (sometimes referred to as the Free Page Map, or FPM) is a
series of blocks which contains a bit flag for every block in the file. The
flag will be set to 0 if the block is in use, and 1 if the block is unused.

Each file contains two FPMs, one of which is active at any given time. This
feature is designed to support incremental and atomic updates of the underlying
MSF file. While writing to an MSF file, if the active FPM is FPM1, you can
write your new modified bitfield to FPM2, and vice versa. Only when you commit
the file to disk do you need to swap the value in the SuperBlock to point to
the new ``FreeBlockMapBlock``.

The Free Block Maps are stored as a series of single blocks throughout the file
at intervals of ``BlockSize`` blocks. Because each FPM block is of size
``BlockSize`` bytes, it contains 8 times as many bits as an interval has blocks.
This means that the first block of each FPM refers to the first 8 intervals of
the file (the first 32768 blocks), the second block of each FPM refers to the
next 8 intervals, and so on. This results in far more FPM blocks being present
than are required, but in order to maintain backwards compatibility the format
must stay this way.
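
The arithmetic above can be sketched as follows. All four function names are
hypothetical and are not LLVM APIs. One detail is an assumption rather than
something stated in this document: the flag for block ``B`` is taken to be bit
``B % 8`` of byte ``B / 8``, counting from the least-significant bit, once the
blocks of the active FPM have been concatenated in order.

```cpp
#include <cstdint>
#include <vector>

// File block index of the Nth block (0-based) of the FPM that starts at
// FreeBlockMapBlock (1 or 2). FPM blocks recur every BlockSize blocks.
inline uint32_t fpmBlockIndex(uint32_t FreeBlockMapBlock, uint32_t N,
                              uint32_t BlockSize) {
  return FreeBlockMapBlock + N * BlockSize;
}

// Which FPM block (0-based) holds the flag for Block. Each FPM block has
// BlockSize * 8 bits, so it covers that many blocks (8 intervals).
inline uint32_t fpmBlockCovering(uint32_t Block, uint32_t BlockSize) {
  return Block / (BlockSize * 8);
}

// Tests the flag for Block in the concatenated bytes of the active FPM.
// A set bit (1) means the block is unused. ASSUMPTION: bits are numbered
// from the least-significant bit of each byte.
inline bool isBlockFree(const std::vector<uint8_t> &FpmBytes, uint32_t Block) {
  return ((FpmBytes.at(Block / 8) >> (Block % 8)) & 1u) != 0;
}

// The FPM to write during an update: 2 if FPM1 is active, 1 if FPM2 is.
inline uint32_t inactiveFpm(uint32_t ActiveFreeBlockMapBlock) {
  return 3 - ActiveFreeBlockMapBlock;
}
```

With a block size of 4096, block 32767 is still covered by the first FPM block,
while block 32768 is the first one covered by the second FPM block, which for
FPM2 lives at file block 4098.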

The Stream Directory
====================
The Stream Directory is the root of all access to the other streams in an MSF
file.  Beginning at byte 0 of the stream directory is the following structure:

.. code-block:: c++

  struct StreamDirectory {
    ulittle32_t NumStreams;
    ulittle32_t StreamSizes[NumStreams];
    ulittle32_t StreamBlocks[NumStreams][];
  };

And this structure occupies exactly ``SuperBlock->NumDirectoryBytes`` bytes.
Note that each of the last two arrays is of variable length, and in particular
that the second array is jagged.

**Example:** Suppose a hypothetical PDB file with a 4KiB block size, and 4
streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}.

Stream 0: ceil(1000 / 4096) = 1 block

Stream 1: ceil(8000 / 4096) = 2 blocks

Stream 2: ceil(16000 / 4096) = 4 blocks

Stream 3: ceil(9000 / 4096) = 3 blocks

In total, 10 blocks are used.  Let's see what the stream directory might look
like:

.. code-block:: c++

  struct StreamDirectory {
    ulittle32_t NumStreams = 4;
    ulittle32_t StreamSizes[] = {1000, 8000, 16000, 9000};
    ulittle32_t StreamBlocks[][] = {
      {4},
      {5, 6},
      {11, 9, 7, 8},
      {10, 15, 12}
    };
  };

In total, this occupies ``15 * 4 = 60`` bytes (1 value for ``NumStreams``, 4
stream sizes, and 10 block indices, at 4 bytes each), so
``SuperBlock->NumDirectoryBytes`` would equal ``60``, and the block at index
``SuperBlock->BlockMapAddr`` would hold an array of one ``ulittle32_t``, since
``60 <= SuperBlock->BlockSize``.

Note also that the streams are discontiguous, and that part of stream 3 is in the
middle of part of stream 2.  You cannot assume anything about the layout of the
blocks!
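
To make the jagged layout concrete, here is a small parsing sketch. It is
illustrative only: ``ParsedDirectory`` and ``parseStreamDirectory`` are
hypothetical names, and the input is assumed to be the stream directory after
its blocks have been stitched together and its ``ulittle32_t`` values decoded
into host-order 32-bit words. It follows only the layout described in this
section.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

struct ParsedDirectory {
  std::vector<uint32_t> StreamSizes;
  std::vector<std::vector<uint32_t>> StreamBlocks; // jagged: one list per stream
};

// Hypothetical helper: splits the flat word sequence into stream sizes and
// per-stream block lists. Returns std::nullopt if the input is too short.
inline std::optional<ParsedDirectory>
parseStreamDirectory(const std::vector<uint32_t> &Words, uint32_t BlockSize) {
  if (Words.empty() || BlockSize == 0)
    return std::nullopt;
  const std::size_t NumStreams = Words[0];
  if (NumStreams > Words.size() - 1)
    return std::nullopt;

  ParsedDirectory Dir;
  for (std::size_t I = 0; I < NumStreams; ++I)
    Dir.StreamSizes.push_back(Words[1 + I]);

  // The block lists follow the sizes. Stream I owns
  // ceil(StreamSizes[I] / BlockSize) consecutive entries.
  std::size_t Pos = 1 + NumStreams;
  for (uint32_t Size : Dir.StreamSizes) {
    const std::size_t NumBlocks = static_cast<std::size_t>(
        (static_cast<uint64_t>(Size) + BlockSize - 1) / BlockSize);
    if (NumBlocks > Words.size() - Pos)
      return std::nullopt;
    std::vector<uint32_t> Blocks;
    for (std::size_t I = 0; I < NumBlocks; ++I)
      Blocks.push_back(Words[Pos + I]);
    Dir.StreamBlocks.push_back(std::move(Blocks));
    Pos += NumBlocks;
  }
  return Dir;
}
```

Feeding it the 15 words of the example (``4``, the four sizes, then the ten
block indices) yields four block lists, the third of which is
``{11, 9, 7, 8}``.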

Alignment and Block Boundaries
==============================
As may be clear by now, it is possible for a single field (whether it be a high
level record, a long string field, or even a single ``uint16``) to begin and
end in separate blocks.  For example, if the block size is 4096 bytes, and a
``uint16`` field begins at the last byte of the current block, then it would
need to end on the first byte of the next block.  Since blocks are not
necessarily contiguously laid out in the file, this means that both the consumer
and the producer of an MSF file must be prepared to split data apart
accordingly.  In the aforementioned example, because values are stored
little-endian, the low byte of the ``uint16`` would be written to the last byte
of the stream's current block, and the high byte would be written to the first
byte of the stream's next block, which could be tens of thousands of bytes
later (or even earlier!) in the file, depending on what the stream directory
says.
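
A reader can cope with this by translating every stream offset through the
stream's block list instead of assuming adjacent bytes. The sketch below is one
way to do that. ``fileOffset`` and ``readU16`` are hypothetical names, the whole
file is assumed to be in memory, and values are assumed to be little-endian, as
the ``ulittle32_t`` field types used throughout this document suggest.

```cpp
#include <cstdint>
#include <vector>

// Maps an offset within a stream to an offset within the file, using the
// stream's block list from the stream directory.
inline uint64_t fileOffset(const std::vector<uint32_t> &Blocks,
                           uint64_t StreamOffset, uint32_t BlockSize) {
  const uint32_t Block = Blocks.at(StreamOffset / BlockSize);
  return static_cast<uint64_t>(Block) * BlockSize + StreamOffset % BlockSize;
}

// Reads a 16-bit value one byte at a time, so that it still works when the
// two bytes live in different, non-adjacent blocks.
// ASSUMPTION: little-endian, i.e. the low byte comes first in the stream.
inline uint16_t readU16(const std::vector<uint8_t> &File,
                        const std::vector<uint32_t> &Blocks,
                        uint64_t StreamOffset, uint32_t BlockSize) {
  const uint8_t Lo = File.at(fileOffset(Blocks, StreamOffset, BlockSize));
  const uint8_t Hi = File.at(fileOffset(Blocks, StreamOffset + 1, BlockSize));
  return static_cast<uint16_t>(Lo | (Hi << 8));
}
```

Using stream 2 from the earlier example (blocks ``{11, 9, 7, 8}``), a ``uint16``
at stream offset 4095 has its first byte at file offset 49151 (the end of block
11) and its second byte at file offset 36864 (the start of block 9), which is
earlier in the file.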