@section mmo backend
The mmo object format is used exclusively together with Professor
Donald E.@: Knuth's educational 64-bit processor MMIX.  The simulator
@command{mmix}, which is available at
@url{http://www-cs-faculty.stanford.edu/~knuth/programs/mmix.tar.gz},
understands this format.  That package also includes a combined
assembler and linker called @command{mmixal}.  The mmo format has
no advantages feature-wise compared to e.g.@: ELF.  It is a simple
non-relocatable object format with no support for archives or
debugging information, except for symbol value information and
line numbers (which are not yet implemented in BFD).  See
@url{http://www-cs-faculty.stanford.edu/~knuth/mmix.html} for more
information about MMIX.  The ELF format is used for intermediate
object files in the BFD implementation.

@c We want to xref the symbol table node.  A feature in "chew"
@c requires that "commands" do not contain spaces in the
@c arguments.  Hence the hyphen in "Symbol-table".
@menu
* File layout::
* Symbol-table::
* mmo section mapping::
@end menu

@node File layout, Symbol-table, mmo, mmo
@subsection File layout
The contents of an mmo file are not partitioned into named sections as
with e.g.@: ELF.  Memory areas are formed by specifying the
location of the data that follows.  Only the memory area
@samp{0x0000@dots{}00} to @samp{0x01ff@dots{}ff} is executable, so
it is used for code (and constants) and the area
@samp{0x2000@dots{}00} to @samp{0x20ff@dots{}ff} is used for
writable data.  @xref{mmo section mapping}.

There is provision for specifying ``special data'' of 65536
different types.  We use type 80 (decimal), arbitrarily chosen the
same as the ELF @code{e_machine} number for MMIX, filling it with
section information normally found in ELF objects.  @xref{mmo
section mapping}.

Contents are entered as 32-bit words, XORed over the previous
contents, which are always zero-initialized.
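
As a small illustration of the XOR rule for contents, here is a hedged
Python sketch.  The names @code{store_tetra} and @code{memory} are
hypothetical and are not part of BFD or @command{mmix}; the sketch
assumes that memory is modeled as a mapping from word-aligned addresses
to 32-bit values, where an absent entry reads as zero.

```python
def store_tetra(memory, address, word):
    """XOR a 32-bit word into memory at a word-aligned address."""
    previous = memory.get(address, 0)  # untouched memory reads as zero
    memory[address] = previous ^ (word & 0xFFFFFFFF)


memory = dict()
store_tetra(memory, 0, 0x00010203)  # first store: 0 ^ word == word
store_tetra(memory, 0, 0x00000100)  # second store is XORed over the first
print(hex(memory[0]))  # prints 0x10303
```

Under this model, entering the same word twice at one address returns
that location to zero.
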
A word that starts with the
byte @samp{0x98} forms a command called a @samp{lopcode}, where
the next byte distinguishes between the thirteen lopcodes.  The
two remaining bytes, called the @samp{Y} and @samp{Z} fields, or
the @samp{YZ} field (a 16-bit big-endian number), are used for
various purposes different for each lopcode.  As documented in
@url{http://www-cs-faculty.stanford.edu/~knuth/mmixal-intro.ps.gz},
the lopcodes are:

@table @code
@item lop_quote
0x98000001.  The next word is contents, regardless of whether it
starts with 0x98 or not.

@item lop_loc
0x9801YYZZ, where @samp{Z} is 1 or 2.  This is a location
directive, setting the location for the next data to the next
32-bit word (for @math{Z = 1}) or 64-bit word (for @math{Z = 2}),
plus @math{Y * 2^56}.  Normally @samp{Y} is 0 for the text segment
and 0x20 for the data segment.

@item lop_skip
0x9802YYZZ.  Increase the current location by @samp{YZ} bytes.

@item lop_fixo
0x9803YYZZ, where @samp{Z} is 1 or 2.  Store the current location
as 64 bits into the location pointed to by the next 32-bit
(@math{Z = 1}) or 64-bit (@math{Z = 2}) word, plus @math{Y *
2^56}.

@item lop_fixr
0x9804YYZZ.  @samp{YZ} is stored into the current location plus
@math{2 - 4 * YZ}.

@item lop_fixrx
0x980500ZZ.  @samp{Z} is 16 or 24.  A value @samp{L} derived from
the following 32-bit word is used in a manner similar to
@samp{YZ} in lop_fixr: it is XORed into the current location
minus @math{4 * L}.  The first byte of the word is 0 or 1.  If it
is 1, then @math{L = (@var{lowest 24 bits of word}) - 2^Z}, if 0,
then @math{L = (@var{lowest 24 bits of word})}.

@item lop_file
0x9806YYZZ.  @samp{Y} is the file number, @samp{Z} is the count of
32-bit words.  Set the file number to @samp{Y} and the line
counter to 0.  The next @math{Z * 4} bytes contain the file name,
padded with zeros if the count is not a multiple of four.
The
same @samp{Y} may occur multiple times, but @samp{Z} must be 0 for
all but the first occurrence.

@item lop_line
0x9807YYZZ.  @samp{YZ} is the line number.  Together with
lop_file, it forms the source location for the next 32-bit word.
Note that for each non-lopcode 32-bit word, line numbers are
assumed incremented by one.

@item lop_spec
0x9808YYZZ.  @samp{YZ} is the type number.  Data until the next
lopcode other than lop_quote forms special data of type @samp{YZ}.
@xref{mmo section mapping}.

Types other than 80 (or type 80 with content that does not
parse) are stored in sections named @code{.MMIX.spec_data.@var{n}}
where @var{n} is the @samp{YZ}-type.  The flags for such a
section say not to allocate or load the data.  The vma is 0.
Contents of multiple occurrences of special data @var{n} are
concatenated to the data of the previous lop_spec @var{n}s.  The
location in data or code at which the lop_spec occurred is lost.

@item lop_pre
0x980901ZZ.  The first lopcode in a file.  The @samp{Z} field forms the
length of header information in 32-bit words, where the first word
tells the time in seconds since @samp{00:00:00 GMT Jan 1 1970}.

@item lop_post
0x980a00ZZ.  @math{Z > 32}.  This lopcode follows after all
content-generating lopcodes in a program.  The @samp{Z} field
denotes the value of @samp{rG} at the beginning of the program.
The following @math{256 - Z} big-endian 64-bit words are loaded
into global registers @samp{$G} @dots{} @samp{$255}.

@item lop_stab
0x980b0000.  The next-to-last lopcode in a program.  Must follow
immediately after the lop_post lopcode and its data.  After this
lopcode follow all symbols in a compressed format
(@pxref{Symbol-table}).

@item lop_end
0x980cYYZZ.  The last lopcode in a program.  It must follow the
lop_stab lopcode and its data.
The @samp{YZ} field contains the
number of 32-bit words of symbol table information after the
preceding lop_stab lopcode.
@end table

Note that the lopcode ``fixups'' (@code{lop_fixr}, @code{lop_fixrx} and
@code{lop_fixo}) are not generated by BFD, but are handled.  They are
generated by @code{mmixal}.

This trivial one-label, one-instruction file:

@example
 :Main TRAP 1,2,3
@end example

can be represented this way in mmo:

@example
 0x98090101 - lop_pre, one 32-bit word with timestamp.
 <timestamp>
 0x98010002 - lop_loc, text segment, using a 64-bit address.
              Note that mmixal does not emit this for the file above.
 0x00000000 - Address, high 32 bits.
 0x00000000 - Address, low 32 bits.
 0x98060002 - lop_file, 2 32-bit words for file-name.
 0x74657374 - "test"
 0x2e730000 - ".s\0\0"
 0x98070001 - lop_line, line 1.
 0x00010203 - TRAP 1,2,3
 0x980a00ff - lop_post, setting $255 to 0.
 0x00000000
 0x00000000
 0x980b0000 - lop_stab for ":Main" = 0, serial 1.
 0x203a4040   @xref{Symbol-table}.
 0x10404020
 0x4d206120
 0x69016e00
 0x81000000
 0x980c0005 - lop_end; symbol table contained five 32-bit words.
@end example

@node Symbol-table, mmo section mapping, File layout, mmo
@subsection Symbol table format
From mmixal.w (or really, the generated mmixal.tex) in
@url{http://www-cs-faculty.stanford.edu/~knuth/programs/mmix.tar.gz}:
``Symbols are stored and retrieved by means of a @samp{ternary
search trie}, following ideas of Bentley and Sedgewick.  (See
ACM--SIAM Symp.@: on Discrete Algorithms @samp{8} (1997), 360--369;
R.@:Sedgewick, @samp{Algorithms in C} (Reading, Mass.@:
Addison--Wesley, 1998), @samp{15.4}.)  Each trie node stores a
character, and there are branches to subtries for the cases where
a given character is less than, equal to, or greater than the
character in the trie.
There also is a pointer to a symbol table
entry if a symbol ends at the current node.''

So it's a tree encoded as a stream of bytes.  The stream of bytes
acts on a single virtual global symbol, adding and removing
characters and signalling complete symbol points.  Here, we read
the stream and create symbols at the completion points.

First, there's a control byte @code{m}.  If any of the listed bits
in @code{m} is nonzero, we execute what stands at the right, in
the listed order:

@example
 (MMO3_LEFT)
 0x40 - Traverse left trie.
        (Read a new command byte and recurse.)

 (MMO3_SYMBITS)
 0x2f - Read the next byte as a character and store it in the
        current character position; increment character position.
        Test the bits of @code{m}:

        (MMO3_WCHAR)
        0x80 - The character is 16-bit (so read another byte,
               merge into current character).

        (MMO3_TYPEBITS)
        0xf  - We have a complete symbol; parse the type, value
               and serial number and do what should be done
               with a symbol.  The type and length information
               is in j = (m & 0xf).

               (MMO3_REGQUAL_BITS)
               j == 0xf: A register variable.  The following
                         byte tells which register.
               j <= 8:   An absolute symbol.  Read j bytes as the
                         big-endian number the symbol equals.
                         A j = 2 with two zero bytes denotes an
                         unknown symbol.
               j > 8:    As with j <= 8, but add (0x20 << 56)
                         to the value in the following j - 8
                         bytes.

               Then comes the serial number, as a variant of
               uleb128, but better named ubeb128:
               Read bytes and shift the previous value left 7
               (multiply by 128).  Add in the new byte, repeat
               until a byte has bit 7 set.  The serial number
               is the computed value minus 128.

        (MMO3_MIDDLE)
        0x20 - Traverse middle trie.  (Read a new command byte
               and recurse.)  Decrement character position.

 (MMO3_RIGHT)
 0x10 - Traverse right trie.  (Read a new command byte and
        recurse.)
@end example

Let's look again at the @code{lop_stab} for the trivial file
(@pxref{File layout}).

@example
 0x980b0000 - lop_stab for ":Main" = 0, serial 1.
 0x203a4040
 0x10404020
 0x4d206120
 0x69016e00
 0x81000000
@end example

This forms the trivial trie (note that the path between ``:'' and
``M'' is redundant):

@example
 203a     ":"
 40       /
 40      /
 10      \
 40      /
 40     /
 204d  "M"
 2061  "a"
 2069  "i"
 016e  "n" is the last character in a full symbol, and
       with a value represented in one byte.
 00    The value is 0.
 81    The serial number is 1.
@end example

@node mmo section mapping, , Symbol-table, mmo
@subsection mmo section mapping
The implementation in BFD uses special data type 80 (decimal) to
encapsulate and describe named sections, containing e.g.@: debug
information.  If needed, any datum in the encapsulation will be
quoted using lop_quote.  First comes a 32-bit word holding the
number of 32-bit words containing the zero-terminated zero-padded
section name.  After the name there's a 32-bit word holding flags
describing the section type.  Then comes a 64-bit big-endian word
with the section length (in bytes), then another with the section
start address.  Depending on the type of section, the contents
might follow, zero-padded to a 32-bit boundary.  For a loadable
section (such as data or code), the contents might follow at some
later point, not necessarily immediately, as a lop_loc with the
same start address as in the section description, followed by the
contents.  This in effect forms a descriptor that must be emitted
before the actual contents.  Sections described this way must not
overlap.

For areas that don't have such descriptors, synthetic sections are
formed by BFD.
Consecutive contents in the two memory areas
@samp{0x0000@dots{}00} to @samp{0x01ff@dots{}ff} and
@samp{0x2000@dots{}00} to @samp{0x20ff@dots{}ff} are entered in
sections named @code{.text} and @code{.data} respectively.  If an area
is not otherwise described, but would together with a neighboring
lower area be less than @samp{0x40000000} bytes long, it is joined
with the lower area and the gap is zero-filled.  For other cases,
a new section is formed, named @code{.MMIX.sec.@var{n}}.  Here,
@var{n} is a number, a running count through the mmo file,
starting at 0.

A loadable section specified as:

@example
 .section secname,"ax"
 TETRA 1,2,3,4,-1,-2009
 BYTE 80
@end example

and linked to address @samp{0x4}, is represented by the sequence:

@example
 0x98080050 - lop_spec 80
 0x00000002 - two 32-bit words for the section name
 0x7365636e - "secn"
 0x616d6500 - "ame\0"
 0x00000033 - flags CODE, READONLY, LOAD, ALLOC
 0x00000000 - high 32 bits of section length
 0x0000001c - section length is 28 bytes; 6 * 4 + 1 + alignment to 32 bits
 0x00000000 - high 32 bits of section address
 0x00000004 - section address is 4
 0x98010002 - 64 bits with address of following data
 0x00000000 - high 32 bits of address
 0x00000004 - low 32 bits: data starts at address 4
 0x00000001 - 1
 0x00000002 - 2
 0x00000003 - 3
 0x00000004 - 4
 0xffffffff - -1
 0xfffff827 - -2009
 0x50000000 - 80 as a byte, padded with zeros.
@end example

Note that the lop_spec wrapping does not include the section
contents.
Compare this to a non-loaded section specified as:

@example
 .section thirdsec
 TETRA 200001,100002
 BYTE 38,40
@end example

This, when linked to address @samp{0x200000000000001c}, is
represented by:

@example
 0x98080050 - lop_spec 80
 0x00000002 - two 32-bit words for the section name
 0x74686972 - "thir"
 0x64736563 - "dsec"
 0x00000010 - flag READONLY
 0x00000000 - high 32 bits of section length
 0x0000000c - section length is 12 bytes; 2 * 4 + 2 + alignment to 32 bits
 0x20000000 - high 32 bits of address
 0x0000001c - low 32 bits of address 0x200000000000001c
 0x00030d41 - 200001
 0x000186a2 - 100002
 0x26280000 - 38, 40 as bytes, padded with zeros
@end example

For the latter example, the section contents must not be
loaded in memory, and are therefore specified as part of the
special data.  The address is usually unimportant but might
provide information for e.g.@: the DWARF 2 debugging format.
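
The descriptor layout described in this section can be sketched in
Python as follows.  This is only an illustration, not the BFD
implementation: the function name @code{parse_section_descriptor} and
the constant @code{LOP_SPEC_80} are hypothetical, the input is assumed
to be a list of already-read 32-bit big-endian words, and the sketch
assumes that no word inside the descriptor needed quoting with
lop_quote.

```python
import struct

LOP_SPEC_80 = 0x98080050  # lop_spec with YZ = 80, as in the examples


def parse_section_descriptor(words):
    """Decode a lop_spec 80 section descriptor from 32-bit words.

    Returns (name, flags, length, address, rest), where rest holds
    the words that follow the descriptor (inline contents for a
    non-loaded section, or further lopcodes).
    """
    if words[0] != LOP_SPEC_80:
        raise ValueError("not a lop_spec 80 word")
    name_words = words[1]  # number of 32-bit words holding the name
    raw = b"".join(struct.pack(">I", w) for w in words[2:2 + name_words])
    name = raw.rstrip(b"\0").decode("ascii")  # drop the zero padding
    pos = 2 + name_words
    flags = words[pos]
    # Length and start address are 64-bit big-endian: high word first.
    length = (words[pos + 1] << 32) | words[pos + 2]
    address = (words[pos + 3] << 32) | words[pos + 4]
    return name, flags, length, address, words[pos + 5:]


# The descriptor of the loadable "secname" example, up to its lop_loc.
loadable = [0x98080050, 0x00000002, 0x7365636e, 0x616d6500, 0x00000033,
            0x00000000, 0x0000001c, 0x00000000, 0x00000004, 0x98010002]
name, flags, length, address, rest = parse_section_descriptor(loadable)
print(name, hex(flags), length, address)  # prints: secname 0x33 28 4
```

For the loadable example the remaining words start with the lop_loc
that introduces the contents; for a non-loaded section they would
instead be the contents themselves, zero-padded to a 32-bit boundary.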