@c PSPP - a program for statistical analysis.
@c Copyright (C) 2019 Free Software Foundation, Inc.
@c Permission is granted to copy, distribute and/or modify this document
@c under the terms of the GNU Free Documentation License, Version 1.3
@c or any later version published by the Free Software Foundation;
@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
@c A copy of the license is included in the section entitled "GNU
@c Free Documentation License".
@c

@node System File Format
@appendix System File Format

A system file encapsulates a set of cases and dictionary information
that describes how they may be interpreted. This chapter describes
the format of a system file.

System files use four data types: 8-bit characters, 32-bit integers,
64-bit integers, and 64-bit floating points, called here @code{char},
@code{int32}, @code{int64}, and @code{flt64}, respectively. Data is
not necessarily aligned on a word
or double-word boundary: the long variable name record (@pxref{Long
Variable Names Record}) and very long string records (@pxref{Very Long
String Record}) have arbitrary byte length and can therefore cause all
data coming after them in the file to be misaligned.

Integer data in system files may be big-endian or little-endian. A
reader may detect the endianness of a system file by examining
@code{layout_code} in the file header record
(@pxref{layout_code,,@code{layout_code}}).

Floating-point data in system files may nominally be in IEEE 754, IBM,
or VAX formats. A reader may detect the floating-point format in use
by examining @code{bias} in the file header record
(@pxref{bias,,@code{bias}}).

PSPP detects big-endian and little-endian integer formats in system
files and translates as necessary. PSPP also detects the
floating-point format in use, as well as the endianness of IEEE 754
floating-point numbers, and translates as needed.
However, only IEEE
754 numbers with the same endianness as integer data in the same file
have actually been observed in system files, and it is likely that
other formats are obsolete or were never used.

System files use a few floating point values for special purposes:

@table @asis
@item SYSMIS
The system-missing value is represented by the largest possible
negative number in the floating point format (@code{-DBL_MAX}).

@item HIGHEST
HIGHEST is used as the high end of a missing value range with an
unbounded maximum. It is represented by the largest possible positive
number (@code{DBL_MAX}).

@item LOWEST
LOWEST is used as the low end of a missing value range with an
unbounded minimum. It was originally represented by the
second-largest negative number (in IEEE 754 format,
@code{0xffeffffffffffffe}). System files written by SPSS 21 and later
instead use the largest negative number (@code{-DBL_MAX}), the same
value as SYSMIS. This does not lead to ambiguity because LOWEST
appears in system files only in missing value ranges, which never
contain SYSMIS.
@end table

System files may use most character encodings based on an 8-bit unit.
UTF-16 and UTF-32, based on wider units, appear to be unacceptable.
@code{rec_type} in the file header record is sufficient to distinguish
between ASCII and EBCDIC based encodings. The best way to determine
the specific encoding in use is to consult the character encoding
record (@pxref{Character Encoding Record}), if present, and failing
that the @code{character_code} in the machine integer info record
(@pxref{Machine Integer Info Record}). The same encoding should be
used for the dictionary and the data in the file, although it is
possible to artificially synthesize files that use different encodings
(@pxref{Character Encoding Record}).
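
As a rough illustration of the special floating-point values and the
endianness check described in this introduction, the following Python
sketch builds the three special values and guesses the integer byte
order from the raw bytes of @code{layout_code}. It is not how PSPP
does it. The names @code{LOWEST_OLD} and @code{detect_int_endianness}
are hypothetical, chosen only for this example, and the sketch assumes
IEEE 754 doubles and treats 2 and 3 as the only expected
@code{layout_code} values.

```python
import struct
import sys

# Assumption: the file uses IEEE 754 doubles, so C's DBL_MAX is the
# largest finite Python float.
DBL_MAX = sys.float_info.max

SYSMIS = -DBL_MAX   # system-missing value
HIGHEST = DBL_MAX   # upper end of an unbounded missing-value range

# LOWEST as originally written: the second-largest negative number,
# with IEEE 754 bit pattern 0xffeffffffffffffe.  (SPSS 21 and later
# write -DBL_MAX instead.)
LOWEST_OLD = struct.unpack(">d", bytes.fromhex("ffeffffffffffffe"))[0]


def detect_int_endianness(layout_code_bytes):
    """Guess integer byte order from the 4 raw bytes of layout_code.

    Returns ">" for big-endian, "<" for little-endian, or None if
    neither interpretation yields an expected value (2 or 3).
    """
    for prefix in (">", "<"):
        (value,) = struct.unpack(prefix + "i", layout_code_bytes)
        if value in (2, 3):
            return prefix
    return None
```

With this sketch, the bytes @code{00 00 00 02} would be classified as
big-endian and @code{02 00 00 00} as little-endian. A reader that gets
neither result would need some other fallback, which is outside the
scope of the example.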

@menu
* System File Record Structure::
* File Header Record::
* Variable Record::
* Value Labels Records::
* Document Record::
* Machine Integer Info Record::
* Machine Floating-Point Info Record::
* Multiple Response Sets Records::
* Extra Product Info Record::
* Variable Display Parameter Record::
* Long Variable Names Record::
* Very Long String Record::
* Character Encoding Record::
* Long String Value Labels Record::
* Long String Missing Values Record::
* Data File and Variable Attributes Records::
* Extended Number of Cases Record::
* Other Informational Records::
* Dictionary Termination Record::
* Data Record::
@end menu

@node System File Record Structure
@section System File Record Structure

System files are divided into records with the following format:

@example
int32 type;
char data[];
@end example

This header does not identify the length of the @code{data} or any
information about what it contains, so the system file reader must
understand the format of @code{data} based on @code{type}. However,
records with type 7, called @dfn{extension records}, have a stricter
format:

@example
int32 type;
int32 subtype;
int32 size;
int32 count;
char data[size * count];
@end example

@table @code
@item int32 type;
Record type. Always set to 7.

@item int32 subtype;
Record subtype. This value identifies a particular kind of extension
record.

@item int32 size;
The size of each piece of data that follows the header, in bytes.
Known extension records use 1, 4, or 8, for @code{char}, @code{int32},
and @code{flt64} format data, respectively.

@item int32 count;
The number of pieces of data that follow the header.

@item char data[size * count];
Data, whose format and interpretation depend on the subtype.
@end table

An extension record contains exactly @code{size * count} bytes of
data, which allows a reader that does not understand an extension
record to skip it. Extension records provide only nonessential
information, so this allows for files written by newer software to
preserve backward compatibility with older or less capable readers.

Records in a system file must appear in the following order:

@itemize @bullet
@item
File header record.

@item
Variable records.

@item
All pairs of value labels records and value label variables records,
if present.

@item
Document record, if present.

@item
Extension (type 7) records, in ascending numerical order of their
subtypes.

System files written by SPSS include at most one of each kind of
extension record. This is generally true of system files written by
other software as well, with known exceptions noted below in the
individual sections about each type of record.

@item
Dictionary termination record.

@item
Data record.
@end itemize

We advise authors of programs that read system files to tolerate
format variations. Various kinds of misformatting and corruption have
been observed in system files written by SPSS and other software
alike. In particular, because extension records provide nonessential
information, it is generally better to ignore an extension record
entirely than to refuse to read a system file.

The following sections describe the known kinds of records.
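
The way a reader can skip an extension record it does not understand
can be sketched in Python as follows. This is a minimal sketch, not
PSPP's implementation: the function name
@code{read_or_skip_extension}, the @code{known_subtypes} parameter,
and the assumption that the caller has already read the @code{int32}
record type (7) and determined the byte order are all hypothetical
choices made for the example.

```python
import io
import struct


def read_or_skip_extension(stream, known_subtypes, endian="<"):
    """Handle one extension record whose type code (7) was already read.

    Returns (subtype, data) for a subtype listed in known_subtypes, or
    (subtype, None) after skipping the data of any other subtype.
    """
    header = stream.read(12)
    if len(header) != 12:
        raise ValueError("truncated extension record header")
    subtype, size, count = struct.unpack(endian + "3i", header)
    if size < 0 or count < 0:
        raise ValueError("negative size or count in extension record")

    n_bytes = size * count          # exact length of the data part
    if subtype in known_subtypes:
        data = stream.read(n_bytes)
        if len(data) != n_bytes:
            raise ValueError("truncated extension record data")
        return subtype, data

    # Unknown subtype: the header alone tells us how far to skip.
    stream.seek(n_bytes, io.SEEK_CUR)
    return subtype, None
```

In keeping with the advice above, a tolerant reader might catch the
@code{ValueError} raised here and continue without the extension
record, rather than rejecting the whole file.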

@node File Header Record
@section File Header Record

A system file begins with the file header, with the following format:

@example
char rec_type[4];
char prod_name[60];
int32 layout_code;
int32 nominal_case_size;
int32 compression;
int32 weight_index;
int32 ncases;
flt64 bias;
char creation_date[9];
char creation_time[8];
char file_label[64];
char padding[3];
@end example

@table @code
@item char rec_type[4];
Record type code, either @samp{$FL2} for system files with
uncompressed data or data compressed with simple bytecode compression,
or @samp{$FL3} for system files with ZLIB compressed data.

This is truly a character field that uses the same character encoding
as other strings. Thus, in a file with an ASCII-based character encoding
this field contains @code{24 46 4c 32} or @code{24 46 4c 33}, and in a
file with an EBCDIC-based encoding this field contains
@code{5b c6 d3 f2}. (No EBCDIC-based ZLIB-compressed files have been
observed.)

@item char prod_name[60];
Product identification string. This always begins with the characters
@samp{@@(#) SPSS DATA FILE}. PSPP uses the remaining characters to
give its version and the operating system name; for example, @samp{GNU
pspp 0.1.4 - sparc-sun-solaris2.5.2}. The string is truncated if it
would be longer than 60 characters; otherwise it is padded on the right
with spaces.

The product name field allows readers to behave differently based on
quirks in the way that particular software writes system files.
@xref{Value Labels Records}, for the detail of the quirk that the PSPP
system file reader tolerates in files written by ReadStat, which has
@code{https://github.com/WizardMac/ReadStat} in @code{prod_name}.

@anchor{layout_code}
@item int32 layout_code;
Normally set to 2, although a few system files have been spotted in
the wild with a value of 3 here.
PSPP uses this value to determine the
file's integer endianness (@pxref{System File Format}).

@item int32 nominal_case_size;
Number of data elements per case. This is the number of variables,
except that long string variables add extra data elements (one for every
8 characters after the first 8). However, string variables do not
contribute to this value beyond the first 255 bytes. Further, some
software always writes -1 or 0 in this field. In general, it is
unsafe for systems reading system files to rely upon this value.

@item int32 compression;
Set to 0 if the data in the file is not compressed, 1 if the data is
compressed with simple bytecode compression, 2 if the data is ZLIB
compressed. This field has value 2 if and only if @code{rec_type} is
@samp{$FL3}.

@item int32 weight_index;
If one of the variables in the data set is used as a weighting
variable, set to the dictionary index of that variable, plus 1
(@pxref{Dictionary Index}). Otherwise, set to 0.

@item int32 ncases;
Set to the number of cases in the file if it is known, or -1 otherwise.

In the general case it is not possible to determine the number of cases
that will be output to a system file at the time that the header is
written. The way that this is dealt with is by writing the entire
system file, including the header, then seeking back to the beginning of
the file and writing just the @code{ncases} field. For files in which
this is not valid, the seek operation fails. In this case,
@code{ncases} remains -1.

@anchor{bias}
@item flt64 bias;
Compression bias, ordinarily set to 100. Only integers between
@code{1 - bias} and @code{251 - bias} can be compressed.

By assuming that its value is 100, PSPP uses @code{bias} to determine
the file's floating-point format and endianness (@pxref{System File
Format}).
If the compression bias is not 100, PSPP cannot auto-detect
the floating-point format and assumes that it is IEEE 754 format with
the same endianness as the system file's integers, which is correct
for all known system files.

@item char creation_date[9];
Date of creation of the system file, in @samp{dd mmm yy}
format, with the month as standard English abbreviations, using an
initial capital letter and following with lowercase. If the date is not
available then this field is arbitrarily set to @samp{01 Jan 70}.

@item char creation_time[8];
Time of creation of the system file, in @samp{hh:mm:ss}
format and using 24-hour time. If the time is not available then this
field is arbitrarily set to @samp{00:00:00}.

@item char file_label[64];
File label declared by the user, if any (@pxref{FILE LABEL,,,pspp,
PSPP Users Guide}). Padded on the right with spaces.

A product that identifies itself as @code{VOXCO INTERVIEWER 4.3} uses
CR-only line ends in this field, rather than the more usual LF-only or
CR LF line ends.

@item char padding[3];
Ignored padding bytes to make the structure a multiple of 32 bits in
length. Set to zeros.
@end table

@node Variable Record
@section Variable Record

There must be one variable record for each numeric variable and each
string variable with width 8 bytes or less. String variables wider
than 8 bytes have one variable record for each 8 bytes, rounding up.
The first variable record for a long string specifies the variable's
correct dictionary information. Subsequent variable records for a
long string are filled with dummy information: a type of -1, no
variable label or missing values, print and write formats that are
ignored, and an empty string as name.
A few system files have been
encountered that include a variable label on dummy variable records,
so readers should take care to parse dummy variable records in the
same way as other variable records.

@anchor{Dictionary Index}
The @dfn{dictionary index} of a variable is a 1-based offset in the set of
variable records, including dummy variable records for long string
variables. The first variable record has a dictionary index of 1, the
second has a dictionary index of 2, and so on.

The system file format does not directly support string variables
wider than 255 bytes. Such very long string variables are represented
by a number of narrower string variables. @xref{Very Long String
Record}, for details.

A system file should contain at least one variable and thus at least
one variable record, but system files have been observed in the wild
without any variables (thus, no data either).

@example
int32 rec_type;
int32 type;
int32 has_var_label;
int32 n_missing_values;
int32 print;
int32 write;
char name[8];

/* @r{Present only if @code{has_var_label} is 1.} */
int32 label_len;
char label[];

/* @r{Present only if @code{n_missing_values} is nonzero}. */
flt64 missing_values[];
@end example

@table @code
@item int32 rec_type;
Record type code. Always set to 2.

@item int32 type;
Variable type code. Set to 0 for a numeric variable. For a short
string variable or the first part of a long string variable, this is set
to the width of the string. For the second and subsequent parts of a
long string variable, set to -1, and the remaining fields in the
structure are ignored.

@item int32 has_var_label;
If this variable has a variable label, set to 1; otherwise, set to 0.

@item int32 n_missing_values;
If the variable has no missing values, set to 0.
If the variable has
one, two, or three discrete missing values, set to 1, 2, or 3,
respectively. If the variable has a range of missing values, set to
-2; if the variable has a range of missing values plus a single
discrete value, set to -3.

A long string variable always has the value 0 here. A separate record
indicates missing values for long string variables (@pxref{Long String
Missing Values Record}).

@item int32 print;
Print format for this variable. See below.

@item int32 write;
Write format for this variable. See below.

@item char name[8];
Variable name. The variable name must begin with a capital letter or
the at-sign (@samp{@@}). Subsequent characters may also be digits, octothorpes
(@samp{#}), dollar signs (@samp{$}), underscores (@samp{_}), or full
stops (@samp{.}). The variable name is padded on the right with spaces.

The @samp{name} fields should be unique within a system file. System
files written by SPSS that contain very long string variables with
similar names sometimes contain duplicate names that are later
eliminated by resolving the very long string names (@pxref{Very Long
String Record}). PSPP handles duplicates by assigning them new,
unique names.

@item int32 label_len;
This field is present only if @code{has_var_label} is set to 1. It is
set to the length, in characters, of the variable label. The
documented maximum length varies from 120 to 255 based on SPSS
version, but some files have been seen with longer labels. PSPP
accepts labels of any length.

@item char label[];
This field is present only if @code{has_var_label} is set to 1. It has
length @code{label_len}, rounded up to the nearest multiple of 32 bits.
The first @code{label_len} characters are the variable's variable label.

@item flt64 missing_values[];
This field is present only if @code{n_missing_values} is nonzero.
It
has the same number of 8-byte elements as the absolute value of
@code{n_missing_values}. Each element is interpreted as a number for
numeric variables (with HIGHEST and LOWEST indicated as described in
the chapter introduction). For string variables of width less than 8
bytes, elements are right-padded with spaces; for string variables
wider than 8 bytes, only the first 8 bytes of each missing value are
specified, with the remainder implicitly all spaces.

For discrete missing values, each element represents one missing
value. When a range is present, the first element denotes the minimum
value in the range, and the second element denotes the maximum value
in the range. When a range plus a value are present, the third
element denotes the additional discrete missing value.
@end table

@anchor{System File Output Formats}
The @code{print} and @code{write} members of the variable record are
output formats coded into @code{int32} types. The least-significant byte
of the @code{int32} represents the number of decimal places, and the
next two bytes in order of increasing significance represent field width
and format type, respectively. The most-significant byte is not
used and should be set to zero.

Format types are defined as follows:

@quotation
@multitable {Value} {@code{DATETIME}}
@headitem Value
@tab Meaning
@item 0
@tab Not used.
@item 1
@tab @code{A}
@item 2
@tab @code{AHEX}
@item 3
@tab @code{COMMA}
@item 4
@tab @code{DOLLAR}
@item 5
@tab @code{F}
@item 6
@tab @code{IB}
@item 7
@tab @code{PIBHEX}
@item 8
@tab @code{P}
@item 9
@tab @code{PIB}
@item 10
@tab @code{PK}
@item 11
@tab @code{RB}
@item 12
@tab @code{RBHEX}
@item 13
@tab Not used.
@item 14
@tab Not used.
@item 15
@tab @code{Z}
@item 16
@tab @code{N}
@item 17
@tab @code{E}
@item 18
@tab Not used.
@item 19
@tab Not used.
@item 20
@tab @code{DATE}
@item 21
@tab @code{TIME}
@item 22
@tab @code{DATETIME}
@item 23
@tab @code{ADATE}
@item 24
@tab @code{JDATE}
@item 25
@tab @code{DTIME}
@item 26
@tab @code{WKDAY}
@item 27
@tab @code{MONTH}
@item 28
@tab @code{MOYR}
@item 29
@tab @code{QYR}
@item 30
@tab @code{WKYR}
@item 31
@tab @code{PCT}
@item 32
@tab @code{DOT}
@item 33
@tab @code{CCA}
@item 34
@tab @code{CCB}
@item 35
@tab @code{CCC}
@item 36
@tab @code{CCD}
@item 37
@tab @code{CCE}
@item 38
@tab @code{EDATE}
@item 39
@tab @code{SDATE}
@item 40
@tab @code{MTIME}
@item 41
@tab @code{YMDHMS}
@end multitable
@end quotation

A few system files have been observed in the wild with invalid
@code{write} fields, in particular with value 0. Readers should
probably treat invalid @code{print} or @code{write} fields as some
default format.

@node Value Labels Records
@section Value Labels Records

The value label records documented in this section are used for
numeric and short string variables only. Long string variables may
have value labels, but their value labels are recorded using a
different record type (@pxref{Long String Value Labels Record}).

ReadStat (@pxref{File Header Record}) writes value labels that label a
single value more than once. Specifically, for string variables it
emits value labels whose values are longer than the variable's width
and that are identical within that width, e.g.@: labels for
values @code{ABC123} and @code{ABC456} for a string variable with
width 3. For files written by this software, PSPP ignores such
labels.

The value label record has the following format:

@example
int32 rec_type;
int32 label_count;

/* @r{Repeated @code{label_count} times}. */
char value[8];
char label_len;
char label[];
@end example

@table @code
@item int32 rec_type;
Record type. Always set to 3.

@item int32 label_count;
Number of value labels present in this record.
@end table

The remaining fields are repeated @code{label_count} times. Each
repetition specifies one value label.

@table @code
@item char value[8];
A numeric value or a short string value padded as necessary to 8 bytes
in length. Its type and width cannot be determined until the
following value label variables record (see below) is read.

@item char label_len;
The label's length, in bytes. The documented maximum length varies
from 60 to 120 based on SPSS version. PSPP supports value labels up
to 255 bytes long.

@item char label[];
@code{label_len} bytes of the actual label, followed by up to 7 bytes
of padding to bring @code{label} and @code{label_len} together to a
multiple of 8 bytes in length.
@end table

The value label record is always immediately followed by a value label
variables record with the following format:

@example
int32 rec_type;
int32 var_count;
int32 vars[];
@end example

@table @code
@item int32 rec_type;
Record type. Always set to 4.

@item int32 var_count;
Number of variables to which the associated value labels from the
value label record are to be applied.

@item int32 vars[];
A list of 1-based dictionary indexes of variables to which to apply the value
labels (@pxref{Dictionary Index}). There are @code{var_count}
elements.

String variables wider than 8 bytes may not be specified in this list.
@end table

@node Document Record
@section Document Record

The document record, if present, has the following format:

@example
int32 rec_type;
int32 n_lines;
char lines[][80];
@end example

@table @code
@item int32 rec_type;
Record type.
Always set to 6.

@item int32 n_lines;
Number of document lines present. This should be greater than
zero, but ReadStat writes system files with zero @code{n_lines}.

@item char lines[][80];
Document lines. The number of elements is defined by @code{n_lines}.
Lines shorter than 80 characters are padded on the right with spaces.
@end table

@node Machine Integer Info Record
@section Machine Integer Info Record

The integer info record, if present, has the following format:

@example
/* @r{Header.} */
int32 rec_type;
int32 subtype;
int32 size;
int32 count;

/* @r{Data.} */
int32 version_major;
int32 version_minor;
int32 version_revision;
int32 machine_code;
int32 floating_point_rep;
int32 compression_code;
int32 endianness;
int32 character_code;
@end example

@table @code
@item int32 rec_type;
Record type. Always set to 7.

@item int32 subtype;
Record subtype. Always set to 3.

@item int32 size;
Size of each piece of data in the data part, in bytes. Always set to 4.

@item int32 count;
Number of pieces of data in the data part. Always set to 8.

@item int32 version_major;
PSPP major version number. In version @var{x}.@var{y}.@var{z}, this
is @var{x}.

@item int32 version_minor;
PSPP minor version number. In version @var{x}.@var{y}.@var{z}, this
is @var{y}.

@item int32 version_revision;
PSPP version revision number. In version @var{x}.@var{y}.@var{z},
this is @var{z}.

@item int32 machine_code;
Machine code. PSPP always sets this field to -1, but other
values may appear.

@item int32 floating_point_rep;
Floating point representation code. For IEEE 754 systems this is 1.
IBM 370 sets this to 2, and DEC VAX E to 3.

@item int32 compression_code;
Compression code. Always set to 1, regardless of whether or how the
file is compressed.

@item int32 endianness;
Machine endianness. 1 indicates big-endian, 2 indicates little-endian.

@item int32 character_code;
@anchor{character-code} Character code. The following values have
been actually observed in system files:

@table @asis
@item 1
EBCDIC.

@item 2
7-bit ASCII.

@item 1250
The @code{windows-1250} code page for Central European and Eastern
European languages.

@item 1252
The @code{windows-1252} code page for Western European languages.

@item 28591
ISO 8859-1.

@item 65001
UTF-8.
@end table

The following additional values are known to be defined:

@table @asis
@item 3
8-bit ``ASCII''.

@item 4
DEC Kanji.
@end table

Other Windows code page numbers are known to be generally valid.

Old versions of SPSS for Unix and Windows always wrote value 2 in this
field, regardless of the encoding in use. Newer versions also write
the character encoding as a string (see @ref{Character Encoding
Record}).
@end table

@node Machine Floating-Point Info Record
@section Machine Floating-Point Info Record

The floating-point info record, if present, has the following format:

@example
/* @r{Header.} */
int32 rec_type;
int32 subtype;
int32 size;
int32 count;

/* @r{Data.} */
flt64 sysmis;
flt64 highest;
flt64 lowest;
@end example

@table @code
@item int32 rec_type;
Record type. Always set to 7.

@item int32 subtype;
Record subtype. Always set to 4.

@item int32 size;
Size of each piece of data in the data part, in bytes. Always set to 8.

@item int32 count;
Number of pieces of data in the data part. Always set to 3.

@item flt64 sysmis;
@itemx flt64 highest;
@itemx flt64 lowest;
The system missing value, the value used for HIGHEST in missing
values, and the value used for LOWEST in missing values, respectively.
@xref{System File Format}, for more information.

The SPSSWriter library in PHP, which identifies itself as @code{FOM
SPSS 1.0.0} in the file header record @code{prod_name} field, writes
unexpected values to these fields, but it uses the same values
consistently throughout the rest of the file.
@end table

@node Multiple Response Sets Records
@section Multiple Response Sets Records

The system file format has two different types of records that
represent multiple response sets (@pxref{MRSETS,,,pspp, PSPP Users
Guide}). The first type of record describes multiple response sets
that can be understood by SPSS before version 14. The second type of
record, with a closely related format, is used for multiple dichotomy
sets that use the CATEGORYLABELS=COUNTEDVALUES feature added in
version 14.

@example
/* @r{Header.} */
int32 rec_type;
int32 subtype;
int32 size;
int32 count;

/* @r{Exactly @code{count} bytes of data.} */
char mrsets[];
@end example

@table @code
@item int32 rec_type;
Record type. Always set to 7.

@item int32 subtype;
Record subtype. Set to 7 for records that describe multiple response
sets understood by SPSS before version 14, or to 19 for records that
describe dichotomy sets that use the CATEGORYLABELS=COUNTEDVALUES
feature added in version 14.

@item int32 size;
The size of each element in the @code{mrsets} member. Always set to 1.

@item int32 count;
The total number of bytes in @code{mrsets}.

@item char mrsets[];
Zero or more line feeds (byte 0x0a), followed by a series of multiple
response sets, each of which consists of the following:

@itemize @bullet
@item
The set's name (an identifier that begins with @samp{$}), in mixed
upper and lower case.

@item
An equals sign (@samp{=}).

@item
@samp{C} for a multiple category set, @samp{D} for a multiple
dichotomy set with CATEGORYLABELS=VARLABELS, or @samp{E} for a
multiple dichotomy set with CATEGORYLABELS=COUNTEDVALUES.

@item
For a multiple dichotomy set with CATEGORYLABELS=COUNTEDVALUES, a
space, followed by a number expressed as decimal digits, followed by a
space. If LABELSOURCE=VARLABEL was specified on MRSETS, then the
number is 11; otherwise it is 1.@footnote{This part of the format may
not be fully understood, because only a single example of each
possibility has been examined.}

@item
For either kind of multiple dichotomy set, the counted value, as a
positive integer count specified as decimal digits, followed by a
space, followed by as many string bytes as specified in the count. If
the set contains numeric variables, the string consists of the counted
integer value expressed as decimal digits. If the set contains string
variables, the string contains the counted string value. Either way,
the string may be padded on the right with spaces (older versions of
SPSS seem to always pad to a width of 8 bytes; newer versions don't).

@item
A space.

@item
The multiple response set's label, using the same format as for the
counted value for multiple dichotomy sets. A string of length 0 means
that the set does not have a label. A string of length 0 is also
written if LABELSOURCE=VARLABEL was specified.

@item
A space.

@item
The short names of the variables in the set, converted to lowercase,
each separated from the previous by a single space.

Even though a multiple response set must have at least two variables,
some system files contain multiple response sets with no variables or
one variable. The source and meaning of these multiple response sets is
unknown.
(Perhaps they arise from creating a multiple response set
then deleting all the variables that it contains?)

@item
One line feed (byte 0x0a). Sometimes multiple line feeds, even
hundreds, are present.
@end itemize
@end table

Example: Given appropriate variable definitions, consider the
following MRSETS command:

@example
MRSETS /MCGROUP NAME=$a LABEL='my mcgroup' VARIABLES=a b c
       /MDGROUP NAME=$b VARIABLES=g e f d VALUE=55
       /MDGROUP NAME=$c LABEL='mdgroup #2' VARIABLES=h i j VALUE='Yes'
       /MDGROUP NAME=$d LABEL='third mdgroup' CATEGORYLABELS=COUNTEDVALUES
        VARIABLES=k l m VALUE=34
       /MDGROUP NAME=$e CATEGORYLABELS=COUNTEDVALUES LABELSOURCE=VARLABEL
        VARIABLES=n o p VALUE='choice'.
@end example

The above would generate the following multiple response set record of
subtype 7:

@example
$a=C 10 my mcgroup a b c
$b=D2 55 0 g e f d
$c=D3 Yes 10 mdgroup #2 h i j
@end example

It would also generate the following multiple response set record with
subtype 19:

@example
$d=E 1 2 34 13 third mdgroup k l m
$e=E 11 6 choice 0 n o p
@end example

@node Extra Product Info Record
@section Extra Product Info Record

This optional record appears to contain a text string that describes
the program that wrote the file and the source of the data. (This is
redundant with the file label and product info found in the file
header record.)

@example
/* @r{Header.} */
int32 rec_type;
int32 subtype;
int32 size;
int32 count;

/* @r{Exactly @code{count} bytes of data.} */
char info[];
@end example

@table @code
@item int32 rec_type;
Record type. Always set to 7.

@item int32 subtype;
Record subtype. Always set to 10.

@item int32 size;
The size of each element in the @code{info} member. Always set to 1.

@item int32 count;
The total number of bytes in @code{info}.

@item char info[];
A text string. A product that identifies itself as @code{VOXCO
INTERVIEWER 4.3} uses CR-only line ends in this field, rather than the
more usual LF-only or CR LF line ends.
@end table

@node Variable Display Parameter Record
@section Variable Display Parameter Record

The variable display parameter record, if present, has the following
format:

@example
/* @r{Header.} */
int32 rec_type;
int32 subtype;
int32 size;
int32 count;

/* @r{Repeated for each set of display parameters.} */
int32 measure;
int32 width; /* @r{Not always present.} */
int32 alignment;
@end example

@table @code
@item int32 rec_type;
Record type. Always set to 7.

@item int32 subtype;
Record subtype. Always set to 11.

@item int32 size;
The size of @code{int32}. Always set to 4.

@item int32 count;
The number of sets of variable display parameters (ordinarily the
number of variables in the dictionary), times 2 or 3.
@end table

The remaining members are repeated once for each set of variable
display parameters (that is, @code{count} / 3 times, or
@code{count} / 2 times when @code{width} is omitted), in the same
order as the variable records. No element corresponds to variable
records that continue long string variables. The meanings of these
members are as follows:

@table @code
@item int32 measure;
The measurement type of the variable:
@table @asis
@item 1
Nominal Scale
@item 2
Ordinal Scale
@item 3
Continuous Scale
@end table

SPSS sometimes writes a @code{measure} of 0. PSPP interprets this as
nominal scale.

@item int32 width;
The width of the display column for the variable in characters.

This field is present if @var{count} is 3 times the number of
variables in the dictionary. It is omitted if @var{count} is 2 times
the number of variables.

@item int32 alignment;
The alignment of the variable for display purposes:

@table @asis
@item 0
Left aligned
@item 1
Right aligned
@item 2
Centre aligned
@end table
@end table

@node Long Variable Names Record
@section Long Variable Names Record

If present, the long variable names record has the following format:

@example
/* @r{Header.} */
int32 rec_type;
int32 subtype;
int32 size;
int32 count;

/* @r{Exactly @code{count} bytes of data.} */
char var_name_pairs[];
@end example

@table @code
@item int32 rec_type;
Record type. Always set to 7.

@item int32 subtype;
Record subtype. Always set to 13.

@item int32 size;
The size of each element in the @code{var_name_pairs} member. Always set to 1.

@item int32 count;
The total number of bytes in @code{var_name_pairs}.

@item char var_name_pairs[];
A list of @var{key}--@var{value} tuples, where @var{key} is the name
of a variable, and @var{value} is its long variable name.
The @var{key} field is at most 8 bytes long and must match the
name of a variable which appears in the variable record (@pxref{Variable
Record}).
The @var{value} field is at most 64 bytes long.
The @var{key} and @var{value} fields are separated by a @samp{=} byte.
Tuples are separated from one another by a byte whose value is 09 (a
horizontal tab). There is no trailing separator following the last tuple.
The total length is @code{count} bytes.
@end table

@node Very Long String Record
@section Very Long String Record

Old versions of SPSS limited string variables to a width of 255 bytes.
For backward compatibility with these older versions, the system file
format represents a string longer than 255 bytes, called a @dfn{very
long string}, as a collection of strings no longer than 255 bytes
each.
The strings concatenated to make a very long string are called
its @dfn{segments}; for consistency, variables other than very long
strings are considered to have a single segment.

A very long string with a width of @var{w} has @var{n} =
(@var{w} + 251) / 252 segments, that is, one segment for every
252 bytes of width, rounding up. It would be logical, then, for each
of the segments except the last to have a width of 252 and the last
segment to have the remainder, but this is not the case. In fact,
each segment except the last has a width of 255 bytes. The last
segment has width @var{w} - (@var{n} - 1) * 252; some versions
of SPSS make it slightly wider, but not wide enough to make the last
segment require another 8 bytes of data.

Data is packed tightly into segments of a very long string, 255 bytes
per segment. Because 255 bytes of segment data are allocated for
every 252 bytes of the very long string's width (approximately), some
unused space is left over at the end of the allocated segments. Data
in unused space is ignored.

Example: Consider a very long string of width 20,000. Such a very
long string has 20,000 / 252 = 80 (rounding up) segments. The first
79 segments have width 255; the last segment has width 20,000 - 79 *
252 = 92 or slightly wider (up to 96 bytes, the next multiple of 8).
The very long string's data is actually stored in the 19,890 bytes in
the first 78 segments, plus the first 110 bytes of the 79th segment
(19,890 + 110 = 20,000). The remaining 145 bytes of the 79th segment
and all 92 bytes of the 80th segment are unused.

The very long string record explains how to stitch together segments
to obtain very long string data. For each of the very long string
variables in the dictionary, it specifies the name of its first
segment's variable and the very long string variable's actual width.
The remaining segments immediately follow the named variable in the
system file's dictionary.

The very long string record, which is present only if the system file
contains very long string variables, has the following format:

@example
/* @r{Header.} */
int32 rec_type;
int32 subtype;
int32 size;
int32 count;

/* @r{Exactly @code{count} bytes of data.} */
char string_lengths[];
@end example

@table @code
@item int32 rec_type;
Record type. Always set to 7.

@item int32 subtype;
Record subtype. Always set to 14.

@item int32 size;
The size of each element in the @code{string_lengths} member. Always set to 1.

@item int32 count;
The total number of bytes in @code{string_lengths}.

@item char string_lengths[];
A list of @var{key}--@var{value} tuples, where @var{key} is the name
of a variable, and @var{value} is its length.
The @var{key} field is at most 8 bytes long and must match the
name of a variable which appears in the variable record (@pxref{Variable
Record}).
The @var{value} field is exactly 5 bytes long. It is a zero-padded,
ASCII-encoded string that is the length of the variable.
The @var{key} and @var{value} fields are separated by a @samp{=} byte.
Tuples are delimited by a two-byte sequence @{00, 09@}.
After the last tuple, there may be a single byte 00, or @{00, 09@}.
The total length is @code{count} bytes.
@end table

@node Character Encoding Record
@section Character Encoding Record

This record, if present, indicates the character encoding for string data,
long variable names, variable labels, value labels and other strings in the
file.
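
As a rough illustration of how a reader might use this record, the
following Python sketch picks a codec by preferring the character
encoding record and falling back on @code{character_code} from the
machine integer info record. The function @code{choose_codec} and the
table @code{CODE_PAGE_CODECS} are hypothetical names introduced here,
not part of PSPP, and the table lists only the two code pages mentioned
in this section; a real reader would need a fuller mapping.

```python
import codecs

# Assumption: a tiny lookup from Windows code page numbers to Python codec
# names.  Only the two pairs mentioned in this section are listed.
CODE_PAGE_CODECS = ((1252, "cp1252"), (65001, "utf-8"))


def choose_codec(encoding_record=None, character_code=None):
    """Return a Python codec name, or None if nothing usable is known.

    encoding_record: raw bytes of the encoding[] member, or None if the
                     character encoding record is absent.
    character_code:  integer from the machine integer info record, or None.
    """
    # Prefer the character encoding record when it names a known codec.
    if encoding_record is not None:
        name = encoding_record.decode("ascii", errors="replace").strip()
        try:
            # codecs.lookup ignores case, so all-uppercase names are fine.
            return codecs.lookup(name).name
        except LookupError:
            pass  # unknown name: fall through to character_code

    # Otherwise fall back on the code page number, if we recognize it.
    for code, codec in CODE_PAGE_CODECS:
        if code == character_code:
            return codec
    return None
```

Returning @code{None} rather than guessing a default leaves the policy
for undetermined encodings to the caller.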

@example
/* @r{Header.} */
int32 rec_type;
int32 subtype;
int32 size;
int32 count;

/* @r{Exactly @code{count} bytes of data.} */
char encoding[];
@end example

@table @code
@item int32 rec_type;
Record type. Always set to 7.

@item int32 subtype;
Record subtype. Always set to 20.

@item int32 size;
The size of each element in the @code{encoding} member. Always set to 1.

@item int32 count;
The total number of bytes in @code{encoding}.

@item char encoding[];
The name of the character encoding. Normally this will be an official
IANA character set name or alias.
See @url{http://www.iana.org/assignments/character-sets}.
Character set names are not case-sensitive, but SPSS appears to write
them in all-uppercase.
@end table

This record is not present in files generated by older software. See
also the @code{character_code} field in the machine integer info
record (@pxref{character-code}).

When the character encoding record and the machine integer info record
are both present, all system files observed in practice indicate the
same character encoding, e.g.@: 1252 as @code{character_code} and
@code{windows-1252} as @code{encoding}, 65001 and @code{UTF-8}, etc.

If, for testing purposes, a file is crafted with different
@code{character_code} and @code{encoding}, it seems that
@code{character_code} controls the encoding for all strings in the
system file before the dictionary termination record, including
strings in data (e.g.@: string missing values), and @code{encoding}
controls the encoding for strings following the dictionary termination
record.

@node Long String Value Labels Record
@section Long String Value Labels Record

This record, if present, specifies value labels for long string
variables.
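
The body of this record, whose layout is given in the rest of this
section, is a flat sequence of length-prefixed byte strings, so a
reader can walk it with a simple cursor. The Python sketch below shows
one way to do that. The function name
@code{parse_long_string_value_labels} and its @code{endian} parameter
are hypothetical, not taken from PSPP; the sketch assumes that the
16-byte record header has already been consumed, that the body is
well-formed, and that the integer byte order has been determined from
the file header.

```python
import struct


def parse_long_string_value_labels(body, endian="<"):
    """Parse the bytes that follow the record header.

    Returns a list of (var_name, var_width, labels) tuples, where labels is
    a list of (value, label) pairs.  All strings are left as raw bytes; the
    caller decodes them with the file's character encoding.
    """
    def read_int32(pos):
        # Assumption: "endian" is "<" or ">" as detected from layout_code.
        return struct.unpack_from(endian + "i", body, pos)[0], pos + 4

    def read_bytes(pos, n):
        return body[pos:pos + n], pos + n

    result = []
    pos = 0
    while pos < len(body):
        name_len, pos = read_int32(pos)
        name, pos = read_bytes(pos, name_len)      # not padded, no NUL
        width, pos = read_int32(pos)
        n_labels, pos = read_int32(pos)
        labels = []
        for _ in range(n_labels):
            value_len, pos = read_int32(pos)
            value, pos = read_bytes(pos, value_len)
            label_len, pos = read_int32(pos)
            label, pos = read_bytes(pos, label_len)
            labels.append((value, label))
        result.append((name, width, labels))
    return result
```

A production reader would also want to reject negative or oversized
lengths and check that @code{value_len} equals @code{var_width}.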

@example
/* @r{Header.} */
int32 rec_type;
int32 subtype;
int32 size;
int32 count;

/* @r{Repeated up to exactly @code{count} bytes.} */
int32 var_name_len;
char var_name[];
int32 var_width;
int32 n_labels;
long_string_label labels[];
@end example

@table @code
@item int32 rec_type;
Record type. Always set to 7.

@item int32 subtype;
Record subtype. Always set to 21.

@item int32 size;
Always set to 1.

@item int32 count;
The number of bytes following the header until the next header.

@item int32 var_name_len;
@itemx char var_name[];
The number of bytes in the name of the variable that has long string
value labels, plus the variable name itself, which consists of exactly
@code{var_name_len} bytes. The variable name is not padded to any
particular boundary, nor is it null-terminated.

@item int32 var_width;
The width of the variable, in bytes, which will be between 9 and
32767.

@item int32 n_labels;
@itemx long_string_label labels[];
The long string labels themselves. The @code{labels} array contains
exactly @code{n_labels} elements, each of which has the following
substructure:

@example
int32 value_len;
char value[];
int32 label_len;
char label[];
@end example

@table @code
@item int32 value_len;
@itemx char value[];
The string value being labeled. @code{value_len} is the number of
bytes in @code{value}; it is equal to @code{var_width}. The
@code{value} array is not padded or null-terminated.

@item int32 label_len;
@itemx char label[];
The label for the string value. @code{label_len}, which must be
between 0 and 120, is the number of bytes in @code{label}. The
@code{label} array is not padded or null-terminated.
@end table
@end table

@node Long String Missing Values Record
@section Long String Missing Values Record

This record, if present, specifies missing values for long string
variables.

@example
/* @r{Header.} */
int32 rec_type;
int32 subtype;
int32 size;
int32 count;

/* @r{Repeated up to exactly @code{count} bytes.} */
int32 var_name_len;
char var_name[];
char n_missing_values;
long_string_missing_value values[];
@end example

@table @code
@item int32 rec_type;
Record type. Always set to 7.

@item int32 subtype;
Record subtype. Always set to 22.

@item int32 size;
Always set to 1.

@item int32 count;
The number of bytes following the header until the next header.

@item int32 var_name_len;
@itemx char var_name[];
The number of bytes in the name of the long string variable that has
missing values, plus the variable name itself, which consists of
exactly @code{var_name_len} bytes. The variable name is not padded to
any particular boundary, nor is it null-terminated.

@item char n_missing_values;
The number of missing values, either 1, 2, or 3. (This is, unusually,
a single byte instead of a 32-bit number.)

@item long_string_missing_value values[];
The missing values themselves. This array contains exactly
@code{n_missing_values} elements, each of which has the following
substructure:

@example
int32 value_len;
char value[];
@end example

@table @code
@item int32 value_len;
The length of the missing value string, in bytes. This value should
be 8, because long string variables are at least 8 bytes wide (by
definition), only the first 8 bytes of a long string variable's
missing values are allowed to be non-spaces, and any spaces within the
first 8 bytes are included in the missing value here.

@item char value[];
The missing value string, exactly @code{value_len} bytes, without
any padding or null terminator.
@end table
@end table

@node Data File and Variable Attributes Records
@section Data File and Variable Attributes Records

The data file and variable attributes records represent custom
attributes for the system file or for individual variables in the
system file, as defined on the DATAFILE ATTRIBUTE (@pxref{DATAFILE
ATTRIBUTE,,,pspp, PSPP Users Guide}) and VARIABLE ATTRIBUTE commands
(@pxref{VARIABLE ATTRIBUTE,,,pspp, PSPP Users Guide}), respectively.

@example
/* @r{Header.} */
int32 rec_type;
int32 subtype;
int32 size;
int32 count;

/* @r{Exactly @code{count} bytes of data.} */
char attributes[];
@end example

@table @code
@item int32 rec_type;
Record type. Always set to 7.

@item int32 subtype;
Record subtype. Always set to 17 for a data file attribute record or
to 18 for a variable attributes record.

@item int32 size;
The size of each element in the @code{attributes} member. Always set to 1.

@item int32 count;
The total number of bytes in @code{attributes}.

@item char attributes[];
The attributes, in a text-based format.

In record subtype 17, this field contains a single attribute set. An
attribute set is a sequence of one or more attributes concatenated
together. Each attribute consists of a name, which has the same
syntax as a variable name, followed by, inside parentheses, a sequence
of one or more values. Each value consists of a string enclosed in
single quotes (@code{'}) followed by a line feed (byte 0x0a). A value
may contain single quote characters, which are not themselves escaped
or quoted or required to be present in pairs. There is no apparent
way to embed a line feed in a value.
There is no distinction between
an attribute with a single value and an attribute array with one
element.

In record subtype 18, this field contains a sequence of one or more
variable attribute sets. If more than one variable attribute set is
present, each one after the first is delimited from the previous by
@code{/}. Each variable attribute set consists of a long
variable name,
followed by @code{:}, followed by an attribute set with the same
syntax as on record subtype 17.

System files written by @code{Stata 14.1/-savespss- 1.77 by
S.Radyakin} may include multiple records with subtype 18, one per
variable that has variable attributes.

The total length is @code{count} bytes.
@end table

@subheading Example

A system file produced with the following VARIABLE ATTRIBUTE commands
in effect:

@example
VARIABLE ATTRIBUTE VARIABLES=dummy ATTRIBUTE=fred[1]('23') fred[2]('34').
VARIABLE ATTRIBUTE VARIABLES=dummy ATTRIBUTE=bert('123').
@end example

@noindent
will contain a variable attribute record with the following contents:

@example
0000 07 00 00 00 12 00 00 00 01 00 00 00 22 00 00 00 |............"...|
0010 64 75 6d 6d 79 3a 66 72 65 64 28 27 32 33 27 0a |dummy:fred('23'.|
0020 27 33 34 27 0a 29 62 65 72 74 28 27 31 32 33 27 |'34'.)bert('123'|
0030 0a 29                                           |.)              |
@end example

@menu
* Variable Roles::
@end menu

@node Variable Roles
@subsection Variable Roles

A variable's role is represented as an attribute named @code{$@@Role}.
This attribute has a single element whose values and their meanings
are:

@table @code
@item 0
Input. This, the default, is the most common role.
@item 1
Output.
@item 2
Both.
@item 3
None.
@item 4
Partition.
@item 5
Split.
@end table

@node Extended Number of Cases Record
@section Extended Number of Cases Record

The file header record expresses the number of cases in the system
file as an int32 (@pxref{File Header Record}). This record allows the
number of cases in the system file to be expressed as a 64-bit number.

@example
int32 rec_type;
int32 subtype;
int32 size;
int32 count;
int64 unknown;
int64 ncases64;
@end example

@table @code
@item int32 rec_type;
Record type. Always set to 7.

@item int32 subtype;
Record subtype. Always set to 16.

@item int32 size;
Size of each element. Always set to 8.

@item int32 count;
Number of pieces of data in the data part. Always set to 2.

@item int64 unknown;
Meaning unknown. Always set to 1.

@item int64 ncases64;
Number of cases in the file as a 64-bit integer. Presumably this
could be -1 to indicate that the number of cases is unknown, for the
same reason as @code{ncases} in the file header record, but this has
not been observed in the wild.
@end table

@node Other Informational Records
@section Other Informational Records

This chapter documents many specific types of extension records, but
others are known to exist. PSPP ignores unknown
extension records when reading system files.

The following extension record subtypes have also been observed, with
the following believed meanings:

@table @asis
@item 5
A set of grouped variables (according to Aapi H@"am@"al@"ainen).

@item 6
Date info, probably related to USE (according to Aapi H@"am@"al@"ainen).

@item 12
A UUID in the format described in RFC 4122. Only two examples
observed, both written by SPSS 13, and in each case the UUID contained
both upper and lower case.

@item 24
XML that describes how data in the file should be displayed on-screen.
@end table

@node Dictionary Termination Record
@section Dictionary Termination Record

The dictionary termination record separates all other records from the
data records.

@example
int32 rec_type;
int32 filler;
@end example

@table @code
@item int32 rec_type;
Record type. Always set to 999.

@item int32 filler;
Ignored padding. Should be set to 0.
@end table

@node Data Record
@section Data Record

The data record must follow all other records in the system file.
Every system file must have a data record that specifies data for at
least one case. The format of the data record varies depending on the
value of @code{compression} in the file header record:

@table @asis
@item 0: no compression
Data is arranged as a series of 8-byte elements.
Each element corresponds to
the variable declared in the respective variable record (@pxref{Variable
Record}). Numeric values are given in @code{flt64} format; string
values are literal character strings, padded on the right when
necessary to fill out 8-byte units.

@item 1: bytecode compression
The first 8 bytes
of the data record are divided into a series of 1-byte command
codes. These codes have meanings as described below:

@table @asis
@item 0
Ignored. If the program writing the system file accumulates compressed
data in blocks of fixed length, 0 bytes can be used to pad out extra
bytes remaining at the end of a fixed-size block.

@item 1 through 251
A number with
value @var{code} - @var{bias}, where
@var{code} is the value of the compression code and @var{bias} is the
variable @code{bias} from the file header. For example,
code 105 with bias 100.0 (the normal value) indicates a numeric variable
of value 5.

A code of 0 (after subtracting the bias) in a string field encodes
null bytes. This is unusual, since a string field normally encodes
text data, but it exists in real system files.

@item 252
End of file. This code may or may not appear at the end of the data
stream. PSPP always outputs this code but its use is not required.

@item 253
A numeric or string value that is not
compressible. The value is stored in the 8 bytes following the
current block of command bytes. If this value appears twice in a block
of command bytes, then it indicates the second group of 8 bytes following the
command bytes, and so on.

@item 254
An 8-byte string value that is all spaces.

@item 255
The system-missing value.
@end table

The end of the 8-byte group of bytecodes is followed by any 8-byte
blocks of non-compressible values indicated by code 253. After that
follows another 8-byte group of bytecodes, then those bytecodes'
non-compressible values. The pattern repeats to the end of the file
or a code with value 252.

@item 2: ZLIB compression
The data record consists of the following, in order:

@itemize @bullet
@item
ZLIB data header, 24 bytes long.

@item
One or more variable-length blocks of ZLIB compressed data.

@item
ZLIB data trailer, with a 24-byte fixed header plus an additional 24
bytes for each preceding ZLIB compressed data block.
@end itemize

The ZLIB data header has the following format:

@example
int64 zheader_ofs;
int64 ztrailer_ofs;
int64 ztrailer_len;
@end example

@table @code
@item int64 zheader_ofs;
The offset, in bytes, of the beginning of this structure within the
system file.

@item int64 ztrailer_ofs;
The offset, in bytes, of the first byte of the ZLIB data trailer.

@item int64 ztrailer_len;
The number of bytes in the ZLIB data trailer. This and the previous
field sum to the size of the system file in bytes.
@end table

The data header is followed by @code{(ztrailer_len - 24) / 24} ZLIB
compressed data blocks. Each ZLIB compressed data block begins with a
ZLIB header as specified in RFC@tie{}1950, e.g.@: hex bytes @code{78
01} (the only header yet observed in practice). Each block
decompresses to a fixed number of bytes (in practice only
@code{0x3ff000}-byte blocks have been observed), except that the last
block of data may be shorter. The last ZLIB compressed data block
ends just before offset @code{ztrailer_ofs}.

The result of ZLIB decompression is bytecode compressed data as
described above for compression format 1.

The ZLIB data trailer begins with the following 24-byte fixed header:

@example
int64 int_bias;
int64 zero;
int32 block_size;
int32 n_blocks;
@end example

@table @code
@item int64 int_bias;
The compression bias as a negative integer, e.g.@: if @code{bias} in
the file header record is 100.0, then @code{int_bias} is @minus{}100
(this is the only value yet observed in practice).

@item int64 zero;
Always observed to be zero.

@item int32 block_size;
The number of bytes in each ZLIB compressed data block, except
possibly the last, following decompression. Only @code{0x3ff000} has
been observed so far.

@item int32 n_blocks;
The number of ZLIB compressed data blocks, always exactly
@code{(ztrailer_len - 24) / 24}.
@end table

The fixed header is followed by @code{n_blocks} 24-byte ZLIB data
block descriptors, each of which describes the compressed data block
corresponding to its offset.
Each block descriptor has the following
format:

@example
int64 uncompressed_ofs;
int64 compressed_ofs;
int32 uncompressed_size;
int32 compressed_size;
@end example

@table @code
@item int64 uncompressed_ofs;
The offset, in bytes, that this block of data would have in a similar
system file that uses compression format 1. This is
@code{zheader_ofs} in the first block descriptor, and in each
succeeding block descriptor it is the sum of the previous descriptor's
@code{uncompressed_ofs} and @code{uncompressed_size}.

@item int64 compressed_ofs;
The offset, in bytes, of the actual beginning of this compressed data
block. This is @code{zheader_ofs + 24} in the first block descriptor,
and in each succeeding block descriptor it is the sum of the previous
descriptor's @code{compressed_ofs} and @code{compressed_size}. The
final block descriptor's @code{compressed_ofs} and
@code{compressed_size} sum to @code{ztrailer_ofs}.

@item int32 uncompressed_size;
The number of bytes in this data block, after decompression. This is
@code{block_size} in every data block except the last, which may be
smaller.

@item int32 compressed_size;
The number of bytes in this data block, as stored compressed in this
system file.
@end table
@end table

@setfilename ignored
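
As an illustration of the bytecode scheme described above for
compression format 1 (which is also what ZLIB compressed blocks
decompress to), the following Python sketch expands a buffer of command
bytes and literal blocks into a list of 8-byte elements. The function
name @code{decode_bytecode} and its parameters are hypothetical, not
part of PSPP. The sketch assumes a complete, well-formed buffer and
leaves it to the caller to interpret each literal block as a
@code{flt64} or as string data according to the variable's type.

```python
import sys

# SYSMIS is the most negative finite double, i.e. -DBL_MAX.
SYSMIS = -sys.float_info.max


def decode_bytecode(data, bias=100.0):
    """Expand bytecode-compressed data into a list of elements.

    Compressed numbers and SYSMIS come back as floats; literal blocks
    (code 253) and all-space strings (code 254) come back as 8 raw bytes.
    Assumption: "data" starts at a group of 8 command bytes.
    """
    out = []
    pos = 0
    while pos + 8 <= len(data):
        codes = data[pos:pos + 8]      # one group of 8 command bytes
        pos += 8
        for code in codes:
            if code == 0:
                continue               # padding, ignored
            if code == 252:
                return out             # explicit end of file
            if code == 253:
                # Literal value: the next unread 8-byte block after the
                # current group of command bytes.
                out.append(data[pos:pos + 8])
                pos += 8
            elif code == 254:
                out.append(b" " * 8)   # string of 8 spaces
            elif code == 255:
                out.append(SYSMIS)     # system-missing value
            else:
                out.append(code - bias)  # codes 1 through 251
    return out
```

With the default bias of 100.0, a group whose command bytes begin 105,
253, 254, 255 and which is followed by one literal block decodes to
5.0, the literal's 8 raw bytes, 8 spaces, and the system-missing value,
in that order. Assigning decoded elements to variables and cases is a
separate step that depends on the dictionary.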