1@node Objects In Memory 2@comment node-name, next, previous, up 3@chapter Objects In Memory 4 5@menu 6* Type tags:: 7* Heap Object Layout:: 8@end menu 9 10@node Type tags 11@section Type tags 12 13The in-memory representation of Lisp data includes type information 14about each object. This type information takes the form of a lowtag 15in the low bits of each pointer to heap space, a widetag for each 16boxed immediate value and a header (also with a widetag) at the start 17of the allocated space for each object. These tags are used to inform 18both the GC and Lisp code about the type and allocated size of Lisp 19objects. 20 21@c FIXME: Add diagrams showing tag allocation to next two sections. 22 23@subsection Lowtags 24 25Objects allocated on the Lisp heap are aligned to a double-word 26boundary, leaving the low-order bits (which would normally identify a 27particular octet within the first two words) available for use to hold 28type information. This turns out to be three bits on 32-bit systems 29and four bits on 64-bit systems. 30 31Of these 8 or 16 tags, we have some constraints for allocation: 32 33@itemize 34@item 35We need 6 of the low 8 bits of the word for widetags, meaning that one 36out of every four lowtags must be an @code{other-immediate} lowtag. 37@item 38We have four pointer types. Instance (struct and CLOS) pointers, 39function pointers, list pointers, and other pointers. 40@item 41@code{fixnum}s are required to have their lowtags be comprised 42entirely of zeros. 43@item 44There are additional constraints around the ordering of the pointer 45types, particularly with respect to list pointers (the NIL-cons hack). 46@end itemize 47 48Complicating this issue is that while the lowtag @emph{space} is three 49or four bits wide, some of the lowtags are effectively narrower. The 50@code{other-immediate} tags effectively have a two-bit lowtag, and 51@code{fixnum}s have historically been one bit narrower than the other 52lowtags (thus @code{even-fixnum-lowtag} and @code{odd-fixnum-lowtag}) 53though with the recent work on wider fixnums on 64-bit systems this is 54no longer necessarily so. 55 56The lowtags are specified in 57@file{src/compiler/generic/early-objdef.lisp}. 58 59@c 32-bit lowtag assignment 60@c x00 -- Fixnum 61@c x10 -- Other-immediate 62@c xx1 -- Pointer 63@c 001 -- Instance-pointer 64@c 011 -- List-pointer 65@c 101 -- Function-pointer 66@c 111 -- Other-pointer 67 68@c 32-bit lowtag assignment (proposed wider-fixnum branch) 69@c xxx0 -- Fixnum 70@c xx01 -- Other-immediate 71@c xx11 -- Pointer 72@c x011 -- List-pointer 73@c 0111 -- (Either instance or function pointer) 74@c 1111 -- Other-pointer 75 76@c 64-bit lowtag assignment (pre-wider-fixnums) 77@c x000 -- Fixnum 78@c xx10 -- Other-immediate 79@c xyyy -- Unused (where 011 <= yyy <= 110) 80@c xyy1 -- Pointer (where yy is either 00 or 11) 81@c 0001 -- Instance-pointer 82@c 0111 -- List-pointer 83@c 1001 -- Function-pointer 84@c 1111 -- Other-pointer 85 86@c 64-bit lowtag assignment (wider-fixnums) 87@c xyz0 -- Fixnum (where z or yz may also be 0 depending on n-fixnum-tag-bits) 88@c xx01 -- Other-immediate 89@c xx11 -- Pointer 90@c 0011 -- Instance-pointer 91@c 0111 -- List-pointer 92@c 1011 -- Function-pointer 93@c 1111 -- Other-pointer 94 95@subsubsection Fixnums 96 97@code{Fixnum}s are signed integers represented as immediate values. 98In SBCL, these integers are @code{(- n-word-bits n-fixnum-tag-bits)} 99bits wide, stored in the most-significant section of a machine word. 100 101The reason that @code{fixnum} tags are required to have the low 102@code{n-fixnum-tag-bits} as zeros is that it allows for addition and 103subtraction to be performed using native machine instructions 104directly, and multiplication and division can be performed likewise 105using a simple shift instruction to compensate for the effect of the 106tag. 107 108@subsubsection Other-immediates 109 110@code{Other-immediate}s are the lowtag part of widetag values. Due to 111the constraints of widetag allocation, one out of every four lowtags 112must be a widetag (alternately, the width of the 113@code{other-immediate} lowtag is two bits). 114 115@subsubsection Pointers 116 117There are four different pointer lowtags, largely for optimization 118purposes. 119 120@itemize 121@item 122We have a distinct list pointer tag so that we can do a listp test by 123simply checking the pointer tag instead of needing to retrieve a 124header word for each @code{cons} cell. This effectively halves the 125memory cost of @code{cons} cells. 126@item 127We have a distinct instance pointer tag so that we do not need to 128check a header word for each instance when doing a type check. This 129saves a memory access for retrieving the class of an instance. 130@item 131We have a distinct function pointer tag so that we do not need to 132check a header word to determine if a given pointer is directly 133funcallable (that is, if the pointer is to a closure, a simple-fun, or 134a funcallable-instance). This saves a memory access in the type test 135prior to @code{funcall} or @code{apply} of a function object. 136@item 137We have one last pointer tag for everything else. Obtaining further 138type information from these pointers requires fetching the header word 139and dispatching on the widetag. 140@end itemize 141 142@subsection Widetags 143 144Widetags are used for three purposes. First, to provide type 145information for immediate (non-pointer) data such as characters. 146Second, to provide ``marker'' values for things such as unbound slots. 147Third, to provide type information for objects stored on the heap. 148 149Because widetags are used for immediate data they must have a lowtag 150component. This ends up being the @code{other-immediate} lowtags. 151For various reasons it was deemed convenient for widetags to be no 152more than eight bits wide, and with 27 or more distinct array types 153(depending on build-time configuration), seven numeric types, markers, 154and non-numeric heap object headers there ends up being more than 32 155widetags required (though less than 64). This combination of factors 156leads to the requirement that one out of every four lowtags be an 157@code{other-immediate} lowtag. 158 159As widetags are involved in type tests for non-CLOS objects, their 160allocation is carefully arranged to allow for certain type tests to be 161cheaper than they might otherwise be. 162 163@itemize 164@item 165The numeric types are arranged to make @code{rational}, @code{float}, 166@code{real}, @code{complex} and @code{number} type tests become range 167tests on the widetag. 168@item 169The array types are arranged to make various type tests become range 170tests on the widetag. 171@item 172The string types have disjoint ranges, but have been arranged so that 173their ranges differ only by one bit, allowing the @code{stringp} type 174test to become a masking operation followed by a range test or a 175masking operation followed by a simple comparison. 176@item 177There may be other clevernesses, these are just what can be found 178through reading the comments above the widetag definition. 179@end itemize 180 181The widetags are specified in 182@file{src/compiler/generic/early-objdef.lisp}. 183 184@node Heap Object Layout 185@section Heap Object Layout 186 187Objects stored in the heap are of two kinds: those with headers, and 188cons cells. If the first word of an object has a header widetag then 189the object has the type and layout associated with that widetag. 190Otherwise, the object is assumed to be a @code{cons} cell. 191 192Some objects have ``unboxed'' words without any associated type 193information as well as the more usual ``boxed'' words with lowtags. 194Obvious cases include the specialized array types, some of the numeric 195types, @code{system-area-pointer}s, and so on. 196 197The primitive object layouts are specified in 198@file{src/compiler/generic/objdef.lisp}. 199 200@subsection Header Values 201 202As a widetag is only eight bits wide but a heap object header takes a 203full machine word, there is an extra 24 or 56 bits of space available 204for unboxed data storage in each heap object. This space is called 205the ``header value'', and is used for various purposes depending on 206the type of heap object in question. 207 208@subsection Symbols 209 210In contrast to the simple model of symbols provided in the Common Lisp 211standard, symbol objects in SBCL do not have a function cell. 212Instead, the mapping from symbols to functions is done via the 213compiler globaldb. 214 215There are two additional slots associated with symbols. One is a hash 216value for the symbol (based on the symbol name), which avoids having 217to recompute the hash from the name every time it is required. 218 219The other additional slot, on threaded systems only, is the TLS index, 220which is either @code{no-tls-value-marker-widetag} or an unboxed byte 221offset within the TLS area to the TLS slot associated with the symbol. 222Because the unboxed offset is aligned to a word boundary it appears as 223a @code{fixnum} when viewed as boxed data. It is not, in general, 224safe to increment this value as a @code{fixnum}, however, in case 225@code{n-fixnum-tag-bits} changes@footnote{This is not as unlikely as 226it might seem at first; while historically @code{n-fixnum-tag-bits} 227has always been the same as @code{word-shift} there is a branch where 228it is permitted to vary at build time from @code{word-shift} to as low 229as 1 on 64-bit ports, and a proposed scheme to allow the same on 23032-bit ports}. 231 232@subsection The NIL-cons Hack 233 234As an ``optimization'', the symbol @code{nil} has 235@code{list-pointer-lowtag} rather than @code{other-pointer-lowtag}, 236and is aligned in memory so that the value and hash slots are the 237@code{car} and @code{cdr} of the @code{cons}, with both slots 238containing @code{nil}. This allows for @code{car} and @code{cdr} to 239simply do a lowtag test and slot access instead of having to 240explicitly test for @code{nil}, at the cost of requiring all symbol 241type tests and slot accesses to test for @code{nil}. 242 243@subsection Functions and Code Components 244 245@c This could do with a diagram showing a code-component with a couple 246@c of simple-fun entry points. 247 248All compiled code resides in @code{code-component} objects. These 249objects consist of a header, some number of boxed literal values, a 250``data block'' containing machine code and @code{simple-fun} headers, 251and a ``trace table'' which is currently unused@footnote{Trace tables 252were originally used to support garbage collection using gengc in 253CMUCL. As there is still vestigial support for carrying them around 254at the end of @code{code-component}s, they may end up being used for 255something else in the future.}. 256 257The @code{simple-fun} headers represent simple function objects (not 258@code{funcallable-instance}s or closures), and each 259@code{code-component} will typically have one for the main entry point 260and one per closure entry point (as the function underlying the 261closure, not the closure object proper). In a compiler trace-file, 262the @code{simple-fun} headers are all listed as entries in the IR2 263component. 264 265The @code{simple-fun} headers are held in a linked list per 266@code{code-component} in order to allow the garbage collector to find 267them during relocation. In order to be able to find the start of a 268@code{code-component} from a @code{simple-fun}, the header value is 269the offset in words from the start of the @code{code-component} to the 270start of the @code{simple-fun}. 271