1@node Objects In Memory
2@comment  node-name,  next,  previous,  up
3@chapter Objects In Memory
4
5@menu
6* Type tags::
7* Heap Object Layout::
8@end menu
9
10@node Type tags
11@section Type tags
12
13The in-memory representation of Lisp data includes type information
14about each object.  This type information takes the form of a lowtag
15in the low bits of each pointer to heap space, a widetag for each
16boxed immediate value and a header (also with a widetag) at the start
17of the allocated space for each object.  These tags are used to inform
18both the GC and Lisp code about the type and allocated size of Lisp
19objects.
20
21@c FIXME: Add diagrams showing tag allocation to next two sections.
22
23@subsection Lowtags
24
25Objects allocated on the Lisp heap are aligned to a double-word
26boundary, leaving the low-order bits (which would normally identify a
27particular octet within the first two words) available for use to hold
28type information.  This turns out to be three bits on 32-bit systems
29and four bits on 64-bit systems.
30
31Of these 8 or 16 tags, we have some constraints for allocation:
32
33@itemize
34@item
35We need 6 of the low 8 bits of the word for widetags, meaning that one
36out of every four lowtags must be an @code{other-immediate} lowtag.
37@item
38We have four pointer types.  Instance (struct and CLOS) pointers,
39function pointers, list pointers, and other pointers.
40@item
41@code{fixnum}s are required to have their lowtags be comprised
42entirely of zeros.
43@item
44There are additional constraints around the ordering of the pointer
45types, particularly with respect to list pointers (the NIL-cons hack).
46@end itemize
47
48Complicating this issue is that while the lowtag @emph{space} is three
49or four bits wide, some of the lowtags are effectively narrower.  The
50@code{other-immediate} tags effectively have a two-bit lowtag, and
51@code{fixnum}s have historically been one bit narrower than the other
52lowtags (thus @code{even-fixnum-lowtag} and @code{odd-fixnum-lowtag})
53though with the recent work on wider fixnums on 64-bit systems this is
54no longer necessarily so.
55
56The lowtags are specified in
57@file{src/compiler/generic/early-objdef.lisp}.
58
59@c 32-bit lowtag assignment
60@c x00 -- Fixnum
61@c x10 -- Other-immediate
62@c xx1 -- Pointer
63@c 001 --   Instance-pointer
64@c 011 --   List-pointer
65@c 101 --   Function-pointer
66@c 111 --   Other-pointer
67
68@c 32-bit lowtag assignment (proposed wider-fixnum branch)
69@c xxx0 -- Fixnum
70@c xx01 -- Other-immediate
71@c xx11 -- Pointer
72@c x011 --   List-pointer
73@c 0111 --   (Either instance or function pointer)
74@c 1111 --   Other-pointer
75
76@c 64-bit lowtag assignment (pre-wider-fixnums)
77@c x000 -- Fixnum
78@c xx10 -- Other-immediate
79@c xyyy -- Unused (where 011 <= yyy <= 110)
80@c xyy1 -- Pointer (where yy is either 00 or 11)
81@c 0001 --   Instance-pointer
82@c 0111 --   List-pointer
83@c 1001 --   Function-pointer
84@c 1111 --   Other-pointer
85
86@c 64-bit lowtag assignment (wider-fixnums)
87@c xyz0 -- Fixnum (where z or yz may also be 0 depending on n-fixnum-tag-bits)
88@c xx01 -- Other-immediate
89@c xx11 -- Pointer
90@c 0011 --   Instance-pointer
91@c 0111 --   List-pointer
92@c 1011 --   Function-pointer
93@c 1111 --   Other-pointer
94
95@subsubsection Fixnums
96
97@code{Fixnum}s are signed integers represented as immediate values.
98In SBCL, these integers are @code{(- n-word-bits n-fixnum-tag-bits)}
99bits wide, stored in the most-significant section of a machine word.
100
101The reason that @code{fixnum} tags are required to have the low
102@code{n-fixnum-tag-bits} as zeros is that it allows for addition and
103subtraction to be performed using native machine instructions
104directly, and multiplication and division can be performed likewise
105using a simple shift instruction to compensate for the effect of the
106tag.
107
108@subsubsection Other-immediates
109
110@code{Other-immediate}s are the lowtag part of widetag values.  Due to
111the constraints of widetag allocation, one out of every four lowtags
112must be a widetag (alternately, the width of the
113@code{other-immediate} lowtag is two bits).
114
115@subsubsection Pointers
116
117There are four different pointer lowtags, largely for optimization
118purposes.
119
120@itemize
121@item
122We have a distinct list pointer tag so that we can do a listp test by
123simply checking the pointer tag instead of needing to retrieve a
124header word for each @code{cons} cell.  This effectively halves the
125memory cost of @code{cons} cells.
126@item
127We have a distinct instance pointer tag so that we do not need to
128check a header word for each instance when doing a type check.  This
129saves a memory access for retrieving the class of an instance.
130@item
131We have a distinct function pointer tag so that we do not need to
132check a header word to determine if a given pointer is directly
133funcallable (that is, if the pointer is to a closure, a simple-fun, or
134a funcallable-instance).  This saves a memory access in the type test
135prior to @code{funcall} or @code{apply} of a function object.
136@item
137We have one last pointer tag for everything else.  Obtaining further
138type information from these pointers requires fetching the header word
139and dispatching on the widetag.
140@end itemize
141
142@subsection Widetags
143
144Widetags are used for three purposes.  First, to provide type
145information for immediate (non-pointer) data such as characters.
146Second, to provide ``marker'' values for things such as unbound slots.
147Third, to provide type information for objects stored on the heap.
148
149Because widetags are used for immediate data they must have a lowtag
150component.  This ends up being the @code{other-immediate} lowtags.
151For various reasons it was deemed convenient for widetags to be no
152more than eight bits wide, and with 27 or more distinct array types
153(depending on build-time configuration), seven numeric types, markers,
154and non-numeric heap object headers there ends up being more than 32
155widetags required (though less than 64).  This combination of factors
156leads to the requirement that one out of every four lowtags be an
157@code{other-immediate} lowtag.
158
159As widetags are involved in type tests for non-CLOS objects, their
160allocation is carefully arranged to allow for certain type tests to be
161cheaper than they might otherwise be.
162
163@itemize
164@item
165The numeric types are arranged to make @code{rational}, @code{float},
166@code{real}, @code{complex} and @code{number} type tests become range
167tests on the widetag.
168@item
169The array types are arranged to make various type tests become range
170tests on the widetag.
171@item
172The string types have disjoint ranges, but have been arranged so that
173their ranges differ only by one bit, allowing the @code{stringp} type
174test to become a masking operation followed by a range test or a
175masking operation followed by a simple comparison.
176@item
177There may be other clevernesses, these are just what can be found
178through reading the comments above the widetag definition.
179@end itemize
180
181The widetags are specified in
182@file{src/compiler/generic/early-objdef.lisp}.
183
184@node Heap Object Layout
185@section Heap Object Layout
186
187Objects stored in the heap are of two kinds: those with headers, and
188cons cells.  If the first word of an object has a header widetag then
189the object has the type and layout associated with that widetag.
190Otherwise, the object is assumed to be a @code{cons} cell.
191
192Some objects have ``unboxed'' words without any associated type
193information as well as the more usual ``boxed'' words with lowtags.
194Obvious cases include the specialized array types, some of the numeric
195types, @code{system-area-pointer}s, and so on.
196
197The primitive object layouts are specified in
198@file{src/compiler/generic/objdef.lisp}.
199
200@subsection Header Values
201
202As a widetag is only eight bits wide but a heap object header takes a
203full machine word, there is an extra 24 or 56 bits of space available
204for unboxed data storage in each heap object.  This space is called
205the ``header value'', and is used for various purposes depending on
206the type of heap object in question.
207
208@subsection Symbols
209
210In contrast to the simple model of symbols provided in the Common Lisp
211standard, symbol objects in SBCL do not have a function cell.
212Instead, the mapping from symbols to functions is done via the
213compiler globaldb.
214
215There are two additional slots associated with symbols.  One is a hash
216value for the symbol (based on the symbol name), which avoids having
217to recompute the hash from the name every time it is required.
218
219The other additional slot, on threaded systems only, is the TLS index,
220which is either @code{no-tls-value-marker-widetag} or an unboxed byte
221offset within the TLS area to the TLS slot associated with the symbol.
222Because the unboxed offset is aligned to a word boundary it appears as
223a @code{fixnum} when viewed as boxed data.  It is not, in general,
224safe to increment this value as a @code{fixnum}, however, in case
225@code{n-fixnum-tag-bits} changes@footnote{This is not as unlikely as
226it might seem at first; while historically @code{n-fixnum-tag-bits}
227has always been the same as @code{word-shift} there is a branch where
228it is permitted to vary at build time from @code{word-shift} to as low
229as 1 on 64-bit ports, and a proposed scheme to allow the same on
23032-bit ports}.
231
232@subsection The NIL-cons Hack
233
234As an ``optimization'', the symbol @code{nil} has
235@code{list-pointer-lowtag} rather than @code{other-pointer-lowtag},
236and is aligned in memory so that the value and hash slots are the
237@code{car} and @code{cdr} of the @code{cons}, with both slots
238containing @code{nil}.  This allows for @code{car} and @code{cdr} to
239simply do a lowtag test and slot access instead of having to
240explicitly test for @code{nil}, at the cost of requiring all symbol
241type tests and slot accesses to test for @code{nil}.
242
243@subsection Functions and Code Components
244
245@c This could do with a diagram showing a code-component with a couple
246@c of simple-fun entry points.
247
248All compiled code resides in @code{code-component} objects.  These
249objects consist of a header, some number of boxed literal values, a
250``data block'' containing machine code and @code{simple-fun} headers,
251and a ``trace table'' which is currently unused@footnote{Trace tables
252were originally used to support garbage collection using gengc in
253CMUCL.  As there is still vestigial support for carrying them around
254at the end of @code{code-component}s, they may end up being used for
255something else in the future.}.
256
257The @code{simple-fun} headers represent simple function objects (not
258@code{funcallable-instance}s or closures), and each
259@code{code-component} will typically have one for the main entry point
260and one per closure entry point (as the function underlying the
261closure, not the closure object proper).  In a compiler trace-file,
262the @code{simple-fun} headers are all listed as entries in the IR2
263component.
264
265The @code{simple-fun} headers are held in a linked list per
266@code{code-component} in order to allow the garbage collector to find
267them during relocation.  In order to be able to find the start of a
268@code{code-component} from a @code{simple-fun}, the header value is
269the offset in words from the start of the @code{code-component} to the
270start of the @code{simple-fun}.
271