1===========================
2TableGen Language Reference
3===========================
4
5.. contents::
6   :local:
7
8.. warning::
9   This document is extremely rough. If you find something lacking, please
10   fix it, file a documentation bug, or ask about it on llvmdev.
11
12Introduction
13============
14
15This document is meant to be a normative spec about the TableGen language
16in and of itself (i.e. how to understand a given construct in terms of how
17it affects the final set of records represented by the TableGen file). If
18you are unsure if this document is really what you are looking for, please
19read the :doc:`introduction to TableGen <index>` first.
20
21Notation
22========
23
24The lexical and syntax notation used here is intended to imitate
25`Python's`_. In particular, for lexical definitions, the productions
26operate at the character level and there is no implied whitespace between
27elements. The syntax definitions operate at the token level, so there is
28implied whitespace between tokens.
29
30.. _`Python's`: http://docs.python.org/py3k/reference/introduction.html#notation
31
32Lexical Analysis
33================
34
35TableGen supports BCPL (``// ...``) and nestable C-style (``/* ... */``)
36comments.
37
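As a small illustration of the two comment styles (the record name
``Visible`` is made up for this sketch), note how the inner ``/* ... */``
does not end the outer comment::

   // A BCPL-style comment runs to the end of the line.
   /* A C-style comment may span several lines
      /* and may contain another C-style comment */
      without being closed by the inner terminator. */
   def Visible;
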
38The following is a listing of the basic punctuation tokens::
39
40   - + [ ] { } ( ) < > : ; .  = ? #
41
42Numeric literals take one of the following forms:
43
.. TableGen actually will lex some pretty strange sequences and interpret
45   them as numbers. What is shown here is an attempt to approximate what it
46   "should" accept.
47
48.. productionlist::
49   TokInteger: `DecimalInteger` | `HexInteger` | `BinInteger`
50   DecimalInteger: ["+" | "-"] ("0"..."9")+
51   HexInteger: "0x" ("0"..."9" | "a"..."f" | "A"..."F")+
52   BinInteger: "0b" ("0" | "1")+
53
54One aspect to note is that the :token:`DecimalInteger` token *includes* the
55``+`` or ``-``, as opposed to having ``+`` and ``-`` be unary operators as
56most languages do.
57
58Also note that :token:`BinInteger` creates a value of type ``bits<n>``
59(where ``n`` is the number of bits).  This will implicitly convert to
60integers when needed.
61
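The literal forms above might be used as follows. This is only a sketch;
the record and field names are invented for illustration::

   def NumericLiterals {
     int Dec = 42;           // DecimalInteger
     int Neg = -7;           // the "-" is part of the token itself
     int Hex = 0x2A;         // HexInteger
     bits<4> Bin = 0b1010;   // BinInteger, four bits wide
   }
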
62TableGen has identifier-like tokens:
63
64.. productionlist::
65   ualpha: "a"..."z" | "A"..."Z" | "_"
66   TokIdentifier: ("0"..."9")* `ualpha` (`ualpha` | "0"..."9")*
67   TokVarName: "$" `ualpha` (`ualpha` |  "0"..."9")*
68
69Note that unlike most languages, TableGen allows :token:`TokIdentifier` to
70begin with a number. In case of ambiguity, a token will be interpreted as a
71numeric literal rather than an identifier.
72
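For instance, assuming the lexer behaves as described above, the
hypothetical record name below begins with digits yet is still a
:token:`TokIdentifier`, whereas ``0x2A`` is read as a number::

   def 16BitForm;            // identifier: digits followed by letters

   def Widths {
     int Mask = 0x2A;        // numeric literal, not an identifier
   }
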
73TableGen also has two string-like literals:
74
75.. productionlist::
76   TokString: '"' <non-'"' characters and C-like escapes> '"'
77   TokCodeFragment: "[{" <shortest text not containing "}]"> "}]"
78
79:token:`TokCodeFragment` is essentially a multiline string literal
80delimited by ``[{`` and ``}]``.
81
82.. note::
83   The current implementation accepts the following C-like escapes::
84
85      \\ \' \" \t \n
86
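A brief sketch of both literal kinds, with invented names. It assumes that
escapes are processed only inside a :token:`TokString` and that the text of
a code fragment is taken as written::

   def Literals {
     string Msg = "col1\tcol2\n";   // \t and \n are escapes
     code Body = [{
       return "kept as written";
     }];
   }
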
87TableGen also has the following keywords::
88
89   bit   bits      class   code         dag
90   def   foreach   defm    field        in
91   int   let       list    multiclass   string
92
93TableGen also has "bang operators" which have a
94wide variety of meanings:
95
96.. productionlist::
97   BangOperator: one of
98               :!eq     !if      !head    !tail      !con
99               :!add    !shl     !sra     !srl       !and
100               :!cast   !empty   !subst   !foreach   !listconcat   !strconcat
101
102Syntax
103======
104
105TableGen has an ``include`` mechanism. It does not play a role in the
106syntax per se, since it is lexically replaced with the contents of the
107included file.
108
109.. productionlist::
110   IncludeDirective: "include" `TokString`
111
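For example, a file might begin by pulling in shared definitions. The file
name below is only a placeholder::

   include "Common.td"
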
112TableGen's top-level production consists of "objects".
113
114.. productionlist::
115   TableGenFile: `Object`*
116   Object: `Class` | `Def` | `Defm` | `Let` | `MultiClass` | `Foreach`
117
118``class``\es
119------------
120
121.. productionlist::
122   Class: "class" `TokIdentifier` [`TemplateArgList`] `ObjectBody`
123
124A ``class`` declaration creates a record which other records can inherit
125from. A class can be parametrized by a list of "template arguments", whose
126values can be used in the class body.
127
128A given class can only be defined once. A ``class`` declaration is
129considered to define the class if any of the following is true:
130
.. break ObjectBody into its constituents so that they are present here?
132
133#. The :token:`TemplateArgList` is present.
134#. The :token:`Body` in the :token:`ObjectBody` is present and is not empty.
135#. The :token:`BaseClassList` in the :token:`ObjectBody` is present.
136
You can declare an empty class by giving an empty :token:`TemplateArgList`
138and an empty :token:`ObjectBody`. This can serve as a restricted form of
139forward declaration: note that records deriving from the forward-declared
140class will inherit no fields from it since the record expansion is done
141when the record is parsed.
142
143.. productionlist::
144   TemplateArgList: "<" `Declaration` ("," `Declaration`)* ">"
145
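To make the distinction concrete, here is a minimal sketch with invented
names. ``Operand`` is only declared, while ``Register`` is defined because
it has a template argument list and a non-empty body::

   class Operand;                       // declaration only

   class Register<string n, int enc = 0> {
     string Name = n;                   // template arguments are usable here
     int Encoding = enc;
   }

   def R1 : Register<"r1", 1>;
   def R2 : Register<"r2">;             // enc takes its default value, 0
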
146Declarations
147------------
148
149.. Omitting mention of arcane "field" prefix to discourage its use.
150
151The declaration syntax is pretty much what you would expect as a C++
152programmer.
153
154.. productionlist::
155   Declaration: `Type` `TokIdentifier` ["=" `Value`]
156
It declares the identifier with the given type and, if a value is present,
assigns that value to the identifier.
158
159Types
160-----
161
162.. productionlist::
163   Type: "string" | "code" | "bit" | "int" | "dag"
164       :| "bits" "<" `TokInteger` ">"
165       :| "list" "<" `Type` ">"
166       :| `ClassID`
167   ClassID: `TokIdentifier`
168
169Both ``string`` and ``code`` correspond to the string type; the difference
170is purely to indicate programmer intention.
171
172The :token:`ClassID` must identify a class that has been previously
173declared or defined.
174
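The following sketch declares a field for most of the types above (``dag``
is left out). All names are made up; note that ``Reg`` appears before it is
used as a type::

   class Reg {
     int Num = 0;
   }
   def R0 : Reg;

   def TypeExamples {
     bit IsBranch = 0;
     int Size = 4;
     string AsmName = "add";
     code Predicate = [{ return true; }];
     bits<8> Opcode = 0x2A;
     list<int> Sizes = [1, 2, 4];
     Reg Base = R0;                     // a field of class type
   }
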
175Values
176------
177
178.. productionlist::
179   Value: `SimpleValue` `ValueSuffix`*
180   ValueSuffix: "{" `RangeList` "}"
181              :| "[" `RangeList` "]"
182              :| "." `TokIdentifier`
183   RangeList: `RangePiece` ("," `RangePiece`)*
184   RangePiece: `TokInteger`
185             :| `TokInteger` "-" `TokInteger`
186             :| `TokInteger` `TokInteger`
187
188The peculiar last form of :token:`RangePiece` is due to the fact that the
189"``-``" is included in the :token:`TokInteger`, hence ``1-5`` gets lexed as
190two consecutive :token:`TokInteger`'s, with values ``1`` and ``-5``,
191instead of "1", "-", and "5".
The :token:`RangeList` can be thought of as specifying a "list slice" in
some contexts.
194
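As a sketch of how the suffixes read in practice (names invented), the
braces select bits, the square brackets select list elements, and the dot
selects a field of another record::

   def Src {
     int X = 5;
   }

   def RangeExamples {
     bits<8> Byte = 0x2A;
     bits<4> High = Byte{7-4};          // bits 7 down to 4
     bits<2> Ends = Byte{7, 0};         // bit 7 followed by bit 0
     list<int> All = [10, 20, 30, 40];
     list<int> Mid = All[1-2];          // elements 1 and 2
     int Copy = Src.X;                  // field access
   }
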
195
196:token:`SimpleValue` has a number of forms:
197
198
199.. productionlist::
200   SimpleValue: `TokIdentifier`
201
202The value will be the variable referenced by the identifier. It can be one
203of:
204
205.. The code for this is exceptionally abstruse. These examples are a
206   best-effort attempt.
207
208* name of a ``def``, such as the use of ``Bar`` in::
209
210     def Bar : SomeClass {
211       int X = 5;
212     }
213
214     def Foo {
215       SomeClass Baz = Bar;
216     }
217
218* value local to a ``def``, such as the use of ``Bar`` in::
219
220     def Foo {
221       int Bar = 5;
222       int Baz = Bar;
223     }
224
225* a template arg of a ``class``, such as the use of ``Bar`` in::
226
227     class Foo<int Bar> {
228       int Baz = Bar;
229     }
230
231* value local to a ``multiclass``, such as the use of ``Bar`` in::
232
233     multiclass Foo {
234       int Bar = 5;
235       int Baz = Bar;
236     }
237
238* a template arg to a ``multiclass``, such as the use of ``Bar`` in::
239
240     multiclass Foo<int Bar> {
241       int Baz = Bar;
242     }
243
244.. productionlist::
245   SimpleValue: `TokInteger`
246
247This represents the numeric value of the integer.
248
249.. productionlist::
250   SimpleValue: `TokString`+
251
Multiple adjacent string literals are concatenated, as in C/C++; the value
is the resulting string.
254
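For example, in this sketch (names invented) the two literals form a
single string value::

   def Banner {
     string Text = "first part, " "second part";
   }
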
255.. productionlist::
256   SimpleValue: `TokCodeFragment`
257
258The value is the string value of the code fragment.
259
260.. productionlist::
261   SimpleValue: "?"
262
263``?`` represents an "unset" initializer.
264
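A common use, sketched here with invented names, is to leave a field unset
in a class and fill it in later in each record::

   class Inst {
     string AsmString = ?;        // no value yet
   }

   def ADD : Inst {
     let AsmString = "add";       // supplied by the record
   }
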
265.. productionlist::
266   SimpleValue: "{" `ValueList` "}"
267   ValueList: [`ValueListNE`]
268   ValueListNE: `Value` ("," `Value`)*
269
270This represents a sequence of bits, as would be used to initialize a
271``bits<n>`` field (where ``n`` is the number of bits).
272
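For example (names are illustrative), each element supplies one bit. The
first element is assumed here to be the most significant bit::

   def BitsExample {
     bits<4> Nibble = { 1, 0, 1, 0 };
   }
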
273.. productionlist::
274   SimpleValue: `ClassID` "<" `ValueListNE` ">"
275
276This generates a new anonymous record definition (as would be created by an
277unnamed ``def`` inheriting from the given class with the given template
278arguments) and the value is the value of that record definition.
279
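A minimal sketch, using an invented ``Pair`` class, of anonymous records
used directly as field values and as list elements::

   class Pair<int a, int b> {
     int First = a;
     int Second = b;
   }

   def Holder {
     Pair Single = Pair<1, 2>;
     list<Pair> Many = [Pair<3, 4>, Pair<5, 6>];
   }
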
280.. productionlist::
281   SimpleValue: "[" `ValueList` "]" ["<" `Type` ">"]
282
283A list initializer. The optional :token:`Type` can be used to indicate a
284specific element type, otherwise the element type will be deduced from the
285given values.
286
287.. The initial `DagArg` of the dag must start with an identifier or
288   !cast, but this is more of an implementation detail and so for now just
289   leave it out.
290
291.. productionlist::
292   SimpleValue: "(" `DagArg` `DagArgList` ")"
293   DagArgList: `DagArg` ("," `DagArg`)*
294   DagArg: `Value` [":" `TokVarName`] | `TokVarName`
295
296The initial :token:`DagArg` is called the "operator" of the dag.
297
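The sketch below uses two placeholder records, ``ops`` and ``GPR``, to
build a dag value::

   def ops;
   def GPR;

   def DagExample {
     dag Operands = (ops GPR:$dst, GPR:$src);
   }

Here ``ops`` is the operator, and ``$dst`` and ``$src`` name the two
arguments.
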
298.. productionlist::
299   SimpleValue: `BangOperator` ["<" `Type` ">"] "(" `ValueListNE` ")"
300
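As an illustration of the bang-operator form, the following sketch combines
a few of the operators listed earlier. The class and field names are
invented::

   class Named<string prefix, int n> {
     string Name = !strconcat(prefix, "_reg");
     int Next = !add(n, 1);
     string Kind = !if(!eq(n, 0), "zero", "nonzero");
   }

   def R0 : Named<"r0", 0>;

With these definitions, ``R0`` is expected to end up with ``Name`` equal to
``"r0_reg"``, ``Next`` equal to ``1``, and ``Kind`` equal to ``"zero"``.
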
301Bodies
302------
303
304.. productionlist::
305   ObjectBody: `BaseClassList` `Body`
306   BaseClassList: [":" `BaseClassListNE`]
307   BaseClassListNE: `SubClassRef` ("," `SubClassRef`)*
308   SubClassRef: (`ClassID` | `MultiClassID`) ["<" `ValueList` ">"]
309   DefmID: `TokIdentifier`
310
311The version with the :token:`MultiClassID` is only valid in the
312:token:`BaseClassList` of a ``defm``.
313The :token:`MultiClassID` should be the name of a ``multiclass``.
314
315.. put this somewhere else
316
317It is after parsing the base class list that the "let stack" is applied.
318
319.. productionlist::
   Body: ";" | "{" `BodyList` "}"
   BodyList: `BodyItem`*
322   BodyItem: `Declaration` ";"
           :| "let" `TokIdentifier` ["{" `RangeList` "}"] "=" `Value` ";"
324
325The ``let`` form allows overriding the value of an inherited field.
326
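For example, in the sketch below (names invented), ``JMP`` overrides one
inherited field entirely and only part of another. The bit range is written
in braces, as is done in LLVM's own target descriptions::

   class Inst {
     bit IsBranch = 0;
     bits<8> Encoding = 0;
   }

   def JMP : Inst {
     let IsBranch = 1;
     let Encoding{7-4} = 0b1111;   // override only the top four bits
   }
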
327``def``
328-------
329
330.. TODO::
331   There can be pastes in the names here, like ``#NAME#``. Look into that
332   and document it (it boils down to ParseIDValue with IDParseMode ==
333   ParseNameMode). ParseObjectName calls into the general ParseValue, with
   the only difference from "arbitrary expression parsing" being IDParseMode
335   == Mode.
336
337.. productionlist::
338   Def: "def" `TokIdentifier` `ObjectBody`
339
340Defines a record whose name is given by the :token:`TokIdentifier`. The
341fields of the record are inherited from the base classes and defined in the
342body.
343
344Special handling occurs if this ``def`` appears inside a ``multiclass`` or
345a ``foreach``.
346
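A minimal sketch of a record that takes fields from two base classes and
adds one of its own; all names are illustrative::

   class Base {
     int Size = 4;
   }

   class Flagged<bit wide> {
     bit IsWide = wide;
   }

   def Rec : Base, Flagged<1> {
     string Name = "rec";          // defined in the body
   }
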
347``defm``
348--------
349
350.. productionlist::
351   Defm: "defm" `TokIdentifier` ":" `BaseClassListNE` ";"
352
Note that in the :token:`BaseClassListNE`, every ``multiclass`` must
precede any ``class`` that appears.
355
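The ordering rule can be sketched as follows, with invented names. The
multiclass ``RegPair`` comes first in the list and the ordinary class
``Tagged`` comes after it::

   class Tagged {
     bit HasTag = 1;
   }

   multiclass RegPair<int base> {
     def _lo { int Index = base; }
     def _hi { int Index = !add(base, 1); }
   }

   defm D0 : RegPair<0>, Tagged;   // multiclass first, then class

This is expected to define ``D0_lo`` and ``D0_hi``, each of which also
inherits from ``Tagged``.
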
356``foreach``
357-----------
358
359.. productionlist::
360   Foreach: "foreach" `Declaration` "in" "{" `Object`* "}"
361          :| "foreach" `Declaration` "in" `Object`
362
363The value assigned to the variable in the declaration is iterated over and
364the object or object list is reevaluated with the variable set at each
365iterated value.
366
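A short sketch of the single-object form. The iteration variable is written
here without a type, as is done in LLVM's own target descriptions; this
goes slightly beyond the production above, so treat that detail as an
assumption. The ``#`` paste in the record name is likewise assumed to
behave as in those files::

   foreach i = [0, 1, 2, 3] in
     def R#i {
       int Index = i;
     }

This is intended to produce four records, ``R0`` through ``R3``.
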
367Top-Level ``let``
368-----------------
369
370.. productionlist::
371   Let:  "let" `LetList` "in" "{" `Object`* "}"
372      :| "let" `LetList` "in" `Object`
373   LetList: `LetItem` ("," `LetItem`)*
   LetItem: `TokIdentifier` ["<" `RangeList` ">"] "=" `Value`
375
376This is effectively equivalent to ``let`` inside the body of a record
377except that it applies to multiple records at a time. The bindings are
378applied at the end of parsing the base classes of a record.
379
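For example, the following sketch (names invented) applies the same
bindings to several records at once, using both the braced and the
single-object form::

   class Inst {
     bit IsBranch = 0;
     string Namespace = "";
   }

   let Namespace = "Demo", IsBranch = 1 in {
     def JMP : Inst;
     def JE  : Inst;
   }

   let Namespace = "Demo" in
     def ADD : Inst;               // IsBranch keeps its default, 0
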
380``multiclass``
381--------------
382
383.. productionlist::
384   MultiClass: "multiclass" `TokIdentifier` [`TemplateArgList`]
385             : [":" `BaseMultiClassList`] "{" `MultiClassObject`+ "}"
386   BaseMultiClassList: `MultiClassID` ("," `MultiClassID`)*
387   MultiClassID: `TokIdentifier`
388   MultiClassObject: `Def` | `Defm` | `Let` | `Foreach`
389
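A minimal sketch of one multiclass building on another, with invented
names. It assumes that the ``def``\s of a base multiclass are carried over
into the derived one::

   multiclass Loads {
     def _load { string Op = "ld"; }
   }

   multiclass Memory : Loads {
     def _store { string Op = "st"; }
   }

   defm M : Memory;

Under that assumption, the ``defm`` is expected to yield ``M_load`` and
``M_store``.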