========================
Decodetree Specification
========================

A *decodetree* is built from instruction *patterns*.  A pattern may
represent a single architectural instruction or a group of them, depending
on what is convenient for further processing.

Each pattern has both *fixedbits* and *fixedmask*, the combination of which
describes the condition under which the pattern is matched::

  (insn & fixedmask) == fixedbits

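As a minimal sketch of this test in C, the following helper applies the
condition to a 32-bit instruction word.  The ``Pattern`` type, the
``pattern_matches`` name and the sample constants used with it are
hypothetical and exist only for illustration.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Hypothetical container for the two per-pattern constants. */
   typedef struct {
       uint32_t fixedmask;   /* 1 for every bit the pattern constrains */
       uint32_t fixedbits;   /* required value of the constrained bits */
   } Pattern;

   /* True when insn satisfies (insn & fixedmask) == fixedbits. */
   static bool pattern_matches(const Pattern *p, uint32_t insn)
   {
       return (insn & p->fixedmask) == p->fixedbits;
   }

With ``fixedmask = 0xfc000000`` and ``fixedbits = 0x40000000``, any word
whose top six bits are ``010000`` matches, whatever the remaining bits are.
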
Each pattern may have *fields*, which are extracted from the insn and
passed along to the translator.  Examples of such are registers,
immediates, and sub-opcodes.

In support of patterns, one may declare *fields*, *argument sets*, and
*formats*, each of which may be re-used to simplify further definitions.

Fields
======

Syntax::

  field_def     := '%' identifier ( field )* ( !function=identifier )?
  field         := unnamed_field | named_field
  unnamed_field := number ':' ( 's' )? number
  named_field   := identifier ':' ( 's' )? number

For *unnamed_field*, the first number is the least-significant bit position
of the field and the second number is the length of the field.  If the 's' is
present, the field is considered signed.

A *named_field* refers to some other field in the instruction pattern
or format. Regardless of the length of the other field where it is
defined, it will be inserted into this field with the specified
signedness and bit width.

Field definitions that involve loops (i.e. where a field is defined
directly or indirectly in terms of itself) are errors.

A format can include fields that refer to named fields that are
defined in the instruction pattern(s) that use the format.
Conversely, an instruction pattern can include fields that refer to
named fields that are defined in the format it uses. However, you
cannot currently do both at once (i.e. pattern P uses format F; F has
a field A that refers to a named field B that is defined in P, and P
has a field C that refers to a named field D that is defined in F).

If multiple ``fields`` are present, they are concatenated, with the first
listed field supplying the most-significant bits of the result.
In this way one can define disjoint fields.

If ``!function`` is specified, the concatenated result is passed through the
named function, taking and returning an integral value.

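The concatenation and signedness rules can be sketched in plain C.  The
helpers ``extract32`` and ``sextract32`` are local stand-ins written out so
that the block is self-contained, and ``field_imm9`` and ``field_disp12``
are hypothetical names.  The sketch models the resulting values for the
definitions ``%imm9 16:6 10:3`` and ``%disp12 0:s1 1:1 2:10``; it is not
the text the generator emits.

.. code-block:: c

   #include <stdint.h>

   /* Unsigned bit-field: 'length' bits of 'value' starting at bit 'start'. */
   static uint32_t extract32(uint32_t value, int start, int length)
   {
       return (value >> start) & (~(uint32_t)0 >> (32 - length));
   }

   /* The same field, sign-extended from its top bit. */
   static int32_t sextract32(uint32_t value, int start, int length)
   {
       uint32_t field = extract32(value, start, length);
       uint32_t sign = (uint32_t)1 << (length - 1);

       return (int32_t)((int64_t)(field ^ sign) - (int64_t)sign);
   }

   /* %imm9   16:6 10:3  -> six high bits followed by three low bits */
   static int field_imm9(uint32_t insn)
   {
       return (int)(extract32(insn, 16, 6) << 3 | extract32(insn, 10, 3));
   }

   /* %disp12 0:s1 1:1 2:10 */
   static int field_disp12(uint32_t insn)
   {
       uint32_t raw = extract32(insn, 0, 1) << 11
                    | extract32(insn, 1, 1) << 10
                    | extract32(insn, 2, 10);

       /* The signed leading subfield lands in bit 11, so it sets the sign. */
       return sextract32(raw, 0, 12);
   }

Only the first subfield of ``%disp12`` is marked signed.  Because it becomes
the top bit of the 12-bit result, sign-extending the assembled value gives
the same answer as sign-extending that subfield before shifting it.
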
One may use ``!function`` with zero ``fields``.  This case is called
a *parameter*, and the named function is only passed the ``DisasContext``
and returns an integral value extracted from there.

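The difference between an ordinary ``!function`` and a parameter might look
as follows.  This is a sketch under assumptions: the ``DisasContext`` layout,
its ``wide_mode`` member and both helper names are hypothetical, and the
ordinary helper is assumed to receive the context together with the
extracted value.

.. code-block:: c

   /* Hypothetical translator state; each target defines its own. */
   typedef struct DisasContext {
       int wide_mode;   /* hypothetical: non-zero when wide operands are active */
   } DisasContext;

   /* Ordinary !function: transforms the value extracted from the insn. */
   static int times_4(DisasContext *ctx, int x)
   {
       (void)ctx;       /* unused by a pure transformation */
       return x * 4;
   }

   /* Parameter: there is no extracted value, only the context. */
   static int get_wide_mode(DisasContext *ctx)
   {
       return ctx->wide_mode;
   }

Hypothetical declarations using these helpers would be
``%scaled 0:8 !function=times_4`` for the ordinary case and
``%wide !function=get_wide_mode`` for the parameter.
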
A field with no ``fields`` and no ``!function`` is in error.

Field examples:

+---------------------------+---------------------------------------------+
| Input                     | Generated code                              |
+===========================+=============================================+
| %disp   0:s16             | sextract(i, 0, 16)                          |
+---------------------------+---------------------------------------------+
| %imm9   16:6 10:3         | extract(i, 16, 6) << 3 | extract(i, 10, 3)  |
+---------------------------+---------------------------------------------+
| %disp12 0:s1 1:1 2:10     | sextract(i, 0, 1) << 11 |                   |
|                           |    extract(i, 1, 1) << 10 |                 |
|                           |    extract(i, 2, 10)                        |
+---------------------------+---------------------------------------------+
| %shimm8 5:s8 13:1         | expand_shimm8(sextract(i, 5, 8) << 1 |      |
|   !function=expand_shimm8 |               extract(i, 13, 1))            |
+---------------------------+---------------------------------------------+
| %sz_imm 10:2 sz:3         | expand_sz_imm(extract(i, 10, 2) << 3 |      |
|   !function=expand_sz_imm |               extract(a->sz, 0, 3))         |
+---------------------------+---------------------------------------------+

Argument Sets
=============

Syntax::

  args_def    := '&' identifier ( args_elt )+ ( !extern )?
  args_elt    := identifier (':' identifier)?

Each *args_elt* defines an argument within the argument set.
If the form of the *args_elt* contains a colon, the first
identifier is the argument name and the second identifier is
the argument type.  If the colon is missing, the argument
type will be ``int``.

Each argument set will be rendered as a C structure "arg_$name"
with each of the fields being one of the member arguments.

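For instance, the declarations ``&reg3 ra rb rc`` and
``&longldst reg base offset:int64_t`` would plausibly correspond to
structures like the ones in this sketch.  The exact text emitted by the
generator may differ; only the "arg_$name" naming and the default ``int``
type are taken from the rules above.

.. code-block:: c

   #include <stdint.h>

   /* &reg3  ra rb rc  (no types given, so every member is int) */
   typedef struct {
       int ra;
       int rb;
       int rc;
   } arg_reg3;

   /* &longldst  reg base offset:int64_t  (offset carries an explicit type) */
   typedef struct {
       int reg;
       int base;
       int64_t offset;
   } arg_longldst;
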
If ``!extern`` is specified, the backing structure is assumed
to have been already declared, typically via a second decoder.

Argument sets are useful when one wants to define helper functions
for the translator functions that can perform operations on a common
set of arguments.  This can ensure, for instance, that the ``AND``
pattern and the ``OR`` pattern put their operands into the same named
structure, so that a common ``gen_logic_insn`` may be able to handle
the operations common between the two.

Argument set examples::

  &reg3       ra rb rc
  &loadstore  reg base offset
  &longldst   reg base offset:int64_t


Formats
=======

Syntax::

  fmt_def      := '@' identifier ( fmt_elt )+
  fmt_elt      := fixedbit_elt | field_elt | field_ref | args_ref
  fixedbit_elt := [01.-]+
  field_elt    := identifier ':' 's'? number
  field_ref    := '%' identifier | identifier '=' '%' identifier
  args_ref     := '&' identifier

Defining a format is a handy way to avoid replicating groups of fields
across many instruction patterns.

A *fixedbit_elt* describes a contiguous sequence of bits that must
be 1, 0, or don't care.  The difference between '.' and '-'
is that '.' means that the bit will be covered with a field or a
final 0 or 1 from the pattern, and '-' means that the bit is really
ignored by the cpu and will not be specified.

A *field_elt* describes a simple field only given a width; the position of
the field is implied by its position with respect to other *fixedbit_elt*
and *field_elt*.

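To illustrate how positions follow from widths, take the 32-bit format
``@opr ...... ra:5 rb:5 ... 0 ....... rc:5``.  Reading from the
most-significant bit, six bits precede ``ra``, so it occupies bits 25..21;
``rb`` follows at bits 20..16; and ``rc`` fills the last five bits, 4..0.
The sketch below extracts the three fields accordingly.  The names
``arg_opr`` and ``extract_opr`` and the local ``extract32`` helper are
assumptions made for the example.

.. code-block:: c

   #include <stdint.h>

   typedef struct {
       int ra;
       int rb;
       int rc;
   } arg_opr;

   /* Unsigned bit-field: 'length' bits of 'value' starting at bit 'start'. */
   static uint32_t extract32(uint32_t value, int start, int length)
   {
       return (value >> start) & (~(uint32_t)0 >> (32 - length));
   }

   /*
    * @opr  ......  ra:5    rb:5    ...     0   .......  rc:5
    * bits  31..26  25..21  20..16  15..13  12  11..5    4..0
    */
   static void extract_opr(arg_opr *a, uint32_t insn)
   {
       a->ra = (int)extract32(insn, 21, 5);
       a->rb = (int)extract32(insn, 16, 5);
       a->rc = (int)extract32(insn, 0, 5);
   }
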
If any *fixedbit_elt* or *field_elt* appear, then all bits must be defined.
Padding with a *fixedbit_elt* of all '.' is an easy way to accomplish that.

A *field_ref* incorporates a field by reference.  This is the only way to
add a complex field to a format.  A field may be renamed in the process
via assignment to another identifier.  This is intended to allow the
same argument set to be used with disjoint named fields.

A single *args_ref* may specify an argument set to use for the format.
The set of fields in the format must be a subset of the arguments in
the argument set.  If an argument set is not specified, one will be
inferred from the set of fields.

It is recommended, but not required, that all *field_ref* and *args_ref*
appear at the end of the line, not interleaved with *fixedbit_elt* or
*field_elt*.

Format examples::

  @opr    ...... ra:5 rb:5 ... 0 ....... rc:5
  @opi    ...... ra:5 lit:8    1 ....... rc:5

Patterns
========

Syntax::

  pat_def      := identifier ( pat_elt )+
  pat_elt      := fixedbit_elt | field_elt | field_ref | args_ref | fmt_ref | const_elt
  fmt_ref      := '@' identifier
  const_elt    := identifier '=' number

The *fixedbit_elt* and *field_elt* specifiers are unchanged from formats.
A pattern that does not specify a named format will have one inferred
from a referenced argument set (if present) and the set of fields.

A *const_elt* allows an argument to be set to a constant value.  This may
come in handy when fields overlap between patterns and one has to
include the values in the *fixedbit_elt* instead.

The decoder will call a translator function for each pattern matched.

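The flow from match to translator call can be sketched for two patterns
built on the ``@opr`` and ``@opi`` formats shown earlier:
``addl_r 010000 ..... ..... .... 0000000 ..... @opr`` and
``addl_i 010000 ..... ..... .... 0000000 ..... @opi``.  Combining pattern
and format, the fixed bits are the opcode (31..26), bit 12 and bits 11..5,
giving a mask of ``0xfc001fe0``.  Everything else in the block is an
assumption: the ``DisasContext`` layout, the stub translators, the
``decode`` name, and the two-argument translator signature, which mirrors
the generated output shown under Pattern Groups.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   typedef struct { int ra, rb, rc; } arg_opr;
   typedef struct { int ra, lit, rc; } arg_opi;

   /* Hypothetical translator state that just records what was decoded. */
   typedef struct DisasContext {
       int which;     /* 1 = addl_r, 2 = addl_i */
       int operand;   /* rb for addl_r, lit for addl_i */
   } DisasContext;

   static uint32_t extract32(uint32_t value, int start, int length)
   {
       return (value >> start) & (~(uint32_t)0 >> (32 - length));
   }

   /* Stub translators; real ones would emit code for the instruction. */
   static bool trans_addl_r(DisasContext *ctx, arg_opr *a)
   {
       ctx->which = 1;
       ctx->operand = a->rb;
       return true;
   }

   static bool trans_addl_i(DisasContext *ctx, arg_opi *a)
   {
       ctx->which = 2;
       ctx->operand = a->lit;
       return true;
   }

   static bool decode(DisasContext *ctx, uint32_t insn)
   {
       union {
           arg_opr f_opr;
           arg_opi f_opi;
       } u;

       switch (insn & 0xfc001fe0u) {
       case 0x40000000u:   /* addl_r: bit 12 clear selects @opr */
           u.f_opr.ra = (int)extract32(insn, 21, 5);
           u.f_opr.rb = (int)extract32(insn, 16, 5);
           u.f_opr.rc = (int)extract32(insn, 0, 5);
           return trans_addl_r(ctx, &u.f_opr);
       case 0x40001000u:   /* addl_i: bit 12 set selects @opi */
           u.f_opi.ra = (int)extract32(insn, 21, 5);
           u.f_opi.lit = (int)extract32(insn, 13, 8);
           u.f_opi.rc = (int)extract32(insn, 0, 5);
           return trans_addl_i(ctx, &u.f_opi);
       }
       return false;       /* no pattern matched */
   }
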
Pattern examples::

  addl_r   010000 ..... ..... .... 0000000 ..... @opr
  addl_i   010000 ..... ..... .... 0000000 ..... @opi

which will, in part, invoke::

  trans_addl_r(ctx, &arg_opr, insn)

and::

  trans_addl_i(ctx, &arg_opi, insn)

Pattern Groups
==============

Syntax::

  group            := overlap_group | no_overlap_group
  overlap_group    := '{' ( pat_def | group )+ '}'
  no_overlap_group := '[' ( pat_def | group )+ ']'

A *group* begins with a lone open-brace or open-bracket, with all
subsequent lines indented two spaces, and ending with a lone
close-brace or close-bracket.  Groups may be nested, increasing the
required indentation of the lines within the nested group to two
spaces per nesting level.

Patterns within overlap groups are allowed to overlap.  Conflicts are
resolved by selecting the patterns in order.  If all of the fixedbits
for a pattern match, its translate function will be called.  If the
translate function returns false, then subsequent patterns within the
group will be matched.

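The selection rule for an overlap group can be modelled as an ordered list
of candidates.  This is a behavioural sketch only: the generator emits
nested ``switch`` and ``if`` statements, not a table, and every name here
(``Candidate``, ``decode_overlap_group``, the stub translators and the
``has_special`` flag) is hypothetical.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   typedef struct DisasContext {
       bool has_special;   /* hypothetical feature flag */
       int chosen;         /* records which stub accepted the insn */
   } DisasContext;

   typedef bool (*TransFn)(DisasContext *ctx, uint32_t insn);

   typedef struct {
       uint32_t fixedmask;
       uint32_t fixedbits;
       TransFn trans;
   } Candidate;

   /* Declines the insn when the feature is absent. */
   static bool trans_special(DisasContext *ctx, uint32_t insn)
   {
       (void)insn;
       if (!ctx->has_special) {
           return false;
       }
       ctx->chosen = 1;
       return true;
   }

   /* Always accepts. */
   static bool trans_general(DisasContext *ctx, uint32_t insn)
   {
       (void)insn;
       ctx->chosen = 2;
       return true;
   }

   /* Try candidates in order; a false return moves on to the next one. */
   static bool decode_overlap_group(DisasContext *ctx, uint32_t insn,
                                    const Candidate *c, size_t n)
   {
       for (size_t i = 0; i < n; i++) {
           if ((insn & c[i].fixedmask) == c[i].fixedbits
               && c[i].trans(ctx, insn)) {
               return true;
           }
       }
       return false;
   }

Listing the more specific candidate first lets it claim the encodings it
understands, while anything it declines falls through to the general one.
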
Patterns within no-overlap groups are not allowed to overlap, just
the same as ungrouped patterns.  Thus no-overlap groups are intended
to be nested inside overlap groups.

The following example from PA-RISC shows specialization of the *or*
instruction::

  {
    {
      nop   000010 ----- ----- 0000 001001 0 00000
      copy  000010 00000 r1:5  0000 001001 0 rt:5
    }
    or      000010 rt2:5 r1:5  cf:4 001001 0 rt:5
  }

When the *cf* field is zero, the instruction has no side effects,
and may be specialized.  When the *rt* field is zero, the output
is discarded and so the instruction has no effect.  When the *rt2*
field is zero, the operation is ``reg[r1] | 0`` and so encodes
the canonical register copy operation.

The output from the generator might look like::

  switch (insn & 0xfc000fe0) {
  case 0x08000240:
    /* 000010.. ........ ....0010 010..... */
    if ((insn & 0x0000f000) == 0x00000000) {
        /* 000010.. ........ 00000010 010..... */
        if ((insn & 0x0000001f) == 0x00000000) {
            /* 000010.. ........ 00000010 01000000 */
            extract_decode_Fmt_0(&u.f_decode0, insn);
            if (trans_nop(ctx, &u.f_decode0)) return true;
        }
        if ((insn & 0x03e00000) == 0x00000000) {
            /* 00001000 000..... 00000010 010..... */
            extract_decode_Fmt_1(&u.f_decode1, insn);
            if (trans_copy(ctx, &u.f_decode1)) return true;
        }
    }
    extract_decode_Fmt_2(&u.f_decode2, insn);
    if (trans_or(ctx, &u.f_decode2)) return true;
    return false;
  }
261