========================
Decodetree Specification
========================

A *decodetree* is built from instruction *patterns*.  A pattern may
represent a single architectural instruction or a group of such
instructions, depending on what is convenient for further processing.

Each pattern has both *fixedbits* and *fixedmask*, the combination of which
describes the condition under which the pattern is matched::

  (insn & fixedmask) == fixedbits

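
As a minimal sketch of that match condition in C, assuming 32-bit
instructions: the ``PatternMatch`` type and ``pattern_matches`` function
are hypothetical names used only for illustration, not something the
generator emits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for one pattern's match condition. */
typedef struct {
    uint32_t fixedmask;  /* bits the pattern constrains */
    uint32_t fixedbits;  /* required values of those bits */
} PatternMatch;

/* Return true when insn satisfies (insn & fixedmask) == fixedbits. */
static bool pattern_matches(const PatternMatch *p, uint32_t insn)
{
    return (insn & p->fixedmask) == p->fixedbits;
}
```

Bits outside ``fixedmask`` never influence the match; they are left for
fields or are ignored entirely.
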
Each pattern may have *fields*, which are extracted from the insn and
passed along to the translator.  Examples of such are registers,
immediates, and sub-opcodes.

In support of patterns, one may declare *fields*, *argument sets*, and
*formats*, each of which may be re-used to simplify further definitions.

Fields
======

Syntax::

  field_def     := '%' identifier ( unnamed_field )* ( !function=identifier )?
  unnamed_field := number ':' ( 's' )? number

For *unnamed_field*, the first number is the least-significant bit position
of the field and the second number is the length of the field.  If the 's' is
present, the field is considered signed.  If multiple ``unnamed_fields`` are
present, they are concatenated, with the first one listed occupying the
most-significant bits.  In this way one can define disjoint fields.

If ``!function`` is specified, the concatenated result is passed through the
named function, which takes and returns an integral value.

One may use ``!function`` with zero ``unnamed_fields``.  This case is called
a *parameter*, and the named function is only passed the ``DisasContext``
and returns an integral value extracted from there.

A field with no ``unnamed_fields`` and no ``!function`` is in error.

FIXME: the fields of the structure into which this result will be stored
are restricted to ``int``, which means that we cannot expand 64-bit items.

Field examples:

+---------------------------+---------------------------------------------+
| Input                     | Generated code                              |
+===========================+=============================================+
| %disp   0:s16             | sextract(i, 0, 16)                          |
+---------------------------+---------------------------------------------+
| %imm9   16:6 10:3         | extract(i, 16, 6) << 3 | extract(i, 10, 3)  |
+---------------------------+---------------------------------------------+
| %disp12 0:s1 1:1 2:10     | sextract(i, 0, 1) << 11 |                   |
|                           |    extract(i, 1, 1) << 10 |                 |
|                           |    extract(i, 2, 10)                        |
+---------------------------+---------------------------------------------+
| %shimm8 5:s8 13:1         | expand_shimm8(sextract(i, 5, 8) << 1 |      |
|   !function=expand_shimm8 |               extract(i, 13, 1))            |
+---------------------------+---------------------------------------------+

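
The first three rows of the table can be sketched as standalone C.
The ``extract`` and ``sextract`` helpers below are local stand-ins
written for this example (assumed to behave like the helpers named in
the table), and the ``field_*`` function names are hypothetical.

```c
#include <stdint.h>

/* Unsigned bitfield: len bits (1..32) starting at bit pos. */
static uint32_t extract(uint32_t insn, int pos, int len)
{
    return (insn >> pos) & (UINT32_MAX >> (32 - len));
}

/* Signed bitfield: len bits (1..31), sign-extended from bit len-1. */
static int32_t sextract(uint32_t insn, int pos, int len)
{
    uint32_t field = extract(insn, pos, len);
    uint32_t sign = 1u << (len - 1);
    return (int32_t)(field ^ sign) - (int32_t)sign;
}

/* %disp   0:s16 */
static int32_t field_disp(uint32_t insn)
{
    return sextract(insn, 0, 16);
}

/* %imm9   16:6 10:3 */
static int32_t field_imm9(uint32_t insn)
{
    return (int32_t)(extract(insn, 16, 6) << 3 | extract(insn, 10, 3));
}

/* %disp12 0:s1 1:1 2:10 */
static int32_t field_disp12(uint32_t insn)
{
    /* Multiply instead of shifting: left-shifting a negative int is
     * undefined in ISO C, although the intent is the same as << 11. */
    int32_t hi = sextract(insn, 0, 1) * (1 << 11);
    int32_t lo = (int32_t)(extract(insn, 1, 1) << 10 | extract(insn, 2, 10));
    return hi | lo;
}
```

Only the first segment of ``%disp12`` is signed, so the sign of the
whole 12-bit value comes from instruction bit 0.
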
Argument Sets
=============

Syntax::

  args_def    := '&' identifier ( args_elt )+ ( !extern )?
  args_elt    := identifier

Each *args_elt* defines an argument within the argument set.
Each argument set will be rendered as a C structure ``arg_$name``
with each of the fields being one of the member arguments.

If ``!extern`` is specified, the backing structure is assumed
to have been already declared, typically via a second decoder.

Argument sets are useful when one wants to define helper functions
for the translator functions that can perform operations on a common
set of arguments.  This can ensure, for instance, that the ``AND``
pattern and the ``OR`` pattern put their operands into the same named
structure, so that a common ``gen_logic_insn`` may be able to handle
the operations common between the two.

Argument set examples::

  &reg3       ra rb rc
  &loadstore  reg base offset

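
A rough sketch of how ``&reg3`` lets two translator functions share one
helper.  The ``DisasContext`` here is a toy stand-in that holds register
values directly, so the helper computes a result instead of emitting
code as a real translator would.  ``LogicFn``, ``op_and`` and ``op_or``
are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rendering of "&reg3 ra rb rc": one int member per argument. */
typedef struct {
    int ra;
    int rb;
    int rc;
} arg_reg3;

/* Toy context (assumption): a register file instead of translator state. */
typedef struct {
    uint32_t regs[32];
} DisasContext;

typedef uint32_t (*LogicFn)(uint32_t a, uint32_t b);

static uint32_t op_and(uint32_t a, uint32_t b) { return a & b; }
static uint32_t op_or(uint32_t a, uint32_t b)  { return a | b; }

/* Common helper: works for any pattern that decodes into arg_reg3. */
static bool gen_logic_insn(DisasContext *ctx, const arg_reg3 *a, LogicFn fn)
{
    ctx->regs[a->rc] = fn(ctx->regs[a->ra], ctx->regs[a->rb]);
    return true;
}

static bool trans_AND(DisasContext *ctx, arg_reg3 *a)
{
    return gen_logic_insn(ctx, a, op_and);
}

static bool trans_OR(DisasContext *ctx, arg_reg3 *a)
{
    return gen_logic_insn(ctx, a, op_or);
}
```

Because both patterns use the same structure, neither translator
function needs its own copy of the operand handling.
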
Formats
=======

Syntax::

  fmt_def      := '@' identifier ( fmt_elt )+
  fmt_elt      := fixedbit_elt | field_elt | field_ref | args_ref
  fixedbit_elt := [01.-]+
  field_elt    := identifier ':' 's'? number
  field_ref    := '%' identifier | identifier '=' '%' identifier
  args_ref     := '&' identifier

Defining a format is a handy way to avoid replicating groups of fields
across many instruction patterns.

A *fixedbit_elt* describes a contiguous sequence of bits that must
be 1, 0, or don't care.  The difference between '.' and '-'
is that '.' means that the bit will be covered with a field or a
final 0 or 1 from the pattern, and '-' means that the bit is really
ignored by the cpu and will not be specified.

A *field_elt* describes a simple field given only a width; the position of
the field is implied by its position with respect to other *fixedbit_elt*
and *field_elt*.

If any *fixedbit_elt* or *field_elt* appear, then all bits must be defined.
Padding with a *fixedbit_elt* of all '.' is an easy way to accomplish that.

A *field_ref* incorporates a field by reference.  This is the only way to
add a complex field to a format.  A field may be renamed in the process
via assignment to another identifier.  This is intended to allow the
same argument set to be used with disjoint named fields.

A single *args_ref* may specify an argument set to use for the format.
The set of fields in the format must be a subset of the arguments in
the argument set.  If an argument set is not specified, one will be
inferred from the set of fields.

It is recommended, but not required, that all *field_ref* and *args_ref*
appear at the end of the line, not interleaving with *fixedbit_elt* or
*field_elt*.

Format examples::

  @opr    ...... ra:5 rb:5 ... 0 ....... rc:5
  @opi    ...... ra:5 lit:8    1 ....... rc:5

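
To make the implied bit positions concrete, here is a hand-written
sketch of field extraction for the two example formats, assuming a
32-bit instruction with the leftmost element at bit 31.  The structure
and function names (``arg_opr``, ``extract_opr`` and so on) are
illustrative assumptions, not the names the generator would choose.

```c
#include <stdint.h>

typedef struct { int ra, rb, rc; } arg_opr;   /* fields of @opr */
typedef struct { int ra, lit, rc; } arg_opi;  /* fields of @opi */

/* Unsigned bitfield: len bits (1..32) starting at bit pos. */
static int extract(uint32_t insn, int pos, int len)
{
    return (int)((insn >> pos) & (UINT32_MAX >> (32 - len)));
}

/* @opr    ...... ra:5 rb:5 ... 0 ....... rc:5
 *         31:26  25:21 20:16 15:13 12 11:5 4:0 */
static void extract_opr(arg_opr *a, uint32_t insn)
{
    a->ra = extract(insn, 21, 5);
    a->rb = extract(insn, 16, 5);
    a->rc = extract(insn, 0, 5);
}

/* @opi    ...... ra:5 lit:8    1 ....... rc:5
 *         31:26  25:21 20:13  12 11:5    4:0 */
static void extract_opi(arg_opi *a, uint32_t insn)
{
    a->ra = extract(insn, 21, 5);
    a->lit = extract(insn, 13, 8);
    a->rc = extract(insn, 0, 5);
}
```

Both formats account for all 32 bits (6+5+5+3+1+7+5 and 6+5+8+1+7+5),
and they differ in the fixed value of bit 12.
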
Patterns
========

Syntax::

  pat_def      := identifier ( pat_elt )+
  pat_elt      := fixedbit_elt | field_elt | field_ref | args_ref | fmt_ref | const_elt
  fmt_ref      := '@' identifier
  const_elt    := identifier '=' number

The *fixedbit_elt* and *field_elt* specifiers are unchanged from formats.
A pattern that does not specify a named format will have one inferred
from a referenced argument set (if present) and the set of fields.

A *const_elt* allows an argument to be set to a constant value.  This may
come in handy when fields overlap between patterns and one has to
include the values in the *fixedbit_elt* instead.

The decoder will call a translator function for each pattern matched.

Pattern examples::

  addl_r   010000 ..... ..... .... 0000000 ..... @opr
  addl_i   010000 ..... ..... .... 0000000 ..... @opi

which will, in part, invoke::

  trans_addl_r(ctx, &arg_opr, insn)

and::

  trans_addl_i(ctx, &arg_opi, insn)

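
The following hand-written sketch shows how the two patterns combine
with their formats into a single dispatch.  It assumes 32-bit
instructions and uses the three-argument call shape written above.  The
``DisasContext`` contents, the ``bits`` helper and the ``decode``
function are hypothetical, and the translator functions are stubs that
only record their operands.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy context (assumption): remembers what the translator was given. */
typedef struct {
    char kind;      /* 'r' for addl_r, 'i' for addl_i */
    int ra, b, rc;  /* operands seen by the translator function */
} DisasContext;

typedef struct { int ra, rb, rc; } arg_opr;   /* fields of @opr */
typedef struct { int ra, lit, rc; } arg_opi;  /* fields of @opi */

static int bits(uint32_t insn, int pos, int len)
{
    return (int)((insn >> pos) & (UINT32_MAX >> (32 - len)));
}

static bool trans_addl_r(DisasContext *ctx, arg_opr *a, uint32_t insn)
{
    (void)insn;
    ctx->kind = 'r'; ctx->ra = a->ra; ctx->b = a->rb; ctx->rc = a->rc;
    return true;
}

static bool trans_addl_i(DisasContext *ctx, arg_opi *a, uint32_t insn)
{
    (void)insn;
    ctx->kind = 'i'; ctx->ra = a->ra; ctx->b = a->lit; ctx->rc = a->rc;
    return true;
}

static bool decode(DisasContext *ctx, uint32_t insn)
{
    arg_opr opr;
    arg_opi opi;

    /* fixedmask: opcode bits 31:26, format bit 12, function bits 11:5 */
    switch (insn & 0xfc001fe0u) {
    case 0x40000000u:  /* addl_r: opcode 010000, bit 12 clear */
        opr.ra = bits(insn, 21, 5);
        opr.rb = bits(insn, 16, 5);
        opr.rc = bits(insn, 0, 5);
        return trans_addl_r(ctx, &opr, insn);
    case 0x40001000u:  /* addl_i: opcode 010000, bit 12 set */
        opi.ra = bits(insn, 21, 5);
        opi.lit = bits(insn, 13, 8);
        opi.rc = bits(insn, 0, 5);
        return trans_addl_i(ctx, &opi, insn);
    }
    return false;      /* no pattern matched */
}
```

The fixed bit 12 contributed by each format is what lets the two
otherwise identical patterns be told apart.
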
Pattern Groups
==============

Syntax::

  group            := overlap_group | no_overlap_group
  overlap_group    := '{' ( pat_def | group )+ '}'
  no_overlap_group := '[' ( pat_def | group )+ ']'

A *group* begins with a lone open-brace or open-bracket, with all
subsequent lines indented two spaces, and ending with a lone
close-brace or close-bracket.  Groups may be nested, increasing the
required indentation of the lines within the nested group to two
spaces per nesting level.

Patterns within overlap groups are allowed to overlap.  Conflicts are
resolved by selecting the patterns in order.  If all of the fixedbits
for a pattern match, its translate function will be called.  If the
translate function returns false, then subsequent patterns within the
group will be tried.

Patterns within no-overlap groups are not allowed to overlap, just
the same as ungrouped patterns.  Thus no-overlap groups are intended
to be nested inside overlap groups.

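
The try-in-order rule for overlap groups can be sketched with a small
table-driven loop.  This is only an illustration of the semantics (the
generator emits nested conditionals, not a table), and every name
below is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { CHOSE_NONE, CHOSE_SPECIAL, CHOSE_GENERAL };

/* Toy context (assumption): records which translator accepted. */
typedef struct {
    int chosen;
} DisasContext;

typedef bool (*TransFn)(DisasContext *ctx, uint32_t insn);

typedef struct {
    uint32_t fixedmask;
    uint32_t fixedbits;
    TransFn trans;
} GroupEntry;

/* Try entries in order; the first translator that returns true wins. */
static bool decode_overlap_group(DisasContext *ctx, uint32_t insn,
                                 const GroupEntry *g, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if ((insn & g[i].fixedmask) == g[i].fixedbits &&
            g[i].trans(ctx, insn)) {
            return true;
        }
    }
    return false;
}

/* Specialized translator: declines encodings with bit 0 set. */
static bool trans_special(DisasContext *ctx, uint32_t insn)
{
    if (insn & 1) {
        return false;
    }
    ctx->chosen = CHOSE_SPECIAL;
    return true;
}

/* General translator: accepts everything that reaches it. */
static bool trans_general(DisasContext *ctx, uint32_t insn)
{
    (void)insn;
    ctx->chosen = CHOSE_GENERAL;
    return true;
}

/* The more specific pattern is listed first, the general one last. */
static const GroupEntry example_group[] = {
    { 0xf0000f00u, 0x10000000u, trans_special },
    { 0xf0000000u, 0x10000000u, trans_general },
};
```

An instruction reaches ``trans_general`` either because the fixedbits
of the specialized pattern did not match, or because they matched and
``trans_special`` returned false.
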
The following example from PA-RISC shows specialization of the *or*
instruction::

  {
    {
      nop   000010 ----- ----- 0000 001001 0 00000
      copy  000010 00000 r1:5  0000 001001 0 rt:5
    }
    or      000010 rt2:5 r1:5  cf:4 001001 0 rt:5
  }

When the *cf* field is zero, the instruction has no side effects,
and may be specialized.  When the *rt* field is zero, the output
is discarded and so the instruction has no effect.  When the *rt2*
field is zero, the operation is ``reg[r1] | 0`` and so encodes
the canonical register copy operation.

The output from the generator might look like::

  switch (insn & 0xfc000fe0) {
  case 0x08000240:
    /* 000010.. ........ ....0010 010..... */
    if ((insn & 0x0000f000) == 0x00000000) {
        /* 000010.. ........ 00000010 010..... */
        if ((insn & 0x0000001f) == 0x00000000) {
            /* 000010.. ........ 00000010 01000000 */
            extract_decode_Fmt_0(&u.f_decode0, insn);
            if (trans_nop(ctx, &u.f_decode0)) return true;
        }
        if ((insn & 0x03e00000) == 0x00000000) {
            /* 00001000 000..... 00000010 010..... */
            extract_decode_Fmt_1(&u.f_decode1, insn);
            if (trans_copy(ctx, &u.f_decode1)) return true;
        }
    }
    extract_decode_Fmt_2(&u.f_decode2, insn);
    if (trans_or(ctx, &u.f_decode2)) return true;
    return false;
  }
235