:mod:`uctypes` -- access binary data in a structured way
========================================================

.. module:: uctypes
   :synopsis: access binary data in a structured way

This module implements a "foreign data interface" for MicroPython. The idea
behind it is similar to CPython's ``ctypes`` module, but the actual API is
different, streamlined and optimized for small size. The basic idea of the
module is to define a data structure layout with about the same power as the
C language allows, and then access it using familiar dot-syntax to reference
sub-fields.

.. warning::

    The ``uctypes`` module allows access to arbitrary memory addresses of the
    machine (including I/O and control registers). Careless use of it
    may lead to crashes, data loss, and even hardware malfunction.

.. seealso::

    Module :mod:`struct`
        Standard Python way to access binary data structures (doesn't scale
        well to large and complex structures).

Usage examples::

    import uctypes

    # Example 1: Subset of ELF file header
    # https://wikipedia.org/wiki/Executable_and_Linkable_Format#File_header
    ELF_HEADER = {
        "EI_MAG": (0x0 | uctypes.ARRAY, 4 | uctypes.UINT8),
        "EI_DATA": 0x5 | uctypes.UINT8,
        "e_machine": 0x12 | uctypes.UINT16,
    }

    # "f" is an ELF file opened in binary mode
    buf = f.read(uctypes.sizeof(ELF_HEADER, uctypes.LITTLE_ENDIAN))
    header = uctypes.struct(uctypes.addressof(buf), ELF_HEADER, uctypes.LITTLE_ENDIAN)
    assert header.EI_MAG == b"\x7fELF"
    assert header.EI_DATA == 1, "Oops, wrong endianness. Could retry with uctypes.BIG_ENDIAN."
    print("machine:", hex(header.e_machine))


    # Example 2: In-memory data structure, with pointers
    COORD = {
        "x": 0 | uctypes.FLOAT32,
        "y": 4 | uctypes.FLOAT32,
    }

    STRUCT1 = {
        "data1": 0 | uctypes.UINT8,
        "data2": 4 | uctypes.UINT32,
        "ptr": (8 | uctypes.PTR, COORD),
    }

    # Suppose you have address of a structure of type STRUCT1 in "addr"
    # uctypes.NATIVE is optional (used by default)
    struct1 = uctypes.struct(addr, STRUCT1, uctypes.NATIVE)
    print("x:", struct1.ptr[0].x)


    # Example 3: Access to CPU registers. Subset of STM32F4xx WWDG block
    WWDG_LAYOUT = {
        "WWDG_CR": (0, {
            # BFUINT32 here means size of the WWDG_CR register
            "WDGA": 7 << uctypes.BF_POS | 1 << uctypes.BF_LEN | uctypes.BFUINT32,
            "T": 0 << uctypes.BF_POS | 7 << uctypes.BF_LEN | uctypes.BFUINT32,
        }),
        "WWDG_CFR": (4, {
            "EWI": 9 << uctypes.BF_POS | 1 << uctypes.BF_LEN | uctypes.BFUINT32,
            "WDGTB": 7 << uctypes.BF_POS | 2 << uctypes.BF_LEN | uctypes.BFUINT32,
            "W": 0 << uctypes.BF_POS | 7 << uctypes.BF_LEN | uctypes.BFUINT32,
        }),
    }

    WWDG = uctypes.struct(0x40002c00, WWDG_LAYOUT)

    WWDG.WWDG_CFR.WDGTB = 0b10
    WWDG.WWDG_CR.WDGA = 1
    print("Current counter:", WWDG.WWDG_CR.T)
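
For comparison, the same three ELF fields from Example 1 could be read with
the standard :mod:`struct` module. This is only a sketch: the 20-byte buffer
is synthetic (it stands in for the data returned by ``f.read(...)``), and the
``0x3E`` machine value is an illustrative assumption rather than something
taken from a real file::

    import struct

    # Synthetic 20-byte buffer standing in for the start of an ELF file.
    buf = bytearray(0x14)
    buf[0:4] = b"\x7fELF"                    # EI_MAG at offset 0x0
    buf[5] = 1                               # EI_DATA at offset 0x5 (1 = little-endian)
    struct.pack_into("<H", buf, 0x12, 0x3E)  # e_machine at offset 0x12 (assumed value)

    # Each field is read separately with its own offset and format.
    ei_mag = bytes(buf[0:4])
    ei_data = buf[5]
    (e_machine,) = struct.unpack_from("<H", buf, 0x12)
    print("machine:", hex(e_machine))

Each field needs its own offset and format string here, which is the
bookkeeping that a ``uctypes`` descriptor keeps in one place.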

Defining structure layout
-------------------------

Structure layout is defined by a "descriptor": a Python dictionary which
encodes field names as keys and other properties required to access them as
associated values::

    {
        "field1": <properties>,
        "field2": <properties>,
        ...
    }

Currently, ``uctypes`` requires explicit specification of offsets for each
field. Offsets are given in bytes from the structure start.

The following are encoding examples for various field types:

* Scalar types::

    "field_name": offset | uctypes.UINT32

  in other words, the value is a scalar type identifier ORed with a field offset
  (in bytes) from the start of the structure.

* Recursive structures::

    "sub": (offset, {
        "b0": 0 | uctypes.UINT8,
        "b1": 1 | uctypes.UINT8,
    })

  i.e. the value is a 2-tuple, the first element of which is an offset, and
  the second is a structure descriptor dictionary (note: offsets in recursive
  descriptors are relative to the structure it defines). Of course, recursive
  structures can be specified not just by a literal dictionary, but also by
  referring to a structure descriptor dictionary (defined earlier) by name.

* Arrays of primitive types::

    "arr": (offset | uctypes.ARRAY, size | uctypes.UINT8),

  i.e. the value is a 2-tuple, the first element of which is the ARRAY flag
  ORed with the offset, and the second is the scalar element type ORed with
  the number of elements in the array.

* Arrays of aggregate types::

    "arr2": (offset | uctypes.ARRAY, size, {"b": 0 | uctypes.UINT8}),

  i.e. the value is a 3-tuple, the first element of which is the ARRAY flag
  ORed with the offset, the second is the number of elements in the array,
  and the third is a descriptor of the element type.

* Pointer to a primitive type::

    "ptr": (offset | uctypes.PTR, uctypes.UINT8),

  i.e. the value is a 2-tuple, the first element of which is the PTR flag
  ORed with the offset, and the second is a scalar element type.

* Pointer to an aggregate type::

    "ptr2": (offset | uctypes.PTR, {"b": 0 | uctypes.UINT8}),

  i.e. the value is a 2-tuple, the first element of which is the PTR flag
  ORed with the offset, and the second is a descriptor of the type pointed to.

* Bitfields::

    "bitf0": offset | uctypes.BFUINT16 | lsbit << uctypes.BF_POS | bitsize << uctypes.BF_LEN,

  i.e. the value is the type of the scalar value containing the given bitfield
  (type names are similar to scalar types, but prefixed with ``BF``), ORed with
  the offset of the scalar value containing the bitfield, and further ORed with
  values for the bit position and bit length of the bitfield within the scalar
  value, shifted by BF_POS and BF_LEN bits, respectively. A bitfield position
  is counted from the least significant bit of the scalar (having position 0),
  and is the number of the right-most bit of a field (in other words, it is
  the number of bits a scalar needs to be shifted right to extract the
  bitfield).

  In the example above, first a UINT16 value will be extracted at the given
  *offset* (this detail may be important when accessing hardware registers,
  where a particular access size and alignment are required), and then the
  bitfield whose rightmost bit is the *lsbit* bit of this UINT16, and whose
  length is *bitsize* bits, will be extracted. For example, if *lsbit* is 0
  and *bitsize* is 8, then effectively it will access the least-significant
  byte of the UINT16.

  Note that bitfield operations are independent of target byte endianness;
  in particular, the example above will access the least-significant byte of
  the UINT16 in both little- and big-endian structures. But it depends on the
  least significant bit being numbered 0. Some targets may use different
  numbering in their native ABI, but ``uctypes`` always uses the normalized
  numbering described above.
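
The shift-and-mask arithmetic described for bitfields can be sketched in
plain Python. The helper names ``bf_get`` and ``bf_set`` are hypothetical and
are not part of ``uctypes``; they only illustrate what *lsbit* and *bitsize*
mean for a scalar that has already been read::

    def bf_get(scalar, lsbit, bitsize):
        # Shift the field down to bit 0, then mask off everything above it.
        return (scalar >> lsbit) & ((1 << bitsize) - 1)

    def bf_set(scalar, lsbit, bitsize, value):
        # Clear the field's bits, then OR in the new value at its position.
        mask = ((1 << bitsize) - 1) << lsbit
        return (scalar & ~mask) | ((value << lsbit) & mask)

    reg = 0xABCD                      # hypothetical 16-bit scalar
    low_byte = bf_get(reg, 0, 8)      # lsbit=0, bitsize=8: least-significant byte
    nibble = bf_get(reg, 4, 4)        # bits 4..7
    updated = bf_set(reg, 8, 4, 0x5)  # replace bits 8..11 with 0x5

Because the helpers work on the integer value of the scalar, the result does
not depend on the byte order used to read that scalar from memory.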

Module contents
---------------

.. class:: struct(addr, descriptor, layout_type=NATIVE, /)

   Instantiate a "foreign data structure" object based on structure address in
   memory, descriptor (encoded as a dictionary), and layout type (see below).

.. data:: LITTLE_ENDIAN

   Layout type for a little-endian packed structure. (Packed means that every
   field occupies exactly as many bytes as defined in the descriptor, i.e.
   the alignment is 1).

.. data:: BIG_ENDIAN

   Layout type for a big-endian packed structure.

.. data:: NATIVE

   Layout type for a native structure - with data endianness and alignment
   conforming to the ABI of the system on which MicroPython runs.

.. function:: sizeof(struct, layout_type=NATIVE, /)

   Return the size of a data structure in bytes. The *struct* argument can be
   either a structure descriptor (as in ``sizeof(ELF_HEADER, ...)`` in the
   usage examples) or a specific instantiated structure object (or its
   aggregate field).

.. function:: addressof(obj)

   Return the address of an object. The argument should be bytes, bytearray or
   another object supporting the buffer protocol (and the address of this
   buffer is what is actually returned).

.. function:: bytes_at(addr, size)

   Capture memory at the given address and size as a bytes object. As a bytes
   object is immutable, memory is actually duplicated and copied into the
   bytes object, so if the memory contents change later, the created object
   retains the original value.

.. function:: bytearray_at(addr, size)

   Capture memory at the given address and size as a bytearray object.
   Unlike the bytes_at() function above, memory is captured by reference,
   so it can be written to as well, and you will access the current value
   at the given memory address.
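
The difference between the copy made by ``bytes_at()`` and the reference
returned by ``bytearray_at()`` can be pictured with ordinary Python objects.
This is an analogy only, assuming a ``bytearray`` named ``backing`` in place
of raw memory; it does not call ``uctypes``::

    backing = bytearray(b"\x01\x02\x03\x04")  # stands in for raw memory

    snapshot = bytes(backing)    # a copy, comparable to bytes_at()
    view = memoryview(backing)   # a reference, comparable to bytearray_at()

    backing[0] = 0xFF            # the underlying memory changes later
    view[1] = 0xEE               # writing through the reference changes it too

    print(snapshot)              # still the original four bytes
    print(bytes(view))           # reflects both modifications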

.. data:: UINT8
          INT8
          UINT16
          INT16
          UINT32
          INT32
          UINT64
          INT64

   Integer types for structure descriptors. Constants for 8, 16, 32,
   and 64 bit types are provided, both signed and unsigned.

.. data:: FLOAT32
          FLOAT64

   Floating-point types for structure descriptors.

.. data:: VOID

   ``VOID`` is an alias for ``UINT8``, and is provided to conveniently define
   C's void pointers: ``(uctypes.PTR, uctypes.VOID)``.

.. data:: PTR
          ARRAY

   Type constants for pointers and arrays. Note that there is no explicit
   constant for structures, it's implicit: an aggregate type without ``PTR``
   or ``ARRAY`` flags is a structure.

Structure descriptors and instantiating structure objects
---------------------------------------------------------

Given a structure descriptor dictionary and its layout type, you can
instantiate a specific structure instance at a given memory address
using the :class:`uctypes.struct()` constructor. The memory address usually
comes from one of the following sources:

* Predefined address, when accessing hardware registers on a baremetal
  system. Look up these addresses in the datasheet for a particular MCU/SoC.
* As a return value from a call to some FFI (Foreign Function Interface)
  function.
* From `uctypes.addressof()`, when you want to pass arguments to an FFI
  function, or alternatively, to access some data for I/O (for example,
  data read from a file or network socket).

Structure objects
-----------------

Structure objects allow accessing individual fields using standard dot
notation: ``my_struct.substruct1.field1``. If a field is of scalar type,
getting it will produce a primitive value (Python integer or float)
corresponding to the value contained in a field. A scalar field can also
be assigned to.

If a field is an array, its individual elements can be accessed with
the standard subscript operator ``[]`` - both read and assigned to.

If a field is a pointer, it can be dereferenced using ``[0]`` syntax
(corresponding to the C ``*`` operator, though ``[0]`` works in C too).
Subscripting a pointer with integer values other than 0 is also supported,
with the same semantics as in C.

Summing up, accessing structure fields generally follows the C syntax,
except for pointer dereference, where you need to use the ``[0]`` operator
instead of ``*``.
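
The C-like meaning of subscripting a pointer can be illustrated with plain
offset arithmetic. The sketch below does not use ``uctypes``; it assumes a
hypothetical buffer of three records shaped like ``COORD`` from the usage
examples (two 32-bit floats each) and a hypothetical helper ``coord_at``, and
shows that element ``i`` starts ``i * 8`` bytes past the base::

    import struct

    COORD_FMT = "<ff"                        # x, y as little-endian float32
    COORD_SIZE = struct.calcsize(COORD_FMT)  # 8 bytes per element

    # Hypothetical backing memory: three COORD records laid out back to back.
    buf = bytearray(3 * COORD_SIZE)
    for i, (x, y) in enumerate([(0.0, 1.0), (2.0, 3.0), (4.0, 5.0)]):
        struct.pack_into(COORD_FMT, buf, i * COORD_SIZE, x, y)

    def coord_at(base_offset, index):
        # Same rule as C's ptr[index]: base + index * sizeof(element).
        return struct.unpack_from(COORD_FMT, buf, base_offset + index * COORD_SIZE)

    first = coord_at(0, 0)   # what ptr[0] would refer to
    third = coord_at(0, 2)   # what ptr[2] would refer to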

Limitations
-----------

1. Accessing non-scalar fields leads to allocation of intermediate objects
to represent them. This means that special care should be taken to
lay out a structure which needs to be accessed when memory allocation
is disabled (e.g. from an interrupt). The recommendations are:

* Avoid accessing nested structures. For example, instead of
  ``mcu_registers.peripheral_a.register1``, define separate layout
  descriptors for each peripheral, to be accessed as
  ``peripheral_a.register1``. Or just cache a particular peripheral:
  ``peripheral_a = mcu_registers.peripheral_a``. If a register
  consists of multiple bitfields, you would need to cache references
  to a particular register: ``reg_a = mcu_registers.peripheral_a.reg_a``.
* Avoid other non-scalar data, like arrays. For example, instead of
  ``peripheral_a.register[0]`` use ``peripheral_a.register0``. Again,
  an alternative is to cache intermediate values, e.g.
  ``register0 = peripheral_a.register[0]``.

2. The range of offsets supported by the ``uctypes`` module is limited.
The exact range supported is considered an implementation detail,
and the general suggestion is to split structure definitions so that each
covers from a few kilobytes to a few dozen kilobytes at most.
In most cases, this is a natural situation anyway, e.g. it doesn't make
sense to define all registers of an MCU (spread over a 32-bit address
space) in one structure, but rather peripheral block by peripheral
block. In some extreme cases, you may need to split a structure into
several parts artificially (e.g. if accessing a native data structure
with a multi-megabyte array in the middle, though that would be a very
synthetic case).