1.. highlight:: c
2
3.. _unicodeobjects:
4
5Unicode Objects and Codecs
6--------------------------
7
8.. sectionauthor:: Marc-André Lemburg <mal@lemburg.com>
9.. sectionauthor:: Georg Brandl <georg@python.org>
10
11Unicode Objects
12^^^^^^^^^^^^^^^
13
14Since the implementation of :pep:`393` in Python 3.3, Unicode objects internally
15use a variety of representations, in order to allow handling the complete range
16of Unicode characters while staying memory efficient.  There are special cases
17for strings where all code points are below 128, 256, or 65536; otherwise, code
18points must be below 1114112 (which is the full Unicode range).
19
20:c:type:`Py_UNICODE*` and UTF-8 representations are created on demand and cached
21in the Unicode object.  The :c:type:`Py_UNICODE*` representation is deprecated
22and inefficient.
23
24Due to the transition between the old APIs and the new APIs, Unicode objects
25can internally be in two states depending on how they were created:
26
27* "canonical" Unicode objects are all objects created by a non-deprecated
28  Unicode API.  They use the most efficient representation allowed by the
29  implementation.
30
31* "legacy" Unicode objects have been created through one of the deprecated
32  APIs (typically :c:func:`PyUnicode_FromUnicode`) and only bear the
33  :c:type:`Py_UNICODE*` representation; you will have to call
34  :c:func:`PyUnicode_READY` on them before calling any other API.
35
36.. note::
37   The "legacy" Unicode object will be removed in Python 3.12 with deprecated
38   APIs. All Unicode objects will be "canonical" since then. See :pep:`623`
39   for more information.
40
41
42Unicode Type
43""""""""""""
44
45These are the basic Unicode object types used for the Unicode implementation in
46Python:
47
48.. c:type:: Py_UCS4
49            Py_UCS2
50            Py_UCS1
51
52   These types are typedefs for unsigned integer types wide enough to contain
53   characters of 32 bits, 16 bits and 8 bits, respectively.  When dealing with
54   single Unicode characters, use :c:type:`Py_UCS4`.
55
56   .. versionadded:: 3.3
57
58
59.. c:type:: Py_UNICODE
60
61   This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type
62   depending on the platform.
63
64   .. versionchanged:: 3.3
65      In previous versions, this was a 16-bit type or a 32-bit type depending on
66      whether you selected a "narrow" or "wide" Unicode version of Python at
67      build time.
68
69
70.. c:type:: PyASCIIObject
71            PyCompactUnicodeObject
72            PyUnicodeObject
73
74   These subtypes of :c:type:`PyObject` represent a Python Unicode object.  In
75   almost all cases, they shouldn't be used directly, since all API functions
76   that deal with Unicode objects take and return :c:type:`PyObject` pointers.
77
78   .. versionadded:: 3.3
79
80
81.. c:var:: PyTypeObject PyUnicode_Type
82
83   This instance of :c:type:`PyTypeObject` represents the Python Unicode type.  It
84   is exposed to Python code as ``str``.
85
86
87The following APIs are really C macros and can be used to do fast checks and to
88access internal read-only data of Unicode objects:
89
90.. c:function:: int PyUnicode_Check(PyObject *o)
91
92   Return true if the object *o* is a Unicode object or an instance of a Unicode
93   subtype.
94
95
96.. c:function:: int PyUnicode_CheckExact(PyObject *o)
97
98   Return true if the object *o* is a Unicode object, but not an instance of a
99   subtype.
100
101
102.. c:function:: int PyUnicode_READY(PyObject *o)
103
104   Ensure the string object *o* is in the "canonical" representation.  This is
105   required before using any of the access macros described below.
106
107   .. XXX expand on when it is not required
108
109   Returns ``0`` on success and ``-1`` with an exception set on failure, which in
110   particular happens if memory allocation fails.
111
112   .. versionadded:: 3.3
113
114   .. deprecated-removed:: 3.10 3.12
115      This API will be removed with :c:func:`PyUnicode_FromUnicode`.
116
117
118.. c:function:: Py_ssize_t PyUnicode_GET_LENGTH(PyObject *o)
119
120   Return the length of the Unicode string, in code points.  *o* has to be a
121   Unicode object in the "canonical" representation (not checked).
122
123   .. versionadded:: 3.3
124
125
126.. c:function:: Py_UCS1* PyUnicode_1BYTE_DATA(PyObject *o)
127                Py_UCS2* PyUnicode_2BYTE_DATA(PyObject *o)
128                Py_UCS4* PyUnicode_4BYTE_DATA(PyObject *o)
129
130   Return a pointer to the canonical representation cast to UCS1, UCS2 or UCS4
131   integer types for direct character access.  No checks are performed if the
132   canonical representation has the correct character size; use
133   :c:func:`PyUnicode_KIND` to select the right macro.  Make sure
134   :c:func:`PyUnicode_READY` has been called before accessing this.
135
136   .. versionadded:: 3.3
137
138
139.. c:macro:: PyUnicode_WCHAR_KIND
140             PyUnicode_1BYTE_KIND
141             PyUnicode_2BYTE_KIND
142             PyUnicode_4BYTE_KIND
143
144   Return values of the :c:func:`PyUnicode_KIND` macro.
145
146   .. versionadded:: 3.3
147
148   .. deprecated-removed:: 3.10 3.12
149      ``PyUnicode_WCHAR_KIND`` is deprecated.
150
151
152.. c:function:: int PyUnicode_KIND(PyObject *o)
153
154   Return one of the PyUnicode kind constants (see above) that indicate how many
155   bytes per character this Unicode object uses to store its data.  *o* has to
156   be a Unicode object in the "canonical" representation (not checked).
157
158   .. XXX document "0" return value?
159
160   .. versionadded:: 3.3
161
162
163.. c:function:: void* PyUnicode_DATA(PyObject *o)
164
165   Return a void pointer to the raw Unicode buffer.  *o* has to be a Unicode
166   object in the "canonical" representation (not checked).
167
168   .. versionadded:: 3.3
169
170
171.. c:function:: void PyUnicode_WRITE(int kind, void *data, Py_ssize_t index, \
172                                     Py_UCS4 value)
173
174   Write into a canonical representation *data* (as obtained with
175   :c:func:`PyUnicode_DATA`).  This macro does not do any sanity checks and is
176   intended for usage in loops.  The caller should cache the *kind* value and
177   *data* pointer as obtained from other macro calls.  *index* is the index in
178   the string (starts at 0) and *value* is the new code point value which should
179   be written to that location.
180
181   .. versionadded:: 3.3
182
183
184.. c:function:: Py_UCS4 PyUnicode_READ(int kind, void *data, Py_ssize_t index)
185
186   Read a code point from a canonical representation *data* (as obtained with
187   :c:func:`PyUnicode_DATA`).  No checks or ready calls are performed.
188
189   .. versionadded:: 3.3
190
191
192.. c:function:: Py_UCS4 PyUnicode_READ_CHAR(PyObject *o, Py_ssize_t index)
193
194   Read a character from a Unicode object *o*, which must be in the "canonical"
195   representation.  This is less efficient than :c:func:`PyUnicode_READ` if you
196   do multiple consecutive reads.
197
198   .. versionadded:: 3.3
199
200
201.. c:macro:: PyUnicode_MAX_CHAR_VALUE(o)
202
203   Return the maximum code point that is suitable for creating another string
204   based on *o*, which must be in the "canonical" representation.  This is
205   always an approximation but more efficient than iterating over the string.
206
207   .. versionadded:: 3.3
208
209
210.. c:function:: int PyUnicode_ClearFreeList()
211
212   Clear the free list. Return the total number of freed items.
213
214
215.. c:function:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)
216
217   Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
218   code units (this includes surrogate pairs as 2 units).  *o* has to be a
219   Unicode object (not checked).
220
221   .. deprecated-removed:: 3.3 3.12
222      Part of the old-style Unicode API, please migrate to using
223      :c:func:`PyUnicode_GET_LENGTH`.
224
225
226.. c:function:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)
227
228   Return the size of the deprecated :c:type:`Py_UNICODE` representation in
229   bytes.  *o* has to be a Unicode object (not checked).
230
231   .. deprecated-removed:: 3.3 3.12
232      Part of the old-style Unicode API, please migrate to using
233      :c:func:`PyUnicode_GET_LENGTH`.
234
235
236.. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
237                const char* PyUnicode_AS_DATA(PyObject *o)
238
239   Return a pointer to a :c:type:`Py_UNICODE` representation of the object.  The
240   returned buffer is always terminated with an extra null code point.  It
241   may also contain embedded null code points, which would cause the string
242   to be truncated when used in most C functions.  The ``AS_DATA`` form
243   casts the pointer to :c:type:`const char *`.  The *o* argument has to be
244   a Unicode object (not checked).
245
246   .. versionchanged:: 3.3
247      This macro is now inefficient -- because in many cases the
248      :c:type:`Py_UNICODE` representation does not exist and needs to be created
249      -- and can fail (return ``NULL`` with an exception set).  Try to port the
250      code to use the new :c:func:`PyUnicode_nBYTE_DATA` macros or use
251      :c:func:`PyUnicode_WRITE` or :c:func:`PyUnicode_READ`.
252
253   .. deprecated-removed:: 3.3 3.12
254      Part of the old-style Unicode API, please migrate to using the
255      :c:func:`PyUnicode_nBYTE_DATA` family of macros.
256
257
258Unicode Character Properties
259""""""""""""""""""""""""""""
260
261Unicode provides many different character properties. The most often needed ones
262are available through these macros which are mapped to C functions depending on
263the Python configuration.
264
265
266.. c:function:: int Py_UNICODE_ISSPACE(Py_UNICODE ch)
267
268   Return ``1`` or ``0`` depending on whether *ch* is a whitespace character.
269
270
271.. c:function:: int Py_UNICODE_ISLOWER(Py_UNICODE ch)
272
273   Return ``1`` or ``0`` depending on whether *ch* is a lowercase character.
274
275
276.. c:function:: int Py_UNICODE_ISUPPER(Py_UNICODE ch)
277
278   Return ``1`` or ``0`` depending on whether *ch* is an uppercase character.
279
280
281.. c:function:: int Py_UNICODE_ISTITLE(Py_UNICODE ch)
282
283   Return ``1`` or ``0`` depending on whether *ch* is a titlecase character.
284
285
286.. c:function:: int Py_UNICODE_ISLINEBREAK(Py_UNICODE ch)
287
288   Return ``1`` or ``0`` depending on whether *ch* is a linebreak character.
289
290
291.. c:function:: int Py_UNICODE_ISDECIMAL(Py_UNICODE ch)
292
293   Return ``1`` or ``0`` depending on whether *ch* is a decimal character.
294
295
296.. c:function:: int Py_UNICODE_ISDIGIT(Py_UNICODE ch)
297
298   Return ``1`` or ``0`` depending on whether *ch* is a digit character.
299
300
301.. c:function:: int Py_UNICODE_ISNUMERIC(Py_UNICODE ch)
302
303   Return ``1`` or ``0`` depending on whether *ch* is a numeric character.
304
305
306.. c:function:: int Py_UNICODE_ISALPHA(Py_UNICODE ch)
307
308   Return ``1`` or ``0`` depending on whether *ch* is an alphabetic character.
309
310
311.. c:function:: int Py_UNICODE_ISALNUM(Py_UNICODE ch)
312
313   Return ``1`` or ``0`` depending on whether *ch* is an alphanumeric character.
314
315
316.. c:function:: int Py_UNICODE_ISPRINTABLE(Py_UNICODE ch)
317
318   Return ``1`` or ``0`` depending on whether *ch* is a printable character.
319   Nonprintable characters are those characters defined in the Unicode character
320   database as "Other" or "Separator", excepting the ASCII space (0x20) which is
321   considered printable.  (Note that printable characters in this context are
322   those which should not be escaped when :func:`repr` is invoked on a string.
323   It has no bearing on the handling of strings written to :data:`sys.stdout` or
324   :data:`sys.stderr`.)
325
326
327These APIs can be used for fast direct character conversions:
328
329
330.. c:function:: Py_UNICODE Py_UNICODE_TOLOWER(Py_UNICODE ch)
331
332   Return the character *ch* converted to lower case.
333
334   .. deprecated:: 3.3
335      This function uses simple case mappings.
336
337
338.. c:function:: Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch)
339
340   Return the character *ch* converted to upper case.
341
342   .. deprecated:: 3.3
343      This function uses simple case mappings.
344
345
346.. c:function:: Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch)
347
348   Return the character *ch* converted to title case.
349
350   .. deprecated:: 3.3
351      This function uses simple case mappings.
352
353
354.. c:function:: int Py_UNICODE_TODECIMAL(Py_UNICODE ch)
355
356   Return the character *ch* converted to a decimal positive integer.  Return
357   ``-1`` if this is not possible.  This macro does not raise exceptions.
358
359
360.. c:function:: int Py_UNICODE_TODIGIT(Py_UNICODE ch)
361
362   Return the character *ch* converted to a single digit integer. Return ``-1`` if
363   this is not possible.  This macro does not raise exceptions.
364
365
366.. c:function:: double Py_UNICODE_TONUMERIC(Py_UNICODE ch)
367
368   Return the character *ch* converted to a double. Return ``-1.0`` if this is not
369   possible.  This macro does not raise exceptions.
370
371
372These APIs can be used to work with surrogates:
373
374.. c:macro:: Py_UNICODE_IS_SURROGATE(ch)
375
376   Check if *ch* is a surrogate (``0xD800 <= ch <= 0xDFFF``).
377
378.. c:macro:: Py_UNICODE_IS_HIGH_SURROGATE(ch)
379
380   Check if *ch* is a high surrogate (``0xD800 <= ch <= 0xDBFF``).
381
382.. c:macro:: Py_UNICODE_IS_LOW_SURROGATE(ch)
383
384   Check if *ch* is a low surrogate (``0xDC00 <= ch <= 0xDFFF``).
385
386.. c:macro:: Py_UNICODE_JOIN_SURROGATES(high, low)
387
388   Join two surrogate characters and return a single Py_UCS4 value.
389   *high* and *low* are respectively the leading and trailing surrogates in a
390   surrogate pair.
391
392
393Creating and accessing Unicode strings
394""""""""""""""""""""""""""""""""""""""
395
396To create Unicode objects and access their basic sequence properties, use these
397APIs:
398
399.. c:function:: PyObject* PyUnicode_New(Py_ssize_t size, Py_UCS4 maxchar)
400
401   Create a new Unicode object.  *maxchar* should be the true maximum code point
402   to be placed in the string.  As an approximation, it can be rounded up to the
403   nearest value in the sequence 127, 255, 65535, 1114111.
404
405   This is the recommended way to allocate a new Unicode object.  Objects
406   created using this function are not resizable.
407
408   .. versionadded:: 3.3
409
410
411.. c:function:: PyObject* PyUnicode_FromKindAndData(int kind, const void *buffer, \
412                                                    Py_ssize_t size)
413
414   Create a new Unicode object with the given *kind* (possible values are
415   :c:macro:`PyUnicode_1BYTE_KIND` etc., as returned by
416   :c:func:`PyUnicode_KIND`).  The *buffer* must point to an array of *size*
417   units of 1, 2 or 4 bytes per character, as given by the kind.
418
419   .. versionadded:: 3.3
420
421
422.. c:function:: PyObject* PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size)
423
424   Create a Unicode object from the char buffer *u*.  The bytes will be
425   interpreted as being UTF-8 encoded.  The buffer is copied into the new
426   object. If the buffer is not ``NULL``, the return value might be a shared
427   object, i.e. modification of the data is not allowed.
428
429   If *u* is ``NULL``, this function behaves like :c:func:`PyUnicode_FromUnicode`
430   with the buffer set to ``NULL``.  This usage is deprecated in favor of
431   :c:func:`PyUnicode_New`, and will be removed in Python 3.12.
432
433
434.. c:function:: PyObject *PyUnicode_FromString(const char *u)
435
436   Create a Unicode object from a UTF-8 encoded null-terminated char buffer
437   *u*.
438
439
440.. c:function:: PyObject* PyUnicode_FromFormat(const char *format, ...)
441
442   Take a C :c:func:`printf`\ -style *format* string and a variable number of
443   arguments, calculate the size of the resulting Python Unicode string and return
444   a string with the values formatted into it.  The variable arguments must be C
445   types and must correspond exactly to the format characters in the *format*
446   ASCII-encoded string. The following format characters are allowed:
447
448   .. % This should be exactly the same as the table in PyErr_Format.
449   .. % The descriptions for %zd and %zu are wrong, but the truth is complicated
450   .. % because not all compilers support the %z width modifier -- we fake it
451   .. % when necessary via interpolating PY_FORMAT_SIZE_T.
452   .. % Similar comments apply to the %ll width modifier and
453
454   .. tabularcolumns:: |l|l|L|
455
456   +-------------------+---------------------+----------------------------------+
457   | Format Characters | Type                | Comment                          |
458   +===================+=====================+==================================+
459   | :attr:`%%`        | *n/a*               | The literal % character.         |
460   +-------------------+---------------------+----------------------------------+
461   | :attr:`%c`        | int                 | A single character,              |
462   |                   |                     | represented as a C int.          |
463   +-------------------+---------------------+----------------------------------+
464   | :attr:`%d`        | int                 | Equivalent to                    |
465   |                   |                     | ``printf("%d")``. [1]_           |
466   +-------------------+---------------------+----------------------------------+
467   | :attr:`%u`        | unsigned int        | Equivalent to                    |
468   |                   |                     | ``printf("%u")``. [1]_           |
469   +-------------------+---------------------+----------------------------------+
470   | :attr:`%ld`       | long                | Equivalent to                    |
471   |                   |                     | ``printf("%ld")``. [1]_          |
472   +-------------------+---------------------+----------------------------------+
473   | :attr:`%li`       | long                | Equivalent to                    |
474   |                   |                     | ``printf("%li")``. [1]_          |
475   +-------------------+---------------------+----------------------------------+
476   | :attr:`%lu`       | unsigned long       | Equivalent to                    |
477   |                   |                     | ``printf("%lu")``. [1]_          |
478   +-------------------+---------------------+----------------------------------+
479   | :attr:`%lld`      | long long           | Equivalent to                    |
480   |                   |                     | ``printf("%lld")``. [1]_         |
481   +-------------------+---------------------+----------------------------------+
482   | :attr:`%lli`      | long long           | Equivalent to                    |
483   |                   |                     | ``printf("%lli")``. [1]_         |
484   +-------------------+---------------------+----------------------------------+
485   | :attr:`%llu`      | unsigned long long  | Equivalent to                    |
486   |                   |                     | ``printf("%llu")``. [1]_         |
487   +-------------------+---------------------+----------------------------------+
488   | :attr:`%zd`       | Py_ssize_t          | Equivalent to                    |
489   |                   |                     | ``printf("%zd")``. [1]_          |
490   +-------------------+---------------------+----------------------------------+
491   | :attr:`%zi`       | Py_ssize_t          | Equivalent to                    |
492   |                   |                     | ``printf("%zi")``. [1]_          |
493   +-------------------+---------------------+----------------------------------+
494   | :attr:`%zu`       | size_t              | Equivalent to                    |
495   |                   |                     | ``printf("%zu")``. [1]_          |
496   +-------------------+---------------------+----------------------------------+
497   | :attr:`%i`        | int                 | Equivalent to                    |
498   |                   |                     | ``printf("%i")``. [1]_           |
499   +-------------------+---------------------+----------------------------------+
500   | :attr:`%x`        | int                 | Equivalent to                    |
501   |                   |                     | ``printf("%x")``. [1]_           |
502   +-------------------+---------------------+----------------------------------+
503   | :attr:`%s`        | const char\*        | A null-terminated C character    |
504   |                   |                     | array.                           |
505   +-------------------+---------------------+----------------------------------+
506   | :attr:`%p`        | const void\*        | The hex representation of a C    |
507   |                   |                     | pointer. Mostly equivalent to    |
508   |                   |                     | ``printf("%p")`` except that     |
509   |                   |                     | it is guaranteed to start with   |
510   |                   |                     | the literal ``0x`` regardless    |
511   |                   |                     | of what the platform's           |
512   |                   |                     | ``printf`` yields.               |
513   +-------------------+---------------------+----------------------------------+
514   | :attr:`%A`        | PyObject\*          | The result of calling            |
515   |                   |                     | :func:`ascii`.                   |
516   +-------------------+---------------------+----------------------------------+
517   | :attr:`%U`        | PyObject\*          | A Unicode object.                |
518   +-------------------+---------------------+----------------------------------+
519   | :attr:`%V`        | PyObject\*,         | A Unicode object (which may be   |
520   |                   | const char\*        | ``NULL``) and a null-terminated  |
521   |                   |                     | C character array as a second    |
522   |                   |                     | parameter (which will be used,   |
523   |                   |                     | if the first parameter is        |
524   |                   |                     | ``NULL``).                       |
525   +-------------------+---------------------+----------------------------------+
526   | :attr:`%S`        | PyObject\*          | The result of calling            |
527   |                   |                     | :c:func:`PyObject_Str`.          |
528   +-------------------+---------------------+----------------------------------+
529   | :attr:`%R`        | PyObject\*          | The result of calling            |
530   |                   |                     | :c:func:`PyObject_Repr`.         |
531   +-------------------+---------------------+----------------------------------+
532
533   An unrecognized format character causes all the rest of the format string to be
534   copied as-is to the result string, and any extra arguments discarded.
535
536   .. note::
537      The width formatter unit is number of characters rather than bytes.
538      The precision formatter unit is number of bytes for ``"%s"`` and
539      ``"%V"`` (if the ``PyObject*`` argument is ``NULL``), and a number of
540      characters for ``"%A"``, ``"%U"``, ``"%S"``, ``"%R"`` and ``"%V"``
541      (if the ``PyObject*`` argument is not ``NULL``).
542
543   .. [1] For integer specifiers (d, u, ld, li, lu, lld, lli, llu, zd, zi,
544      zu, i, x): the 0-conversion flag has effect even when a precision is given.
545
546   .. versionchanged:: 3.2
547      Support for ``"%lld"`` and ``"%llu"`` added.
548
549   .. versionchanged:: 3.3
550      Support for ``"%li"``, ``"%lli"`` and ``"%zi"`` added.
551
552   .. versionchanged:: 3.4
553      Support width and precision formatter for ``"%s"``, ``"%A"``, ``"%U"``,
554      ``"%V"``, ``"%S"``, ``"%R"`` added.
555
556
557.. c:function:: PyObject* PyUnicode_FromFormatV(const char *format, va_list vargs)
558
559   Identical to :c:func:`PyUnicode_FromFormat` except that it takes exactly two
560   arguments.
561
562
563.. c:function:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, \
564                               const char *encoding, const char *errors)
565
566   Decode an encoded object *obj* to a Unicode object.
567
568   :class:`bytes`, :class:`bytearray` and other
569   :term:`bytes-like objects <bytes-like object>`
570   are decoded according to the given *encoding* and using the error handling
571   defined by *errors*. Both can be ``NULL`` to have the interface use the default
572   values (see :ref:`builtincodecs` for details).
573
574   All other objects, including Unicode objects, cause a :exc:`TypeError` to be
575   set.
576
577   The API returns ``NULL`` if there was an error.  The caller is responsible for
578   decref'ing the returned objects.
579
580
581.. c:function:: Py_ssize_t PyUnicode_GetLength(PyObject *unicode)
582
583   Return the length of the Unicode object, in code points.
584
585   .. versionadded:: 3.3
586
587
588.. c:function:: Py_ssize_t PyUnicode_CopyCharacters(PyObject *to, \
589                                                    Py_ssize_t to_start, \
590                                                    PyObject *from, \
591                                                    Py_ssize_t from_start, \
592                                                    Py_ssize_t how_many)
593
594   Copy characters from one Unicode object into another.  This function performs
595   character conversion when necessary and falls back to :c:func:`memcpy` if
596   possible.  Returns ``-1`` and sets an exception on error, otherwise returns
597   the number of copied characters.
598
599   .. versionadded:: 3.3
600
601
602.. c:function:: Py_ssize_t PyUnicode_Fill(PyObject *unicode, Py_ssize_t start, \
603                        Py_ssize_t length, Py_UCS4 fill_char)
604
605   Fill a string with a character: write *fill_char* into
606   ``unicode[start:start+length]``.
607
608   Fail if *fill_char* is bigger than the string maximum character, or if the
609   string has more than 1 reference.
610
611   Return the number of written character, or return ``-1`` and raise an
612   exception on error.
613
614   .. versionadded:: 3.3
615
616
617.. c:function:: int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, \
618                                        Py_UCS4 character)
619
620   Write a character to a string.  The string must have been created through
621   :c:func:`PyUnicode_New`.  Since Unicode strings are supposed to be immutable,
622   the string must not be shared, or have been hashed yet.
623
624   This function checks that *unicode* is a Unicode object, that the index is
625   not out of bounds, and that the object can be modified safely (i.e. that it
626   its reference count is one).
627
628   .. versionadded:: 3.3
629
630
631.. c:function:: Py_UCS4 PyUnicode_ReadChar(PyObject *unicode, Py_ssize_t index)
632
633   Read a character from a string.  This function checks that *unicode* is a
634   Unicode object and the index is not out of bounds, in contrast to the macro
635   version :c:func:`PyUnicode_READ_CHAR`.
636
637   .. versionadded:: 3.3
638
639
640.. c:function:: PyObject* PyUnicode_Substring(PyObject *str, Py_ssize_t start, \
641                                              Py_ssize_t end)
642
643   Return a substring of *str*, from character index *start* (included) to
644   character index *end* (excluded).  Negative indices are not supported.
645
646   .. versionadded:: 3.3
647
648
649.. c:function:: Py_UCS4* PyUnicode_AsUCS4(PyObject *u, Py_UCS4 *buffer, \
650                                          Py_ssize_t buflen, int copy_null)
651
652   Copy the string *u* into a UCS4 buffer, including a null character, if
653   *copy_null* is set.  Returns ``NULL`` and sets an exception on error (in
654   particular, a :exc:`SystemError` if *buflen* is smaller than the length of
655   *u*).  *buffer* is returned on success.
656
657   .. versionadded:: 3.3
658
659
660.. c:function:: Py_UCS4* PyUnicode_AsUCS4Copy(PyObject *u)
661
662   Copy the string *u* into a new UCS4 buffer that is allocated using
663   :c:func:`PyMem_Malloc`.  If this fails, ``NULL`` is returned with a
664   :exc:`MemoryError` set.  The returned buffer always has an extra
665   null code point appended.
666
667   .. versionadded:: 3.3
668
669
670Deprecated Py_UNICODE APIs
671""""""""""""""""""""""""""
672
673.. deprecated-removed:: 3.3 3.12
674
675These API functions are deprecated with the implementation of :pep:`393`.
676Extension modules can continue using them, as they will not be removed in Python
6773.x, but need to be aware that their use can now cause performance and memory hits.
678
679
680.. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
681
682   Create a Unicode object from the Py_UNICODE buffer *u* of the given size. *u*
683   may be ``NULL`` which causes the contents to be undefined. It is the user's
684   responsibility to fill in the needed data.  The buffer is copied into the new
685   object.
686
687   If the buffer is not ``NULL``, the return value might be a shared object.
688   Therefore, modification of the resulting Unicode object is only allowed when
689   *u* is ``NULL``.
690
691   If the buffer is ``NULL``, :c:func:`PyUnicode_READY` must be called once the
692   string content has been filled before using any of the access macros such as
693   :c:func:`PyUnicode_KIND`.
694
695   .. deprecated-removed:: 3.3 3.12
696      Part of the old-style Unicode API, please migrate to using
697      :c:func:`PyUnicode_FromKindAndData`, :c:func:`PyUnicode_FromWideChar`, or
698      :c:func:`PyUnicode_New`.
699
700
701.. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)
702
703   Return a read-only pointer to the Unicode object's internal
704   :c:type:`Py_UNICODE` buffer, or ``NULL`` on error. This will create the
705   :c:type:`Py_UNICODE*` representation of the object if it is not yet
706   available. The buffer is always terminated with an extra null code point.
707   Note that the resulting :c:type:`Py_UNICODE` string may also contain
708   embedded null code points, which would cause the string to be truncated when
709   used in most C functions.
710
711   .. deprecated-removed:: 3.3 3.12
712      Part of the old-style Unicode API, please migrate to using
713      :c:func:`PyUnicode_AsUCS4`, :c:func:`PyUnicode_AsWideChar`,
714      :c:func:`PyUnicode_ReadChar` or similar new APIs.
715
716   .. deprecated-removed:: 3.3 3.10
717
718
719.. c:function:: PyObject* PyUnicode_TransformDecimalToASCII(Py_UNICODE *s, Py_ssize_t size)
720
721   Create a Unicode object by replacing all decimal digits in
722   :c:type:`Py_UNICODE` buffer of the given *size* by ASCII digits 0--9
723   according to their decimal value.  Return ``NULL`` if an exception occurs.
724
725   .. deprecated-removed:: 3.3 3.11
726      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
727      :c:func:`Py_UNICODE_TODECIMAL`.
728
729
730.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)
731
732   Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:func:`Py_UNICODE`
733   array length (excluding the extra null terminator) in *size*.
734   Note that the resulting :c:type:`Py_UNICODE*` string
735   may contain embedded null code points, which would cause the string to be
736   truncated when used in most C functions.
737
738   .. versionadded:: 3.3
739
740   .. deprecated-removed:: 3.3 3.12
741      Part of the old-style Unicode API, please migrate to using
742      :c:func:`PyUnicode_AsUCS4`, :c:func:`PyUnicode_AsWideChar`,
743      :c:func:`PyUnicode_ReadChar` or similar new APIs.
744
745
746.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeCopy(PyObject *unicode)
747
748   Create a copy of a Unicode string ending with a null code point. Return ``NULL``
749   and raise a :exc:`MemoryError` exception on memory allocation failure,
750   otherwise return a new allocated buffer (use :c:func:`PyMem_Free` to free
751   the buffer). Note that the resulting :c:type:`Py_UNICODE*` string may
752   contain embedded null code points, which would cause the string to be
753   truncated when used in most C functions.
754
755   .. versionadded:: 3.2
756
757   Please migrate to using :c:func:`PyUnicode_AsUCS4Copy` or similar new APIs.
758
759
760.. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)
761
762   Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
763   code units (this includes surrogate pairs as 2 units).
764
765   .. deprecated-removed:: 3.3 3.12
766      Part of the old-style Unicode API, please migrate to using
767      :c:func:`PyUnicode_GET_LENGTH`.
768
769
770.. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj)
771
772   Copy an instance of a Unicode subtype to a new true Unicode object if
773   necessary. If *obj* is already a true Unicode object (not a subtype),
774   return the reference with incremented refcount.
775
776   Objects other than Unicode or its subtypes will cause a :exc:`TypeError`.
777
778
779Locale Encoding
780"""""""""""""""
781
782The current locale encoding can be used to decode text from the operating
783system.
784
785.. c:function:: PyObject* PyUnicode_DecodeLocaleAndSize(const char *str, \
786                                                        Py_ssize_t len, \
787                                                        const char *errors)
788
789   Decode a string from UTF-8 on Android and VxWorks, or from the current
790   locale encoding on other platforms. The supported
791   error handlers are ``"strict"`` and ``"surrogateescape"``
792   (:pep:`383`). The decoder uses ``"strict"`` error handler if
793   *errors* is ``NULL``.  *str* must end with a null character but
794   cannot contain embedded null characters.
795
796   Use :c:func:`PyUnicode_DecodeFSDefaultAndSize` to decode a string from
797   :c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at
798   Python startup).
799
800   This function ignores the Python UTF-8 mode.
801
802   .. seealso::
803
804      The :c:func:`Py_DecodeLocale` function.
805
806   .. versionadded:: 3.3
807
808   .. versionchanged:: 3.7
809      The function now also uses the current locale encoding for the
810      ``surrogateescape`` error handler, except on Android. Previously, :c:func:`Py_DecodeLocale`
811      was used for the ``surrogateescape``, and the current locale encoding was
812      used for ``strict``.
813
814
815.. c:function:: PyObject* PyUnicode_DecodeLocale(const char *str, const char *errors)
816
817   Similar to :c:func:`PyUnicode_DecodeLocaleAndSize`, but compute the string
818   length using :c:func:`strlen`.
819
820   .. versionadded:: 3.3
821
822
823.. c:function:: PyObject* PyUnicode_EncodeLocale(PyObject *unicode, const char *errors)
824
825   Encode a Unicode object to UTF-8 on Android and VxWorks, or to the current
826   locale encoding on other platforms. The
827   supported error handlers are ``"strict"`` and ``"surrogateescape"``
828   (:pep:`383`). The encoder uses ``"strict"`` error handler if
829   *errors* is ``NULL``. Return a :class:`bytes` object. *unicode* cannot
830   contain embedded null characters.
831
832   Use :c:func:`PyUnicode_EncodeFSDefault` to encode a string to
833   :c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at
834   Python startup).
835
836   This function ignores the Python UTF-8 mode.
837
838   .. seealso::
839
840      The :c:func:`Py_EncodeLocale` function.
841
842   .. versionadded:: 3.3
843
844   .. versionchanged:: 3.7
845      The function now also uses the current locale encoding for the
846      ``surrogateescape`` error handler, except on Android. Previously,
847      :c:func:`Py_EncodeLocale`
848      was used for the ``surrogateescape``, and the current locale encoding was
849      used for ``strict``.
850
851
852File System Encoding
853""""""""""""""""""""
854
855To encode and decode file names and other environment strings,
856:c:data:`Py_FileSystemDefaultEncoding` should be used as the encoding, and
857:c:data:`Py_FileSystemDefaultEncodeErrors` should be used as the error handler
858(:pep:`383` and :pep:`529`). To encode file names to :class:`bytes` during
859argument parsing, the ``"O&"`` converter should be used, passing
860:c:func:`PyUnicode_FSConverter` as the conversion function:
861
862.. c:function:: int PyUnicode_FSConverter(PyObject* obj, void* result)
863
864   ParseTuple converter: encode :class:`str` objects -- obtained directly or
865   through the :class:`os.PathLike` interface -- to :class:`bytes` using
866   :c:func:`PyUnicode_EncodeFSDefault`; :class:`bytes` objects are output as-is.
867   *result* must be a :c:type:`PyBytesObject*` which must be released when it is
868   no longer used.
869
870   .. versionadded:: 3.1
871
872   .. versionchanged:: 3.6
873      Accepts a :term:`path-like object`.
874
875To decode file names to :class:`str` during argument parsing, the ``"O&"``
876converter should be used, passing :c:func:`PyUnicode_FSDecoder` as the
877conversion function:
878
879.. c:function:: int PyUnicode_FSDecoder(PyObject* obj, void* result)
880
881   ParseTuple converter: decode :class:`bytes` objects -- obtained either
882   directly or indirectly through the :class:`os.PathLike` interface -- to
883   :class:`str` using :c:func:`PyUnicode_DecodeFSDefaultAndSize`; :class:`str`
884   objects are output as-is. *result* must be a :c:type:`PyUnicodeObject*` which
885   must be released when it is no longer used.
886
887   .. versionadded:: 3.2
888
889   .. versionchanged:: 3.6
890      Accepts a :term:`path-like object`.
891
892
893.. c:function:: PyObject* PyUnicode_DecodeFSDefaultAndSize(const char *s, Py_ssize_t size)
894
895   Decode a string using :c:data:`Py_FileSystemDefaultEncoding` and the
896   :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
897
898   If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
899   locale encoding.
900
901   :c:data:`Py_FileSystemDefaultEncoding` is initialized at startup from the
902   locale encoding and cannot be modified later. If you need to decode a string
903   from the current locale encoding, use
904   :c:func:`PyUnicode_DecodeLocaleAndSize`.
905
906   .. seealso::
907
908      The :c:func:`Py_DecodeLocale` function.
909
910   .. versionchanged:: 3.6
911      Use :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
912
913
914.. c:function:: PyObject* PyUnicode_DecodeFSDefault(const char *s)
915
916   Decode a null-terminated string using :c:data:`Py_FileSystemDefaultEncoding`
917   and the :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
918
919   If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
920   locale encoding.
921
922   Use :c:func:`PyUnicode_DecodeFSDefaultAndSize` if you know the string length.
923
924   .. versionchanged:: 3.6
925      Use :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
926
927
928.. c:function:: PyObject* PyUnicode_EncodeFSDefault(PyObject *unicode)
929
930   Encode a Unicode object to :c:data:`Py_FileSystemDefaultEncoding` with the
931   :c:data:`Py_FileSystemDefaultEncodeErrors` error handler, and return
932   :class:`bytes`. Note that the resulting :class:`bytes` object may contain
933   null bytes.
934
935   If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
936   locale encoding.
937
938   :c:data:`Py_FileSystemDefaultEncoding` is initialized at startup from the
939   locale encoding and cannot be modified later. If you need to encode a string
940   to the current locale encoding, use :c:func:`PyUnicode_EncodeLocale`.
941
942   .. seealso::
943
944      The :c:func:`Py_EncodeLocale` function.
945
946   .. versionadded:: 3.2
947
948   .. versionchanged:: 3.6
949      Use :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
950
951wchar_t Support
952"""""""""""""""
953
954:c:type:`wchar_t` support for platforms which support it:
955
956.. c:function:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
957
958   Create a Unicode object from the :c:type:`wchar_t` buffer *w* of the given *size*.
959   Passing ``-1`` as the *size* indicates that the function must itself compute the length,
960   using wcslen.
961   Return ``NULL`` on failure.
962
963
964.. c:function:: Py_ssize_t PyUnicode_AsWideChar(PyObject *unicode, wchar_t *w, Py_ssize_t size)
965
966   Copy the Unicode object contents into the :c:type:`wchar_t` buffer *w*.  At most
967   *size* :c:type:`wchar_t` characters are copied (excluding a possibly trailing
968   null termination character).  Return the number of :c:type:`wchar_t` characters
969   copied or ``-1`` in case of an error.  Note that the resulting :c:type:`wchar_t*`
970   string may or may not be null-terminated.  It is the responsibility of the caller
971   to make sure that the :c:type:`wchar_t*` string is null-terminated in case this is
972   required by the application. Also, note that the :c:type:`wchar_t*` string
973   might contain null characters, which would cause the string to be truncated
974   when used with most C functions.
975
976
977.. c:function:: wchar_t* PyUnicode_AsWideCharString(PyObject *unicode, Py_ssize_t *size)
978
979   Convert the Unicode object to a wide character string. The output string
980   always ends with a null character. If *size* is not ``NULL``, write the number
981   of wide characters (excluding the trailing null termination character) into
982   *\*size*. Note that the resulting :c:type:`wchar_t` string might contain
983   null characters, which would cause the string to be truncated when used with
984   most C functions. If *size* is ``NULL`` and the :c:type:`wchar_t*` string
985   contains null characters a :exc:`ValueError` is raised.
986
987   Returns a buffer allocated by :c:func:`PyMem_Alloc` (use
988   :c:func:`PyMem_Free` to free it) on success. On error, returns ``NULL``
989   and *\*size* is undefined. Raises a :exc:`MemoryError` if memory allocation
990   is failed.
991
992   .. versionadded:: 3.2
993
994   .. versionchanged:: 3.7
995      Raises a :exc:`ValueError` if *size* is ``NULL`` and the :c:type:`wchar_t*`
996      string contains null characters.
997
998
999.. _builtincodecs:
1000
1001Built-in Codecs
1002^^^^^^^^^^^^^^^
1003
1004Python provides a set of built-in codecs which are written in C for speed. All of
1005these codecs are directly usable via the following functions.
1006
1007Many of the following APIs take two arguments encoding and errors, and they
1008have the same semantics as the ones of the built-in :func:`str` string object
1009constructor.
1010
1011Setting encoding to ``NULL`` causes the default encoding to be used
1012which is ASCII.  The file system calls should use
1013:c:func:`PyUnicode_FSConverter` for encoding file names. This uses the
1014variable :c:data:`Py_FileSystemDefaultEncoding` internally. This
1015variable should be treated as read-only: on some systems, it will be a
1016pointer to a static string, on others, it will change at run-time
1017(such as when the application invokes setlocale).
1018
1019Error handling is set by errors which may also be set to ``NULL`` meaning to use
1020the default handling defined for the codec.  Default error handling for all
1021built-in codecs is "strict" (:exc:`ValueError` is raised).
1022
1023The codecs all use a similar interface.  Only deviation from the following
1024generic ones are documented for simplicity.
1025
1026
1027Generic Codecs
1028""""""""""""""
1029
1030These are the generic codec APIs:
1031
1032
1033.. c:function:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, \
1034                              const char *encoding, const char *errors)
1035
1036   Create a Unicode object by decoding *size* bytes of the encoded string *s*.
1037   *encoding* and *errors* have the same meaning as the parameters of the same name
1038   in the :func:`str` built-in function.  The codec to be used is looked up
1039   using the Python codec registry.  Return ``NULL`` if an exception was raised by
1040   the codec.
1041
1042
1043.. c:function:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, \
1044                              const char *encoding, const char *errors)
1045
1046   Encode a Unicode object and return the result as Python bytes object.
1047   *encoding* and *errors* have the same meaning as the parameters of the same
1048   name in the Unicode :meth:`~str.encode` method. The codec to be used is looked up
1049   using the Python codec registry. Return ``NULL`` if an exception was raised by
1050   the codec.
1051
1052
1053.. c:function:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, \
1054                              const char *encoding, const char *errors)
1055
1056   Encode the :c:type:`Py_UNICODE` buffer *s* of the given *size* and return a Python
1057   bytes object.  *encoding* and *errors* have the same meaning as the
1058   parameters of the same name in the Unicode :meth:`~str.encode` method.  The codec
1059   to be used is looked up using the Python codec registry.  Return ``NULL`` if an
1060   exception was raised by the codec.
1061
1062   .. deprecated-removed:: 3.3 3.11
1063      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
1064      :c:func:`PyUnicode_AsEncodedString`.
1065
1066
1067UTF-8 Codecs
1068""""""""""""
1069
1070These are the UTF-8 codec APIs:
1071
1072
1073.. c:function:: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)
1074
1075   Create a Unicode object by decoding *size* bytes of the UTF-8 encoded string
1076   *s*. Return ``NULL`` if an exception was raised by the codec.
1077
1078
1079.. c:function:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, \
1080                              const char *errors, Py_ssize_t *consumed)
1081
1082   If *consumed* is ``NULL``, behave like :c:func:`PyUnicode_DecodeUTF8`. If
1083   *consumed* is not ``NULL``, trailing incomplete UTF-8 byte sequences will not be
1084   treated as an error. Those bytes will not be decoded and the number of bytes
1085   that have been decoded will be stored in *consumed*.
1086
1087
1088.. c:function:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)
1089
1090   Encode a Unicode object using UTF-8 and return the result as Python bytes
1091   object.  Error handling is "strict".  Return ``NULL`` if an exception was
1092   raised by the codec.
1093
1094
1095.. c:function:: const char* PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)
1096
1097   Return a pointer to the UTF-8 encoding of the Unicode object, and
1098   store the size of the encoded representation (in bytes) in *size*.  The
1099   *size* argument can be ``NULL``; in this case no size will be stored.  The
1100   returned buffer always has an extra null byte appended (not included in
1101   *size*), regardless of whether there are any other null code points.
1102
1103   In the case of an error, ``NULL`` is returned with an exception set and no
1104   *size* is stored.
1105
1106   This caches the UTF-8 representation of the string in the Unicode object, and
1107   subsequent calls will return a pointer to the same buffer.  The caller is not
1108   responsible for deallocating the buffer.
1109
1110   .. versionadded:: 3.3
1111
1112   .. versionchanged:: 3.7
1113      The return type is now ``const char *`` rather of ``char *``.
1114
1115
1116.. c:function:: const char* PyUnicode_AsUTF8(PyObject *unicode)
1117
1118   As :c:func:`PyUnicode_AsUTF8AndSize`, but does not store the size.
1119
1120   .. versionadded:: 3.3
1121
1122   .. versionchanged:: 3.7
1123      The return type is now ``const char *`` rather of ``char *``.
1124
1125
1126.. c:function:: PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
1127
1128   Encode the :c:type:`Py_UNICODE` buffer *s* of the given *size* using UTF-8 and
1129   return a Python bytes object.  Return ``NULL`` if an exception was raised by
1130   the codec.
1131
1132   .. deprecated-removed:: 3.3 3.11
1133      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
1134      :c:func:`PyUnicode_AsUTF8String`, :c:func:`PyUnicode_AsUTF8AndSize` or
1135      :c:func:`PyUnicode_AsEncodedString`.
1136
1137
1138UTF-32 Codecs
1139"""""""""""""
1140
1141These are the UTF-32 codec APIs:
1142
1143
1144.. c:function:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, \
1145                              const char *errors, int *byteorder)
1146
1147   Decode *size* bytes from a UTF-32 encoded buffer string and return the
1148   corresponding Unicode object.  *errors* (if non-``NULL``) defines the error
1149   handling. It defaults to "strict".
1150
1151   If *byteorder* is non-``NULL``, the decoder starts decoding using the given byte
1152   order::
1153
1154      *byteorder == -1: little endian
1155      *byteorder == 0:  native order
1156      *byteorder == 1:  big endian
1157
1158   If ``*byteorder`` is zero, and the first four bytes of the input data are a
1159   byte order mark (BOM), the decoder switches to this byte order and the BOM is
1160   not copied into the resulting Unicode string.  If ``*byteorder`` is ``-1`` or
1161   ``1``, any byte order mark is copied to the output.
1162
1163   After completion, *\*byteorder* is set to the current byte order at the end
1164   of input data.
1165
1166   If *byteorder* is ``NULL``, the codec starts in native order mode.
1167
1168   Return ``NULL`` if an exception was raised by the codec.
1169
1170
1171.. c:function:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, \
1172                              const char *errors, int *byteorder, Py_ssize_t *consumed)
1173
1174   If *consumed* is ``NULL``, behave like :c:func:`PyUnicode_DecodeUTF32`. If
1175   *consumed* is not ``NULL``, :c:func:`PyUnicode_DecodeUTF32Stateful` will not treat
1176   trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible
1177   by four) as an error. Those bytes will not be decoded and the number of bytes
1178   that have been decoded will be stored in *consumed*.
1179
1180
1181.. c:function:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)
1182
1183   Return a Python byte string using the UTF-32 encoding in native byte
1184   order. The string always starts with a BOM mark.  Error handling is "strict".
1185   Return ``NULL`` if an exception was raised by the codec.
1186
1187
1188.. c:function:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, \
1189                              const char *errors, int byteorder)
1190
1191   Return a Python bytes object holding the UTF-32 encoded value of the Unicode
1192   data in *s*.  Output is written according to the following byte order::
1193
1194      byteorder == -1: little endian
1195      byteorder == 0:  native byte order (writes a BOM mark)
1196      byteorder == 1:  big endian
1197
1198   If byteorder is ``0``, the output string will always start with the Unicode BOM
1199   mark (U+FEFF). In the other two modes, no BOM mark is prepended.
1200
1201   If ``Py_UNICODE_WIDE`` is not defined, surrogate pairs will be output
1202   as a single code point.
1203
1204   Return ``NULL`` if an exception was raised by the codec.
1205
1206   .. deprecated-removed:: 3.3 3.11
1207      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
1208      :c:func:`PyUnicode_AsUTF32String` or :c:func:`PyUnicode_AsEncodedString`.
1209
1210
1211UTF-16 Codecs
1212"""""""""""""
1213
1214These are the UTF-16 codec APIs:
1215
1216
1217.. c:function:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, \
1218                              const char *errors, int *byteorder)
1219
1220   Decode *size* bytes from a UTF-16 encoded buffer string and return the
1221   corresponding Unicode object.  *errors* (if non-``NULL``) defines the error
1222   handling. It defaults to "strict".
1223
1224   If *byteorder* is non-``NULL``, the decoder starts decoding using the given byte
1225   order::
1226
1227      *byteorder == -1: little endian
1228      *byteorder == 0:  native order
1229      *byteorder == 1:  big endian
1230
1231   If ``*byteorder`` is zero, and the first two bytes of the input data are a
1232   byte order mark (BOM), the decoder switches to this byte order and the BOM is
1233   not copied into the resulting Unicode string.  If ``*byteorder`` is ``-1`` or
1234   ``1``, any byte order mark is copied to the output (where it will result in
1235   either a ``\ufeff`` or a ``\ufffe`` character).
1236
1237   After completion, *\*byteorder* is set to the current byte order at the end
1238   of input data.
1239
1240   If *byteorder* is ``NULL``, the codec starts in native order mode.
1241
1242   Return ``NULL`` if an exception was raised by the codec.
1243
1244
1245.. c:function:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, \
1246                              const char *errors, int *byteorder, Py_ssize_t *consumed)
1247
1248   If *consumed* is ``NULL``, behave like :c:func:`PyUnicode_DecodeUTF16`. If
1249   *consumed* is not ``NULL``, :c:func:`PyUnicode_DecodeUTF16Stateful` will not treat
1250   trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a
1251   split surrogate pair) as an error. Those bytes will not be decoded and the
1252   number of bytes that have been decoded will be stored in *consumed*.
1253
1254
1255.. c:function:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)
1256
1257   Return a Python byte string using the UTF-16 encoding in native byte
1258   order. The string always starts with a BOM mark.  Error handling is "strict".
1259   Return ``NULL`` if an exception was raised by the codec.
1260
1261
1262.. c:function:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, \
1263                              const char *errors, int byteorder)
1264
1265   Return a Python bytes object holding the UTF-16 encoded value of the Unicode
1266   data in *s*.  Output is written according to the following byte order::
1267
1268      byteorder == -1: little endian
1269      byteorder == 0:  native byte order (writes a BOM mark)
1270      byteorder == 1:  big endian
1271
1272   If byteorder is ``0``, the output string will always start with the Unicode BOM
1273   mark (U+FEFF). In the other two modes, no BOM mark is prepended.
1274
1275   If ``Py_UNICODE_WIDE`` is defined, a single :c:type:`Py_UNICODE` value may get
1276   represented as a surrogate pair. If it is not defined, each :c:type:`Py_UNICODE`
1277   values is interpreted as a UCS-2 character.
1278
1279   Return ``NULL`` if an exception was raised by the codec.
1280
1281   .. deprecated-removed:: 3.3 3.11
1282      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
1283      :c:func:`PyUnicode_AsUTF16String` or :c:func:`PyUnicode_AsEncodedString`.
1284
1285
1286UTF-7 Codecs
1287""""""""""""
1288
1289These are the UTF-7 codec APIs:
1290
1291
1292.. c:function:: PyObject* PyUnicode_DecodeUTF7(const char *s, Py_ssize_t size, const char *errors)
1293
1294   Create a Unicode object by decoding *size* bytes of the UTF-7 encoded string
1295   *s*.  Return ``NULL`` if an exception was raised by the codec.
1296
1297
1298.. c:function:: PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, \
1299                              const char *errors, Py_ssize_t *consumed)
1300
1301   If *consumed* is ``NULL``, behave like :c:func:`PyUnicode_DecodeUTF7`.  If
1302   *consumed* is not ``NULL``, trailing incomplete UTF-7 base-64 sections will not
1303   be treated as an error.  Those bytes will not be decoded and the number of
1304   bytes that have been decoded will be stored in *consumed*.
1305
1306
1307.. c:function:: PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, \
1308                              int base64SetO, int base64WhiteSpace, const char *errors)
1309
1310   Encode the :c:type:`Py_UNICODE` buffer of the given size using UTF-7 and
1311   return a Python bytes object.  Return ``NULL`` if an exception was raised by
1312   the codec.
1313
1314   If *base64SetO* is nonzero, "Set O" (punctuation that has no otherwise
1315   special meaning) will be encoded in base-64.  If *base64WhiteSpace* is
1316   nonzero, whitespace will be encoded in base-64.  Both are set to zero for the
1317   Python "utf-7" codec.
1318
1319   .. deprecated-removed:: 3.3 3.11
1320      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
1321      :c:func:`PyUnicode_AsEncodedString`.
1322
1323
1324Unicode-Escape Codecs
1325"""""""""""""""""""""
1326
1327These are the "Unicode Escape" codec APIs:
1328
1329
1330.. c:function:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, \
1331                              Py_ssize_t size, const char *errors)
1332
1333   Create a Unicode object by decoding *size* bytes of the Unicode-Escape encoded
1334   string *s*.  Return ``NULL`` if an exception was raised by the codec.
1335
1336
1337.. c:function:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)
1338
1339   Encode a Unicode object using Unicode-Escape and return the result as a
1340   bytes object.  Error handling is "strict".  Return ``NULL`` if an exception was
1341   raised by the codec.
1342
1343
1344.. c:function:: PyObject* PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)
1345
1346   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Unicode-Escape and
1347   return a bytes object.  Return ``NULL`` if an exception was raised by the codec.
1348
1349   .. deprecated-removed:: 3.3 3.11
1350      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
1351      :c:func:`PyUnicode_AsUnicodeEscapeString`.
1352
1353
1354Raw-Unicode-Escape Codecs
1355"""""""""""""""""""""""""
1356
1357These are the "Raw Unicode Escape" codec APIs:
1358
1359
1360.. c:function:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, \
1361                              Py_ssize_t size, const char *errors)
1362
1363   Create a Unicode object by decoding *size* bytes of the Raw-Unicode-Escape
1364   encoded string *s*.  Return ``NULL`` if an exception was raised by the codec.
1365
1366
1367.. c:function:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)
1368
1369   Encode a Unicode object using Raw-Unicode-Escape and return the result as
1370   a bytes object.  Error handling is "strict".  Return ``NULL`` if an exception
1371   was raised by the codec.
1372
1373
1374.. c:function:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, \
1375                              Py_ssize_t size)
1376
1377   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Raw-Unicode-Escape
1378   and return a bytes object.  Return ``NULL`` if an exception was raised by the codec.
1379
1380   .. deprecated-removed:: 3.3 3.11
1381      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
1382      :c:func:`PyUnicode_AsRawUnicodeEscapeString` or
1383      :c:func:`PyUnicode_AsEncodedString`.
1384
1385
1386Latin-1 Codecs
1387""""""""""""""
1388
1389These are the Latin-1 codec APIs: Latin-1 corresponds to the first 256 Unicode
1390ordinals and only these are accepted by the codecs during encoding.
1391
1392
1393.. c:function:: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)
1394
1395   Create a Unicode object by decoding *size* bytes of the Latin-1 encoded string
1396   *s*.  Return ``NULL`` if an exception was raised by the codec.
1397
1398
1399.. c:function:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)
1400
1401   Encode a Unicode object using Latin-1 and return the result as Python bytes
1402   object.  Error handling is "strict".  Return ``NULL`` if an exception was
1403   raised by the codec.
1404
1405
1406.. c:function:: PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
1407
1408   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Latin-1 and
1409   return a Python bytes object.  Return ``NULL`` if an exception was raised by
1410   the codec.
1411
1412   .. deprecated-removed:: 3.3 3.11
1413      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
1414      :c:func:`PyUnicode_AsLatin1String` or
1415      :c:func:`PyUnicode_AsEncodedString`.
1416
1417
1418ASCII Codecs
1419""""""""""""
1420
1421These are the ASCII codec APIs.  Only 7-bit ASCII data is accepted. All other
1422codes generate errors.
1423
1424
1425.. c:function:: PyObject* PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors)
1426
1427   Create a Unicode object by decoding *size* bytes of the ASCII encoded string
1428   *s*.  Return ``NULL`` if an exception was raised by the codec.
1429
1430
1431.. c:function:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)
1432
1433   Encode a Unicode object using ASCII and return the result as Python bytes
1434   object.  Error handling is "strict".  Return ``NULL`` if an exception was
1435   raised by the codec.
1436
1437
1438.. c:function:: PyObject* PyUnicode_EncodeASCII(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
1439
1440   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using ASCII and
1441   return a Python bytes object.  Return ``NULL`` if an exception was raised by
1442   the codec.
1443
1444   .. deprecated-removed:: 3.3 3.11
1445      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
1446      :c:func:`PyUnicode_AsASCIIString` or
1447      :c:func:`PyUnicode_AsEncodedString`.
1448
1449
1450Character Map Codecs
1451""""""""""""""""""""
1452
1453This codec is special in that it can be used to implement many different codecs
1454(and this is in fact what was done to obtain most of the standard codecs
1455included in the :mod:`encodings` package). The codec uses mapping to encode and
1456decode characters.  The mapping objects provided must support the
1457:meth:`__getitem__` mapping interface; dictionaries and sequences work well.
1458
1459These are the mapping codec APIs:
1460
1461.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *data, Py_ssize_t size, \
1462                              PyObject *mapping, const char *errors)
1463
1464   Create a Unicode object by decoding *size* bytes of the encoded string *s*
1465   using the given *mapping* object.  Return ``NULL`` if an exception was raised
1466   by the codec.
1467
1468   If *mapping* is ``NULL``, Latin-1 decoding will be applied.  Else
1469   *mapping* must map bytes ordinals (integers in the range from 0 to 255)
1470   to Unicode strings, integers (which are then interpreted as Unicode
1471   ordinals) or ``None``.  Unmapped data bytes -- ones which cause a
1472   :exc:`LookupError`, as well as ones which get mapped to ``None``,
1473   ``0xFFFE`` or ``'\ufffe'``, are treated as undefined mappings and cause
1474   an error.
1475
1476
1477.. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)
1478
1479   Encode a Unicode object using the given *mapping* object and return the
1480   result as a bytes object.  Error handling is "strict".  Return ``NULL`` if an
1481   exception was raised by the codec.
1482
1483   The *mapping* object must map Unicode ordinal integers to bytes objects,
1484   integers in the range from 0 to 255 or ``None``.  Unmapped character
1485   ordinals (ones which cause a :exc:`LookupError`) as well as mapped to
1486   ``None`` are treated as "undefined mapping" and cause an error.
1487
1488
1489.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, \
1490                              PyObject *mapping, const char *errors)
1491
1492   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
1493   *mapping* object and return the result as a bytes object.  Return ``NULL`` if
1494   an exception was raised by the codec.
1495
1496   .. deprecated-removed:: 3.3 3.11
1497      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
1498      :c:func:`PyUnicode_AsCharmapString` or
1499      :c:func:`PyUnicode_AsEncodedString`.
1500
1501
1502The following codec API is special in that maps Unicode to Unicode.
1503
1504.. c:function:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, const char *errors)
1505
1506   Translate a string by applying a character mapping table to it and return the
1507   resulting Unicode object. Return ``NULL`` if an exception was raised by the
1508   codec.
1509
1510   The mapping table must map Unicode ordinal integers to Unicode ordinal integers
1511   or ``None`` (causing deletion of the character).
1512
1513   Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
1514   and sequences work well.  Unmapped character ordinals (ones which cause a
1515   :exc:`LookupError`) are left untouched and are copied as-is.
1516
1517   *errors* has the usual meaning for codecs. It may be ``NULL`` which indicates to
1518   use the default error handling.
1519
1520
1521.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, \
1522                              PyObject *mapping, const char *errors)
1523
1524   Translate a :c:type:`Py_UNICODE` buffer of the given *size* by applying a
1525   character *mapping* table to it and return the resulting Unicode object.
1526   Return ``NULL`` when an exception was raised by the codec.
1527
1528   .. deprecated-removed:: 3.3 3.11
1529      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
1530      :c:func:`PyUnicode_Translate`. or :ref:`generic codec based API
1531      <codec-registry>`
1532
1533
1534MBCS codecs for Windows
1535"""""""""""""""""""""""
1536
1537These are the MBCS codec APIs. They are currently only available on Windows and
1538use the Win32 MBCS converters to implement the conversions.  Note that MBCS (or
1539DBCS) is a class of encodings, not just one.  The target encoding is defined by
1540the user settings on the machine running the codec.
1541
1542.. c:function:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)
1543
1544   Create a Unicode object by decoding *size* bytes of the MBCS encoded string *s*.
1545   Return ``NULL`` if an exception was raised by the codec.
1546
1547
1548.. c:function:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, Py_ssize_t size, \
1549                              const char *errors, Py_ssize_t *consumed)
1550
1551   If *consumed* is ``NULL``, behave like :c:func:`PyUnicode_DecodeMBCS`. If
1552   *consumed* is not ``NULL``, :c:func:`PyUnicode_DecodeMBCSStateful` will not decode
1553   trailing lead byte and the number of bytes that have been decoded will be stored
1554   in *consumed*.
1555
1556
1557.. c:function:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)
1558
1559   Encode a Unicode object using MBCS and return the result as Python bytes
1560   object.  Error handling is "strict".  Return ``NULL`` if an exception was
1561   raised by the codec.
1562
1563
1564.. c:function:: PyObject* PyUnicode_EncodeCodePage(int code_page, PyObject *unicode, const char *errors)
1565
1566   Encode the Unicode object using the specified code page and return a Python
1567   bytes object.  Return ``NULL`` if an exception was raised by the codec. Use
1568   :c:data:`CP_ACP` code page to get the MBCS encoder.
1569
1570   .. versionadded:: 3.3
1571
1572
1573.. c:function:: PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
1574
1575   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using MBCS and return
1576   a Python bytes object.  Return ``NULL`` if an exception was raised by the
1577   codec.
1578
1579   .. deprecated-removed:: 3.3 4.0
1580      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
1581      :c:func:`PyUnicode_AsMBCSString`, :c:func:`PyUnicode_EncodeCodePage` or
1582      :c:func:`PyUnicode_AsEncodedString`.
1583
1584
1585Methods & Slots
1586"""""""""""""""
1587
1588
1589.. _unicodemethodsandslots:
1590
1591Methods and Slot Functions
1592^^^^^^^^^^^^^^^^^^^^^^^^^^
1593
1594The following APIs are capable of handling Unicode objects and strings on input
1595(we refer to them as strings in the descriptions) and return Unicode objects or
1596integers as appropriate.
1597
1598They all return ``NULL`` or ``-1`` if an exception occurs.
1599
1600
1601.. c:function:: PyObject* PyUnicode_Concat(PyObject *left, PyObject *right)
1602
1603   Concat two strings giving a new Unicode string.
1604
1605
1606.. c:function:: PyObject* PyUnicode_Split(PyObject *s, PyObject *sep, Py_ssize_t maxsplit)
1607
1608   Split a string giving a list of Unicode strings.  If *sep* is ``NULL``, splitting
1609   will be done at all whitespace substrings.  Otherwise, splits occur at the given
1610   separator.  At most *maxsplit* splits will be done.  If negative, no limit is
1611   set.  Separators are not included in the resulting list.
1612
1613
1614.. c:function:: PyObject* PyUnicode_Splitlines(PyObject *s, int keepend)
1615
1616   Split a Unicode string at line breaks, returning a list of Unicode strings.
1617   CRLF is considered to be one line break.  If *keepend* is ``0``, the Line break
1618   characters are not included in the resulting strings.
1619
1620
1621.. c:function:: PyObject* PyUnicode_Join(PyObject *separator, PyObject *seq)
1622
1623   Join a sequence of strings using the given *separator* and return the resulting
1624   Unicode string.
1625
1626
1627.. c:function:: Py_ssize_t PyUnicode_Tailmatch(PyObject *str, PyObject *substr, \
1628                        Py_ssize_t start, Py_ssize_t end, int direction)
1629
1630   Return ``1`` if *substr* matches ``str[start:end]`` at the given tail end
1631   (*direction* == ``-1`` means to do a prefix match, *direction* == ``1`` a suffix match),
1632   ``0`` otherwise. Return ``-1`` if an error occurred.
1633
1634
1635.. c:function:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, \
1636                               Py_ssize_t start, Py_ssize_t end, int direction)
1637
1638   Return the first position of *substr* in ``str[start:end]`` using the given
1639   *direction* (*direction* == ``1`` means to do a forward search, *direction* == ``-1`` a
1640   backward search).  The return value is the index of the first match; a value of
1641   ``-1`` indicates that no match was found, and ``-2`` indicates that an error
1642   occurred and an exception has been set.
1643
1644
1645.. c:function:: Py_ssize_t PyUnicode_FindChar(PyObject *str, Py_UCS4 ch, \
1646                               Py_ssize_t start, Py_ssize_t end, int direction)
1647
1648   Return the first position of the character *ch* in ``str[start:end]`` using
1649   the given *direction* (*direction* == ``1`` means to do a forward search,
1650   *direction* == ``-1`` a backward search).  The return value is the index of the
1651   first match; a value of ``-1`` indicates that no match was found, and ``-2``
1652   indicates that an error occurred and an exception has been set.
1653
1654   .. versionadded:: 3.3
1655
1656   .. versionchanged:: 3.7
1657      *start* and *end* are now adjusted to behave like ``str[start:end]``.
1658
1659
1660.. c:function:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, \
1661                               Py_ssize_t start, Py_ssize_t end)
1662
1663   Return the number of non-overlapping occurrences of *substr* in
1664   ``str[start:end]``.  Return ``-1`` if an error occurred.
1665
1666
1667.. c:function:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, \
1668                              PyObject *replstr, Py_ssize_t maxcount)
1669
1670   Replace at most *maxcount* occurrences of *substr* in *str* with *replstr* and
1671   return the resulting Unicode object. *maxcount* == ``-1`` means replace all
1672   occurrences.
1673
1674
1675.. c:function:: int PyUnicode_Compare(PyObject *left, PyObject *right)
1676
1677   Compare two strings and return ``-1``, ``0``, ``1`` for less than, equal, and greater than,
1678   respectively.
1679
1680   This function returns ``-1`` upon failure, so one should call
1681   :c:func:`PyErr_Occurred` to check for errors.
1682
1683
1684.. c:function:: int PyUnicode_CompareWithASCIIString(PyObject *uni, const char *string)
1685
1686   Compare a Unicode object, *uni*, with *string* and return ``-1``, ``0``, ``1`` for less
1687   than, equal, and greater than, respectively. It is best to pass only
1688   ASCII-encoded strings, but the function interprets the input string as
1689   ISO-8859-1 if it contains non-ASCII characters.
1690
1691   This function does not raise exceptions.
1692
1693
1694.. c:function:: PyObject* PyUnicode_RichCompare(PyObject *left,  PyObject *right,  int op)
1695
1696   Rich compare two Unicode strings and return one of the following:
1697
1698   * ``NULL`` in case an exception was raised
1699   * :const:`Py_True` or :const:`Py_False` for successful comparisons
1700   * :const:`Py_NotImplemented` in case the type combination is unknown
1701
1702   Possible values for *op* are :const:`Py_GT`, :const:`Py_GE`, :const:`Py_EQ`,
1703   :const:`Py_NE`, :const:`Py_LT`, and :const:`Py_LE`.
1704
1705
1706.. c:function:: PyObject* PyUnicode_Format(PyObject *format, PyObject *args)
1707
1708   Return a new string object from *format* and *args*; this is analogous to
1709   ``format % args``.
1710
1711
1712.. c:function:: int PyUnicode_Contains(PyObject *container, PyObject *element)
1713
1714   Check whether *element* is contained in *container* and return true or false
1715   accordingly.
1716
1717   *element* has to coerce to a one element Unicode string. ``-1`` is returned
1718   if there was an error.
1719
1720
1721.. c:function:: void PyUnicode_InternInPlace(PyObject **string)
1722
1723   Intern the argument *\*string* in place.  The argument must be the address of a
1724   pointer variable pointing to a Python Unicode string object.  If there is an
1725   existing interned string that is the same as *\*string*, it sets *\*string* to
1726   it (decrementing the reference count of the old string object and incrementing
1727   the reference count of the interned string object), otherwise it leaves
1728   *\*string* alone and interns it (incrementing its reference count).
1729   (Clarification: even though there is a lot of talk about reference counts, think
1730   of this function as reference-count-neutral; you own the object after the call
1731   if and only if you owned it before the call.)
1732
1733
1734.. c:function:: PyObject* PyUnicode_InternFromString(const char *v)
1735
1736   A combination of :c:func:`PyUnicode_FromString` and
1737   :c:func:`PyUnicode_InternInPlace`, returning either a new Unicode string
1738   object that has been interned, or a new ("owned") reference to an earlier
1739   interned string object with the same value.
1740