1'''"Executable documentation" for the pickle module.
2
3Extensive comments about the pickle protocols and pickle-machine opcodes
4can be found here.  Some functions meant for external use:
5
6genops(pickle)
7   Generate all the opcodes in a pickle, as (opcode, arg, position) triples.
8
9dis(pickle, out=None, memo=None, indentlevel=4)
10   Print a symbolic disassembly of a pickle.
11'''
12
13import codecs
14import io
15import pickle
16import re
17import sys
18
19__all__ = ['dis', 'genops', 'optimize']
20
21bytes_types = pickle.bytes_types
22
23# Other ideas:
24#
25# - A pickle verifier:  read a pickle and check it exhaustively for
26#   well-formedness.  dis() does a lot of this already.
27#
28# - A protocol identifier:  examine a pickle and return its protocol number
29#   (== the highest .proto attr value among all the opcodes in the pickle).
30#   dis() already prints this info at the end.
31#
32# - A pickle optimizer:  for example, tuple-building code is sometimes more
33#   elaborate than necessary, catering for the possibility that the tuple
34#   is recursive.  Or lots of times a PUT is generated that's never accessed
35#   by a later GET.
36
37
38# "A pickle" is a program for a virtual pickle machine (PM, but more accurately
39# called an unpickling machine).  It's a sequence of opcodes, interpreted by the
40# PM, building an arbitrarily complex Python object.
41#
42# For the most part, the PM is very simple:  there are no looping, testing, or
43# conditional instructions, no arithmetic and no function calls.  Opcodes are
44# executed once each, from first to last, until a STOP opcode is reached.
45#
46# The PM has two data areas, "the stack" and "the memo".
47#
48# Many opcodes push Python objects onto the stack; e.g., INT pushes a Python
49# integer object on the stack, whose value is gotten from a decimal string
50# literal immediately following the INT opcode in the pickle bytestream.  Other
51# opcodes take Python objects off the stack.  The result of unpickling is
52# whatever object is left on the stack when the final STOP opcode is executed.
53#
54# The memo is simply an array of objects, or it can be implemented as a dict
55# mapping little integers to objects.  The memo serves as the PM's "long term
56# memory", and the little integers indexing the memo are akin to variable
57# names.  Some opcodes pop a stack object into the memo at a given index,
58# and others push a memo object at a given index onto the stack again.
59#
60# At heart, that's all the PM has.  Subtleties arise for these reasons:
61#
62# + Object identity.  Objects can be arbitrarily complex, and subobjects
63#   may be shared (for example, the list [a, a] refers to the same object a
64#   twice).  It can be vital that unpickling recreate an isomorphic object
65#   graph, faithfully reproducing sharing.
66#
67# + Recursive objects.  For example, after "L = []; L.append(L)", L is a
68#   list, and L[0] is the same list.  This is related to the object identity
69#   point, and some sequences of pickle opcodes are subtle in order to
70#   get the right result in all cases.
71#
72# + Things pickle doesn't know everything about.  Examples of things pickle
73#   does know everything about are Python's builtin scalar and container
74#   types, like ints and tuples.  They generally have opcodes dedicated to
75#   them.  For things like module references and instances of user-defined
76#   classes, pickle's knowledge is limited.  Historically, many enhancements
77#   have been made to the pickle protocol in order to do a better (faster,
78#   and/or more compact) job on those.
79#
80# + Backward compatibility and micro-optimization.  As explained below,
81#   pickle opcodes never go away, not even when better ways to do a thing
82#   get invented.  The repertoire of the PM just keeps growing over time.
83#   For example, protocol 0 had two opcodes for building Python integers (INT
84#   and LONG), protocol 1 added three more for more-efficient pickling of short
85#   integers, and protocol 2 added two more for more-efficient pickling of
86#   long integers (before protocol 2, the only ways to pickle a Python long
87#   took time quadratic in the number of digits, for both pickling and
88#   unpickling).  "Opcode bloat" isn't so much a subtlety as a source of
89#   wearying complication.
90#
91#
92# Pickle protocols:
93#
94# For compatibility, the meaning of a pickle opcode never changes.  Instead new
95# pickle opcodes get added, and each version's unpickler can handle all the
96# pickle opcodes in all protocol versions to date.  So old pickles continue to
97# be readable forever.  The pickler can generally be told to restrict itself to
98# the subset of opcodes available under previous protocol versions too, so that
99# users can create pickles under the current version readable by older
100# versions.  However, a pickle does not contain its version number embedded
101# within it.  If an older unpickler tries to read a pickle using a later
102# protocol, the result is most likely an exception due to seeing an unknown (in
103# the older unpickler) opcode.
104#
105# The original pickle used what's now called "protocol 0", and what was called
106# "text mode" before Python 2.3.  The entire pickle bytestream is made up of
107# printable 7-bit ASCII characters, plus the newline character, in protocol 0.
108# That's why it was called text mode.  Protocol 0 is small and elegant, but
109# sometimes painfully inefficient.
110#
111# The second major set of additions is now called "protocol 1", and was called
112# "binary mode" before Python 2.3.  This added many opcodes with arguments
113# consisting of arbitrary bytes, including NUL bytes and unprintable "high bit"
114# bytes.  Binary mode pickles can be substantially smaller than equivalent
115# text mode pickles, and sometimes faster too; e.g., BININT represents a 4-byte
116# int as 4 bytes following the opcode, which is cheaper to unpickle than the
117# (perhaps) 11-character decimal string attached to INT.  Protocol 1 also added
118# a number of opcodes that operate on many stack elements at once (like APPENDS
119# and SETITEMS), and "shortcut" opcodes (like EMPTY_DICT and EMPTY_TUPLE).
120#
121# The third major set of additions came in Python 2.3, and is called "protocol
122# 2".  This added:
123#
124# - A better way to pickle instances of new-style classes (NEWOBJ).
125#
126# - A way for a pickle to identify its protocol (PROTO).
127#
128# - Time- and space- efficient pickling of long ints (LONG{1,4}).
129#
130# - Shortcuts for small tuples (TUPLE{1,2,3}}.
131#
132# - Dedicated opcodes for bools (NEWTRUE, NEWFALSE).
133#
134# - The "extension registry", a vector of popular objects that can be pushed
135#   efficiently by index (EXT{1,2,4}).  This is akin to the memo and GET, but
136#   the registry contents are predefined (there's nothing akin to the memo's
137#   PUT).
138#
139# Another independent change with Python 2.3 is the abandonment of any
140# pretense that it might be safe to load pickles received from untrusted
141# parties -- no sufficient security analysis has been done to guarantee
142# this and there isn't a use case that warrants the expense of such an
143# analysis.
144#
145# To this end, all tests for __safe_for_unpickling__ or for
146# copyreg.safe_constructors are removed from the unpickling code.
147# References to these variables in the descriptions below are to be seen
148# as describing unpickling in Python 2.2 and before.
149
150
151# Meta-rule:  Descriptions are stored in instances of descriptor objects,
152# with plain constructors.  No meta-language is defined from which
153# descriptors could be constructed.  If you want, e.g., XML, write a little
154# program to generate XML from the objects.
155
156##############################################################################
157# Some pickle opcodes have an argument, following the opcode in the
158# bytestream.  An argument is of a specific type, described by an instance
159# of ArgumentDescriptor.  These are not to be confused with arguments taken
160# off the stack -- ArgumentDescriptor applies only to arguments embedded in
161# the opcode stream, immediately following an opcode.
162
163# Represents the number of bytes consumed by an argument delimited by the
164# next newline character.
165UP_TO_NEWLINE = -1
166
167# Represents the number of bytes consumed by a two-argument opcode where
168# the first argument gives the number of bytes in the second argument.
169TAKEN_FROM_ARGUMENT1  = -2   # num bytes is 1-byte unsigned int
170TAKEN_FROM_ARGUMENT4  = -3   # num bytes is 4-byte signed little-endian int
171TAKEN_FROM_ARGUMENT4U = -4   # num bytes is 4-byte unsigned little-endian int
172TAKEN_FROM_ARGUMENT8U = -5   # num bytes is 8-byte unsigned little-endian int
173
174class ArgumentDescriptor(object):
175    __slots__ = (
176        # name of descriptor record, also a module global name; a string
177        'name',
178
179        # length of argument, in bytes; an int; UP_TO_NEWLINE and
180        # TAKEN_FROM_ARGUMENT{1,4,8} are negative values for variable-length
181        # cases
182        'n',
183
184        # a function taking a file-like object, reading this kind of argument
185        # from the object at the current position, advancing the current
186        # position by n bytes, and returning the value of the argument
187        'reader',
188
189        # human-readable docs for this arg descriptor; a string
190        'doc',
191    )
192
193    def __init__(self, name, n, reader, doc):
194        assert isinstance(name, str)
195        self.name = name
196
197        assert isinstance(n, int) and (n >= 0 or
198                                       n in (UP_TO_NEWLINE,
199                                             TAKEN_FROM_ARGUMENT1,
200                                             TAKEN_FROM_ARGUMENT4,
201                                             TAKEN_FROM_ARGUMENT4U,
202                                             TAKEN_FROM_ARGUMENT8U))
203        self.n = n
204
205        self.reader = reader
206
207        assert isinstance(doc, str)
208        self.doc = doc
209
210from struct import unpack as _unpack
211
212def read_uint1(f):
213    r"""
214    >>> import io
215    >>> read_uint1(io.BytesIO(b'\xff'))
216    255
217    """
218
219    data = f.read(1)
220    if data:
221        return data[0]
222    raise ValueError("not enough data in stream to read uint1")
223
224uint1 = ArgumentDescriptor(
225            name='uint1',
226            n=1,
227            reader=read_uint1,
228            doc="One-byte unsigned integer.")
229
230
231def read_uint2(f):
232    r"""
233    >>> import io
234    >>> read_uint2(io.BytesIO(b'\xff\x00'))
235    255
236    >>> read_uint2(io.BytesIO(b'\xff\xff'))
237    65535
238    """
239
240    data = f.read(2)
241    if len(data) == 2:
242        return _unpack("<H", data)[0]
243    raise ValueError("not enough data in stream to read uint2")
244
245uint2 = ArgumentDescriptor(
246            name='uint2',
247            n=2,
248            reader=read_uint2,
249            doc="Two-byte unsigned integer, little-endian.")
250
251
252def read_int4(f):
253    r"""
254    >>> import io
255    >>> read_int4(io.BytesIO(b'\xff\x00\x00\x00'))
256    255
257    >>> read_int4(io.BytesIO(b'\x00\x00\x00\x80')) == -(2**31)
258    True
259    """
260
261    data = f.read(4)
262    if len(data) == 4:
263        return _unpack("<i", data)[0]
264    raise ValueError("not enough data in stream to read int4")
265
266int4 = ArgumentDescriptor(
267           name='int4',
268           n=4,
269           reader=read_int4,
270           doc="Four-byte signed integer, little-endian, 2's complement.")
271
272
273def read_uint4(f):
274    r"""
275    >>> import io
276    >>> read_uint4(io.BytesIO(b'\xff\x00\x00\x00'))
277    255
278    >>> read_uint4(io.BytesIO(b'\x00\x00\x00\x80')) == 2**31
279    True
280    """
281
282    data = f.read(4)
283    if len(data) == 4:
284        return _unpack("<I", data)[0]
285    raise ValueError("not enough data in stream to read uint4")
286
287uint4 = ArgumentDescriptor(
288            name='uint4',
289            n=4,
290            reader=read_uint4,
291            doc="Four-byte unsigned integer, little-endian.")
292
293
294def read_uint8(f):
295    r"""
296    >>> import io
297    >>> read_uint8(io.BytesIO(b'\xff\x00\x00\x00\x00\x00\x00\x00'))
298    255
299    >>> read_uint8(io.BytesIO(b'\xff' * 8)) == 2**64-1
300    True
301    """
302
303    data = f.read(8)
304    if len(data) == 8:
305        return _unpack("<Q", data)[0]
306    raise ValueError("not enough data in stream to read uint8")
307
308uint8 = ArgumentDescriptor(
309            name='uint8',
310            n=8,
311            reader=read_uint8,
312            doc="Eight-byte unsigned integer, little-endian.")
313
314
315def read_stringnl(f, decode=True, stripquotes=True):
316    r"""
317    >>> import io
318    >>> read_stringnl(io.BytesIO(b"'abcd'\nefg\n"))
319    'abcd'
320
321    >>> read_stringnl(io.BytesIO(b"\n"))
322    Traceback (most recent call last):
323    ...
324    ValueError: no string quotes around b''
325
326    >>> read_stringnl(io.BytesIO(b"\n"), stripquotes=False)
327    ''
328
329    >>> read_stringnl(io.BytesIO(b"''\n"))
330    ''
331
332    >>> read_stringnl(io.BytesIO(b'"abcd"'))
333    Traceback (most recent call last):
334    ...
335    ValueError: no newline found when trying to read stringnl
336
337    Embedded escapes are undone in the result.
338    >>> read_stringnl(io.BytesIO(br"'a\n\\b\x00c\td'" + b"\n'e'"))
339    'a\n\\b\x00c\td'
340    """
341
342    data = f.readline()
343    if not data.endswith(b'\n'):
344        raise ValueError("no newline found when trying to read stringnl")
345    data = data[:-1]    # lose the newline
346
347    if stripquotes:
348        for q in (b'"', b"'"):
349            if data.startswith(q):
350                if not data.endswith(q):
351                    raise ValueError("strinq quote %r not found at both "
352                                     "ends of %r" % (q, data))
353                data = data[1:-1]
354                break
355        else:
356            raise ValueError("no string quotes around %r" % data)
357
358    if decode:
359        data = codecs.escape_decode(data)[0].decode("ascii")
360    return data
361
362stringnl = ArgumentDescriptor(
363               name='stringnl',
364               n=UP_TO_NEWLINE,
365               reader=read_stringnl,
366               doc="""A newline-terminated string.
367
368                   This is a repr-style string, with embedded escapes, and
369                   bracketing quotes.
370                   """)
371
372def read_stringnl_noescape(f):
373    return read_stringnl(f, stripquotes=False)
374
375stringnl_noescape = ArgumentDescriptor(
376                        name='stringnl_noescape',
377                        n=UP_TO_NEWLINE,
378                        reader=read_stringnl_noescape,
379                        doc="""A newline-terminated string.
380
381                        This is a str-style string, without embedded escapes,
382                        or bracketing quotes.  It should consist solely of
383                        printable ASCII characters.
384                        """)
385
386def read_stringnl_noescape_pair(f):
387    r"""
388    >>> import io
389    >>> read_stringnl_noescape_pair(io.BytesIO(b"Queue\nEmpty\njunk"))
390    'Queue Empty'
391    """
392
393    return "%s %s" % (read_stringnl_noescape(f), read_stringnl_noescape(f))
394
395stringnl_noescape_pair = ArgumentDescriptor(
396                             name='stringnl_noescape_pair',
397                             n=UP_TO_NEWLINE,
398                             reader=read_stringnl_noescape_pair,
399                             doc="""A pair of newline-terminated strings.
400
401                             These are str-style strings, without embedded
402                             escapes, or bracketing quotes.  They should
403                             consist solely of printable ASCII characters.
404                             The pair is returned as a single string, with
405                             a single blank separating the two strings.
406                             """)
407
408
409def read_string1(f):
410    r"""
411    >>> import io
412    >>> read_string1(io.BytesIO(b"\x00"))
413    ''
414    >>> read_string1(io.BytesIO(b"\x03abcdef"))
415    'abc'
416    """
417
418    n = read_uint1(f)
419    assert n >= 0
420    data = f.read(n)
421    if len(data) == n:
422        return data.decode("latin-1")
423    raise ValueError("expected %d bytes in a string1, but only %d remain" %
424                     (n, len(data)))
425
426string1 = ArgumentDescriptor(
427              name="string1",
428              n=TAKEN_FROM_ARGUMENT1,
429              reader=read_string1,
430              doc="""A counted string.
431
432              The first argument is a 1-byte unsigned int giving the number
433              of bytes in the string, and the second argument is that many
434              bytes.
435              """)
436
437
438def read_string4(f):
439    r"""
440    >>> import io
441    >>> read_string4(io.BytesIO(b"\x00\x00\x00\x00abc"))
442    ''
443    >>> read_string4(io.BytesIO(b"\x03\x00\x00\x00abcdef"))
444    'abc'
445    >>> read_string4(io.BytesIO(b"\x00\x00\x00\x03abcdef"))
446    Traceback (most recent call last):
447    ...
448    ValueError: expected 50331648 bytes in a string4, but only 6 remain
449    """
450
451    n = read_int4(f)
452    if n < 0:
453        raise ValueError("string4 byte count < 0: %d" % n)
454    data = f.read(n)
455    if len(data) == n:
456        return data.decode("latin-1")
457    raise ValueError("expected %d bytes in a string4, but only %d remain" %
458                     (n, len(data)))
459
460string4 = ArgumentDescriptor(
461              name="string4",
462              n=TAKEN_FROM_ARGUMENT4,
463              reader=read_string4,
464              doc="""A counted string.
465
466              The first argument is a 4-byte little-endian signed int giving
467              the number of bytes in the string, and the second argument is
468              that many bytes.
469              """)
470
471
472def read_bytes1(f):
473    r"""
474    >>> import io
475    >>> read_bytes1(io.BytesIO(b"\x00"))
476    b''
477    >>> read_bytes1(io.BytesIO(b"\x03abcdef"))
478    b'abc'
479    """
480
481    n = read_uint1(f)
482    assert n >= 0
483    data = f.read(n)
484    if len(data) == n:
485        return data
486    raise ValueError("expected %d bytes in a bytes1, but only %d remain" %
487                     (n, len(data)))
488
489bytes1 = ArgumentDescriptor(
490              name="bytes1",
491              n=TAKEN_FROM_ARGUMENT1,
492              reader=read_bytes1,
493              doc="""A counted bytes string.
494
495              The first argument is a 1-byte unsigned int giving the number
496              of bytes, and the second argument is that many bytes.
497              """)
498
499
500def read_bytes4(f):
501    r"""
502    >>> import io
503    >>> read_bytes4(io.BytesIO(b"\x00\x00\x00\x00abc"))
504    b''
505    >>> read_bytes4(io.BytesIO(b"\x03\x00\x00\x00abcdef"))
506    b'abc'
507    >>> read_bytes4(io.BytesIO(b"\x00\x00\x00\x03abcdef"))
508    Traceback (most recent call last):
509    ...
510    ValueError: expected 50331648 bytes in a bytes4, but only 6 remain
511    """
512
513    n = read_uint4(f)
514    assert n >= 0
515    if n > sys.maxsize:
516        raise ValueError("bytes4 byte count > sys.maxsize: %d" % n)
517    data = f.read(n)
518    if len(data) == n:
519        return data
520    raise ValueError("expected %d bytes in a bytes4, but only %d remain" %
521                     (n, len(data)))
522
523bytes4 = ArgumentDescriptor(
524              name="bytes4",
525              n=TAKEN_FROM_ARGUMENT4U,
526              reader=read_bytes4,
527              doc="""A counted bytes string.
528
529              The first argument is a 4-byte little-endian unsigned int giving
530              the number of bytes, and the second argument is that many bytes.
531              """)
532
533
534def read_bytes8(f):
535    r"""
536    >>> import io, struct, sys
537    >>> read_bytes8(io.BytesIO(b"\x00\x00\x00\x00\x00\x00\x00\x00abc"))
538    b''
539    >>> read_bytes8(io.BytesIO(b"\x03\x00\x00\x00\x00\x00\x00\x00abcdef"))
540    b'abc'
541    >>> bigsize8 = struct.pack("<Q", sys.maxsize//3)
542    >>> read_bytes8(io.BytesIO(bigsize8 + b"abcdef"))  #doctest: +ELLIPSIS
543    Traceback (most recent call last):
544    ...
545    ValueError: expected ... bytes in a bytes8, but only 6 remain
546    """
547
548    n = read_uint8(f)
549    assert n >= 0
550    if n > sys.maxsize:
551        raise ValueError("bytes8 byte count > sys.maxsize: %d" % n)
552    data = f.read(n)
553    if len(data) == n:
554        return data
555    raise ValueError("expected %d bytes in a bytes8, but only %d remain" %
556                     (n, len(data)))
557
558bytes8 = ArgumentDescriptor(
559              name="bytes8",
560              n=TAKEN_FROM_ARGUMENT8U,
561              reader=read_bytes8,
562              doc="""A counted bytes string.
563
564              The first argument is an 8-byte little-endian unsigned int giving
565              the number of bytes, and the second argument is that many bytes.
566              """)
567
568
569def read_bytearray8(f):
570    r"""
571    >>> import io, struct, sys
572    >>> read_bytearray8(io.BytesIO(b"\x00\x00\x00\x00\x00\x00\x00\x00abc"))
573    bytearray(b'')
574    >>> read_bytearray8(io.BytesIO(b"\x03\x00\x00\x00\x00\x00\x00\x00abcdef"))
575    bytearray(b'abc')
576    >>> bigsize8 = struct.pack("<Q", sys.maxsize//3)
577    >>> read_bytearray8(io.BytesIO(bigsize8 + b"abcdef"))  #doctest: +ELLIPSIS
578    Traceback (most recent call last):
579    ...
580    ValueError: expected ... bytes in a bytearray8, but only 6 remain
581    """
582
583    n = read_uint8(f)
584    assert n >= 0
585    if n > sys.maxsize:
586        raise ValueError("bytearray8 byte count > sys.maxsize: %d" % n)
587    data = f.read(n)
588    if len(data) == n:
589        return bytearray(data)
590    raise ValueError("expected %d bytes in a bytearray8, but only %d remain" %
591                     (n, len(data)))
592
593bytearray8 = ArgumentDescriptor(
594              name="bytearray8",
595              n=TAKEN_FROM_ARGUMENT8U,
596              reader=read_bytearray8,
597              doc="""A counted bytearray.
598
599              The first argument is an 8-byte little-endian unsigned int giving
600              the number of bytes, and the second argument is that many bytes.
601              """)
602
603def read_unicodestringnl(f):
604    r"""
605    >>> import io
606    >>> read_unicodestringnl(io.BytesIO(b"abc\\uabcd\njunk")) == 'abc\uabcd'
607    True
608    """
609
610    data = f.readline()
611    if not data.endswith(b'\n'):
612        raise ValueError("no newline found when trying to read "
613                         "unicodestringnl")
614    data = data[:-1]    # lose the newline
615    return str(data, 'raw-unicode-escape')
616
617unicodestringnl = ArgumentDescriptor(
618                      name='unicodestringnl',
619                      n=UP_TO_NEWLINE,
620                      reader=read_unicodestringnl,
621                      doc="""A newline-terminated Unicode string.
622
623                      This is raw-unicode-escape encoded, so consists of
624                      printable ASCII characters, and may contain embedded
625                      escape sequences.
626                      """)
627
628
629def read_unicodestring1(f):
630    r"""
631    >>> import io
632    >>> s = 'abcd\uabcd'
633    >>> enc = s.encode('utf-8')
634    >>> enc
635    b'abcd\xea\xaf\x8d'
636    >>> n = bytes([len(enc)])  # little-endian 1-byte length
637    >>> t = read_unicodestring1(io.BytesIO(n + enc + b'junk'))
638    >>> s == t
639    True
640
641    >>> read_unicodestring1(io.BytesIO(n + enc[:-1]))
642    Traceback (most recent call last):
643    ...
644    ValueError: expected 7 bytes in a unicodestring1, but only 6 remain
645    """
646
647    n = read_uint1(f)
648    assert n >= 0
649    data = f.read(n)
650    if len(data) == n:
651        return str(data, 'utf-8', 'surrogatepass')
652    raise ValueError("expected %d bytes in a unicodestring1, but only %d "
653                     "remain" % (n, len(data)))
654
655unicodestring1 = ArgumentDescriptor(
656                    name="unicodestring1",
657                    n=TAKEN_FROM_ARGUMENT1,
658                    reader=read_unicodestring1,
659                    doc="""A counted Unicode string.
660
661                    The first argument is a 1-byte little-endian signed int
662                    giving the number of bytes in the string, and the second
663                    argument-- the UTF-8 encoding of the Unicode string --
664                    contains that many bytes.
665                    """)
666
667
668def read_unicodestring4(f):
669    r"""
670    >>> import io
671    >>> s = 'abcd\uabcd'
672    >>> enc = s.encode('utf-8')
673    >>> enc
674    b'abcd\xea\xaf\x8d'
675    >>> n = bytes([len(enc), 0, 0, 0])  # little-endian 4-byte length
676    >>> t = read_unicodestring4(io.BytesIO(n + enc + b'junk'))
677    >>> s == t
678    True
679
680    >>> read_unicodestring4(io.BytesIO(n + enc[:-1]))
681    Traceback (most recent call last):
682    ...
683    ValueError: expected 7 bytes in a unicodestring4, but only 6 remain
684    """
685
686    n = read_uint4(f)
687    assert n >= 0
688    if n > sys.maxsize:
689        raise ValueError("unicodestring4 byte count > sys.maxsize: %d" % n)
690    data = f.read(n)
691    if len(data) == n:
692        return str(data, 'utf-8', 'surrogatepass')
693    raise ValueError("expected %d bytes in a unicodestring4, but only %d "
694                     "remain" % (n, len(data)))
695
696unicodestring4 = ArgumentDescriptor(
697                    name="unicodestring4",
698                    n=TAKEN_FROM_ARGUMENT4U,
699                    reader=read_unicodestring4,
700                    doc="""A counted Unicode string.
701
702                    The first argument is a 4-byte little-endian signed int
703                    giving the number of bytes in the string, and the second
704                    argument-- the UTF-8 encoding of the Unicode string --
705                    contains that many bytes.
706                    """)
707
708
709def read_unicodestring8(f):
710    r"""
711    >>> import io
712    >>> s = 'abcd\uabcd'
713    >>> enc = s.encode('utf-8')
714    >>> enc
715    b'abcd\xea\xaf\x8d'
716    >>> n = bytes([len(enc)]) + b'\0' * 7  # little-endian 8-byte length
717    >>> t = read_unicodestring8(io.BytesIO(n + enc + b'junk'))
718    >>> s == t
719    True
720
721    >>> read_unicodestring8(io.BytesIO(n + enc[:-1]))
722    Traceback (most recent call last):
723    ...
724    ValueError: expected 7 bytes in a unicodestring8, but only 6 remain
725    """
726
727    n = read_uint8(f)
728    assert n >= 0
729    if n > sys.maxsize:
730        raise ValueError("unicodestring8 byte count > sys.maxsize: %d" % n)
731    data = f.read(n)
732    if len(data) == n:
733        return str(data, 'utf-8', 'surrogatepass')
734    raise ValueError("expected %d bytes in a unicodestring8, but only %d "
735                     "remain" % (n, len(data)))
736
737unicodestring8 = ArgumentDescriptor(
738                    name="unicodestring8",
739                    n=TAKEN_FROM_ARGUMENT8U,
740                    reader=read_unicodestring8,
741                    doc="""A counted Unicode string.
742
743                    The first argument is an 8-byte little-endian signed int
744                    giving the number of bytes in the string, and the second
745                    argument-- the UTF-8 encoding of the Unicode string --
746                    contains that many bytes.
747                    """)
748
749
750def read_decimalnl_short(f):
751    r"""
752    >>> import io
753    >>> read_decimalnl_short(io.BytesIO(b"1234\n56"))
754    1234
755
756    >>> read_decimalnl_short(io.BytesIO(b"1234L\n56"))
757    Traceback (most recent call last):
758    ...
759    ValueError: invalid literal for int() with base 10: b'1234L'
760    """
761
762    s = read_stringnl(f, decode=False, stripquotes=False)
763
764    # There's a hack for True and False here.
765    if s == b"00":
766        return False
767    elif s == b"01":
768        return True
769
770    return int(s)
771
772def read_decimalnl_long(f):
773    r"""
774    >>> import io
775
776    >>> read_decimalnl_long(io.BytesIO(b"1234L\n56"))
777    1234
778
779    >>> read_decimalnl_long(io.BytesIO(b"123456789012345678901234L\n6"))
780    123456789012345678901234
781    """
782
783    s = read_stringnl(f, decode=False, stripquotes=False)
784    if s[-1:] == b'L':
785        s = s[:-1]
786    return int(s)
787
788
789decimalnl_short = ArgumentDescriptor(
790                      name='decimalnl_short',
791                      n=UP_TO_NEWLINE,
792                      reader=read_decimalnl_short,
793                      doc="""A newline-terminated decimal integer literal.
794
795                          This never has a trailing 'L', and the integer fit
796                          in a short Python int on the box where the pickle
797                          was written -- but there's no guarantee it will fit
798                          in a short Python int on the box where the pickle
799                          is read.
800                          """)
801
802decimalnl_long = ArgumentDescriptor(
803                     name='decimalnl_long',
804                     n=UP_TO_NEWLINE,
805                     reader=read_decimalnl_long,
806                     doc="""A newline-terminated decimal integer literal.
807
808                         This has a trailing 'L', and can represent integers
809                         of any size.
810                         """)
811
812
813def read_floatnl(f):
814    r"""
815    >>> import io
816    >>> read_floatnl(io.BytesIO(b"-1.25\n6"))
817    -1.25
818    """
819    s = read_stringnl(f, decode=False, stripquotes=False)
820    return float(s)
821
822floatnl = ArgumentDescriptor(
823              name='floatnl',
824              n=UP_TO_NEWLINE,
825              reader=read_floatnl,
826              doc="""A newline-terminated decimal floating literal.
827
828              In general this requires 17 significant digits for roundtrip
829              identity, and pickling then unpickling infinities, NaNs, and
830              minus zero doesn't work across boxes, or on some boxes even
831              on itself (e.g., Windows can't read the strings it produces
832              for infinities or NaNs).
833              """)
834
835def read_float8(f):
836    r"""
837    >>> import io, struct
838    >>> raw = struct.pack(">d", -1.25)
839    >>> raw
840    b'\xbf\xf4\x00\x00\x00\x00\x00\x00'
841    >>> read_float8(io.BytesIO(raw + b"\n"))
842    -1.25
843    """
844
845    data = f.read(8)
846    if len(data) == 8:
847        return _unpack(">d", data)[0]
848    raise ValueError("not enough data in stream to read float8")
849
850
851float8 = ArgumentDescriptor(
852             name='float8',
853             n=8,
854             reader=read_float8,
855             doc="""An 8-byte binary representation of a float, big-endian.
856
857             The format is unique to Python, and shared with the struct
858             module (format string '>d') "in theory" (the struct and pickle
859             implementations don't share the code -- they should).  It's
860             strongly related to the IEEE-754 double format, and, in normal
861             cases, is in fact identical to the big-endian 754 double format.
862             On other boxes the dynamic range is limited to that of a 754
863             double, and "add a half and chop" rounding is used to reduce
864             the precision to 53 bits.  However, even on a 754 box,
865             infinities, NaNs, and minus zero may not be handled correctly
866             (may not survive roundtrip pickling intact).
867             """)
868
869# Protocol 2 formats
870
871from pickle import decode_long
872
873def read_long1(f):
874    r"""
875    >>> import io
876    >>> read_long1(io.BytesIO(b"\x00"))
877    0
878    >>> read_long1(io.BytesIO(b"\x02\xff\x00"))
879    255
880    >>> read_long1(io.BytesIO(b"\x02\xff\x7f"))
881    32767
882    >>> read_long1(io.BytesIO(b"\x02\x00\xff"))
883    -256
884    >>> read_long1(io.BytesIO(b"\x02\x00\x80"))
885    -32768
886    """
887
888    n = read_uint1(f)
889    data = f.read(n)
890    if len(data) != n:
891        raise ValueError("not enough data in stream to read long1")
892    return decode_long(data)
893
894long1 = ArgumentDescriptor(
895    name="long1",
896    n=TAKEN_FROM_ARGUMENT1,
897    reader=read_long1,
898    doc="""A binary long, little-endian, using 1-byte size.
899
900    This first reads one byte as an unsigned size, then reads that
901    many bytes and interprets them as a little-endian 2's-complement long.
902    If the size is 0, that's taken as a shortcut for the long 0L.
903    """)
904
905def read_long4(f):
906    r"""
907    >>> import io
908    >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\xff\x00"))
909    255
910    >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\xff\x7f"))
911    32767
912    >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\x00\xff"))
913    -256
914    >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\x00\x80"))
915    -32768
916    >>> read_long1(io.BytesIO(b"\x00\x00\x00\x00"))
917    0
918    """
919
920    n = read_int4(f)
921    if n < 0:
922        raise ValueError("long4 byte count < 0: %d" % n)
923    data = f.read(n)
924    if len(data) != n:
925        raise ValueError("not enough data in stream to read long4")
926    return decode_long(data)
927
928long4 = ArgumentDescriptor(
929    name="long4",
930    n=TAKEN_FROM_ARGUMENT4,
931    reader=read_long4,
932    doc="""A binary representation of a long, little-endian.
933
934    This first reads four bytes as a signed size (but requires the
935    size to be >= 0), then reads that many bytes and interprets them
936    as a little-endian 2's-complement long.  If the size is 0, that's taken
937    as a shortcut for the int 0, although LONG1 should really be used
938    then instead (and in any case where # of bytes < 256).
939    """)
940
941
942##############################################################################
943# Object descriptors.  The stack used by the pickle machine holds objects,
944# and in the stack_before and stack_after attributes of OpcodeInfo
945# descriptors we need names to describe the various types of objects that can
946# appear on the stack.
947
948class StackObject(object):
949    __slots__ = (
950        # name of descriptor record, for info only
951        'name',
952
953        # type of object, or tuple of type objects (meaning the object can
954        # be of any type in the tuple)
955        'obtype',
956
957        # human-readable docs for this kind of stack object; a string
958        'doc',
959    )
960
961    def __init__(self, name, obtype, doc):
962        assert isinstance(name, str)
963        self.name = name
964
965        assert isinstance(obtype, type) or isinstance(obtype, tuple)
966        if isinstance(obtype, tuple):
967            for contained in obtype:
968                assert isinstance(contained, type)
969        self.obtype = obtype
970
971        assert isinstance(doc, str)
972        self.doc = doc
973
974    def __repr__(self):
975        return self.name
976
977
978pyint = pylong = StackObject(
979    name='int',
980    obtype=int,
981    doc="A Python integer object.")
982
983pyinteger_or_bool = StackObject(
984    name='int_or_bool',
985    obtype=(int, bool),
986    doc="A Python integer or boolean object.")
987
988pybool = StackObject(
989    name='bool',
990    obtype=bool,
991    doc="A Python boolean object.")
992
993pyfloat = StackObject(
994    name='float',
995    obtype=float,
996    doc="A Python float object.")
997
998pybytes_or_str = pystring = StackObject(
999    name='bytes_or_str',
1000    obtype=(bytes, str),
1001    doc="A Python bytes or (Unicode) string object.")
1002
1003pybytes = StackObject(
1004    name='bytes',
1005    obtype=bytes,
1006    doc="A Python bytes object.")
1007
1008pybytearray = StackObject(
1009    name='bytearray',
1010    obtype=bytearray,
1011    doc="A Python bytearray object.")
1012
1013pyunicode = StackObject(
1014    name='str',
1015    obtype=str,
1016    doc="A Python (Unicode) string object.")
1017
1018pynone = StackObject(
1019    name="None",
1020    obtype=type(None),
1021    doc="The Python None object.")
1022
1023pytuple = StackObject(
1024    name="tuple",
1025    obtype=tuple,
1026    doc="A Python tuple object.")
1027
1028pylist = StackObject(
1029    name="list",
1030    obtype=list,
1031    doc="A Python list object.")
1032
1033pydict = StackObject(
1034    name="dict",
1035    obtype=dict,
1036    doc="A Python dict object.")
1037
1038pyset = StackObject(
1039    name="set",
1040    obtype=set,
1041    doc="A Python set object.")
1042
1043pyfrozenset = StackObject(
1044    name="frozenset",
1045    obtype=set,
1046    doc="A Python frozenset object.")
1047
1048pybuffer = StackObject(
1049    name='buffer',
1050    obtype=object,
1051    doc="A Python buffer-like object.")
1052
1053anyobject = StackObject(
1054    name='any',
1055    obtype=object,
1056    doc="Any kind of object whatsoever.")
1057
1058markobject = StackObject(
1059    name="mark",
1060    obtype=StackObject,
1061    doc="""'The mark' is a unique object.
1062
1063Opcodes that operate on a variable number of objects
1064generally don't embed the count of objects in the opcode,
1065or pull it off the stack.  Instead the MARK opcode is used
1066to push a special marker object on the stack, and then
1067some other opcodes grab all the objects from the top of
1068the stack down to (but not including) the topmost marker
1069object.
1070""")
1071
1072stackslice = StackObject(
1073    name="stackslice",
1074    obtype=StackObject,
1075    doc="""An object representing a contiguous slice of the stack.
1076
1077This is used in conjunction with markobject, to represent all
1078of the stack following the topmost markobject.  For example,
1079the POP_MARK opcode changes the stack from
1080
1081    [..., markobject, stackslice]
1082to
1083    [...]
1084
1085No matter how many object are on the stack after the topmost
1086markobject, POP_MARK gets rid of all of them (including the
1087topmost markobject too).
1088""")
1089
1090##############################################################################
1091# Descriptors for pickle opcodes.
1092
1093class OpcodeInfo(object):
1094
1095    __slots__ = (
1096        # symbolic name of opcode; a string
1097        'name',
1098
1099        # the code used in a bytestream to represent the opcode; a
1100        # one-character string
1101        'code',
1102
1103        # If the opcode has an argument embedded in the byte string, an
1104        # instance of ArgumentDescriptor specifying its type.  Note that
1105        # arg.reader(s) can be used to read and decode the argument from
1106        # the bytestream s, and arg.doc documents the format of the raw
1107        # argument bytes.  If the opcode doesn't have an argument embedded
1108        # in the bytestream, arg should be None.
1109        'arg',
1110
1111        # what the stack looks like before this opcode runs; a list
1112        'stack_before',
1113
1114        # what the stack looks like after this opcode runs; a list
1115        'stack_after',
1116
1117        # the protocol number in which this opcode was introduced; an int
1118        'proto',
1119
1120        # human-readable docs for this opcode; a string
1121        'doc',
1122    )
1123
1124    def __init__(self, name, code, arg,
1125                 stack_before, stack_after, proto, doc):
1126        assert isinstance(name, str)
1127        self.name = name
1128
1129        assert isinstance(code, str)
1130        assert len(code) == 1
1131        self.code = code
1132
1133        assert arg is None or isinstance(arg, ArgumentDescriptor)
1134        self.arg = arg
1135
1136        assert isinstance(stack_before, list)
1137        for x in stack_before:
1138            assert isinstance(x, StackObject)
1139        self.stack_before = stack_before
1140
1141        assert isinstance(stack_after, list)
1142        for x in stack_after:
1143            assert isinstance(x, StackObject)
1144        self.stack_after = stack_after
1145
1146        assert isinstance(proto, int) and 0 <= proto <= pickle.HIGHEST_PROTOCOL
1147        self.proto = proto
1148
1149        assert isinstance(doc, str)
1150        self.doc = doc
1151
1152I = OpcodeInfo
1153opcodes = [
1154
1155    # Ways to spell integers.
1156
1157    I(name='INT',
1158      code='I',
1159      arg=decimalnl_short,
1160      stack_before=[],
1161      stack_after=[pyinteger_or_bool],
1162      proto=0,
1163      doc="""Push an integer or bool.
1164
1165      The argument is a newline-terminated decimal literal string.
1166
1167      The intent may have been that this always fit in a short Python int,
1168      but INT can be generated in pickles written on a 64-bit box that
1169      require a Python long on a 32-bit box.  The difference between this
1170      and LONG then is that INT skips a trailing 'L', and produces a short
1171      int whenever possible.
1172
1173      Another difference is due to that, when bool was introduced as a
1174      distinct type in 2.3, builtin names True and False were also added to
1175      2.2.2, mapping to ints 1 and 0.  For compatibility in both directions,
1176      True gets pickled as INT + "I01\\n", and False as INT + "I00\\n".
1177      Leading zeroes are never produced for a genuine integer.  The 2.3
1178      (and later) unpicklers special-case these and return bool instead;
1179      earlier unpicklers ignore the leading "0" and return the int.
1180      """),
1181
1182    I(name='BININT',
1183      code='J',
1184      arg=int4,
1185      stack_before=[],
1186      stack_after=[pyint],
1187      proto=1,
1188      doc="""Push a four-byte signed integer.
1189
1190      This handles the full range of Python (short) integers on a 32-bit
1191      box, directly as binary bytes (1 for the opcode and 4 for the integer).
1192      If the integer is non-negative and fits in 1 or 2 bytes, pickling via
1193      BININT1 or BININT2 saves space.
1194      """),
1195
1196    I(name='BININT1',
1197      code='K',
1198      arg=uint1,
1199      stack_before=[],
1200      stack_after=[pyint],
1201      proto=1,
1202      doc="""Push a one-byte unsigned integer.
1203
1204      This is a space optimization for pickling very small non-negative ints,
1205      in range(256).
1206      """),
1207
1208    I(name='BININT2',
1209      code='M',
1210      arg=uint2,
1211      stack_before=[],
1212      stack_after=[pyint],
1213      proto=1,
1214      doc="""Push a two-byte unsigned integer.
1215
1216      This is a space optimization for pickling small positive ints, in
1217      range(256, 2**16).  Integers in range(256) can also be pickled via
1218      BININT2, but BININT1 instead saves a byte.
1219      """),
1220
1221    I(name='LONG',
1222      code='L',
1223      arg=decimalnl_long,
1224      stack_before=[],
1225      stack_after=[pyint],
1226      proto=0,
1227      doc="""Push a long integer.
1228
1229      The same as INT, except that the literal ends with 'L', and always
1230      unpickles to a Python long.  There doesn't seem a real purpose to the
1231      trailing 'L'.
1232
1233      Note that LONG takes time quadratic in the number of digits when
1234      unpickling (this is simply due to the nature of decimal->binary
1235      conversion).  Proto 2 added linear-time (in C; still quadratic-time
1236      in Python) LONG1 and LONG4 opcodes.
1237      """),
1238
1239    I(name="LONG1",
1240      code='\x8a',
1241      arg=long1,
1242      stack_before=[],
1243      stack_after=[pyint],
1244      proto=2,
1245      doc="""Long integer using one-byte length.
1246
1247      A more efficient encoding of a Python long; the long1 encoding
1248      says it all."""),
1249
1250    I(name="LONG4",
1251      code='\x8b',
1252      arg=long4,
1253      stack_before=[],
1254      stack_after=[pyint],
1255      proto=2,
1256      doc="""Long integer using found-byte length.
1257
1258      A more efficient encoding of a Python long; the long4 encoding
1259      says it all."""),
1260
1261    # Ways to spell strings (8-bit, not Unicode).
1262
1263    I(name='STRING',
1264      code='S',
1265      arg=stringnl,
1266      stack_before=[],
1267      stack_after=[pybytes_or_str],
1268      proto=0,
1269      doc="""Push a Python string object.
1270
1271      The argument is a repr-style string, with bracketing quote characters,
1272      and perhaps embedded escapes.  The argument extends until the next
1273      newline character.  These are usually decoded into a str instance
1274      using the encoding given to the Unpickler constructor. or the default,
1275      'ASCII'.  If the encoding given was 'bytes' however, they will be
1276      decoded as bytes object instead.
1277      """),
1278
1279    I(name='BINSTRING',
1280      code='T',
1281      arg=string4,
1282      stack_before=[],
1283      stack_after=[pybytes_or_str],
1284      proto=1,
1285      doc="""Push a Python string object.
1286
1287      There are two arguments: the first is a 4-byte little-endian
1288      signed int giving the number of bytes in the string, and the
1289      second is that many bytes, which are taken literally as the string
1290      content.  These are usually decoded into a str instance using the
1291      encoding given to the Unpickler constructor. or the default,
1292      'ASCII'.  If the encoding given was 'bytes' however, they will be
1293      decoded as bytes object instead.
1294      """),
1295
1296    I(name='SHORT_BINSTRING',
1297      code='U',
1298      arg=string1,
1299      stack_before=[],
1300      stack_after=[pybytes_or_str],
1301      proto=1,
1302      doc="""Push a Python string object.
1303
1304      There are two arguments: the first is a 1-byte unsigned int giving
1305      the number of bytes in the string, and the second is that many
1306      bytes, which are taken literally as the string content.  These are
1307      usually decoded into a str instance using the encoding given to
1308      the Unpickler constructor. or the default, 'ASCII'.  If the
1309      encoding given was 'bytes' however, they will be decoded as bytes
1310      object instead.
1311      """),
1312
1313    # Bytes (protocol 3 and higher)
1314
1315    I(name='BINBYTES',
1316      code='B',
1317      arg=bytes4,
1318      stack_before=[],
1319      stack_after=[pybytes],
1320      proto=3,
1321      doc="""Push a Python bytes object.
1322
1323      There are two arguments:  the first is a 4-byte little-endian unsigned int
1324      giving the number of bytes, and the second is that many bytes, which are
1325      taken literally as the bytes content.
1326      """),
1327
1328    I(name='SHORT_BINBYTES',
1329      code='C',
1330      arg=bytes1,
1331      stack_before=[],
1332      stack_after=[pybytes],
1333      proto=3,
1334      doc="""Push a Python bytes object.
1335
1336      There are two arguments:  the first is a 1-byte unsigned int giving
1337      the number of bytes, and the second is that many bytes, which are taken
1338      literally as the string content.
1339      """),
1340
1341    I(name='BINBYTES8',
1342      code='\x8e',
1343      arg=bytes8,
1344      stack_before=[],
1345      stack_after=[pybytes],
1346      proto=4,
1347      doc="""Push a Python bytes object.
1348
1349      There are two arguments:  the first is an 8-byte unsigned int giving
1350      the number of bytes in the string, and the second is that many bytes,
1351      which are taken literally as the string content.
1352      """),
1353
1354    # Bytearray (protocol 5 and higher)
1355
1356    I(name='BYTEARRAY8',
1357      code='\x96',
1358      arg=bytearray8,
1359      stack_before=[],
1360      stack_after=[pybytearray],
1361      proto=5,
1362      doc="""Push a Python bytearray object.
1363
1364      There are two arguments:  the first is an 8-byte unsigned int giving
1365      the number of bytes in the bytearray, and the second is that many bytes,
1366      which are taken literally as the bytearray content.
1367      """),
1368
1369    # Out-of-band buffer (protocol 5 and higher)
1370
1371    I(name='NEXT_BUFFER',
1372      code='\x97',
1373      arg=None,
1374      stack_before=[],
1375      stack_after=[pybuffer],
1376      proto=5,
1377      doc="Push an out-of-band buffer object."),
1378
1379    I(name='READONLY_BUFFER',
1380      code='\x98',
1381      arg=None,
1382      stack_before=[pybuffer],
1383      stack_after=[pybuffer],
1384      proto=5,
1385      doc="Make an out-of-band buffer object read-only."),
1386
1387    # Ways to spell None.
1388
1389    I(name='NONE',
1390      code='N',
1391      arg=None,
1392      stack_before=[],
1393      stack_after=[pynone],
1394      proto=0,
1395      doc="Push None on the stack."),
1396
1397    # Ways to spell bools, starting with proto 2.  See INT for how this was
1398    # done before proto 2.
1399
1400    I(name='NEWTRUE',
1401      code='\x88',
1402      arg=None,
1403      stack_before=[],
1404      stack_after=[pybool],
1405      proto=2,
1406      doc="Push True onto the stack."),
1407
1408    I(name='NEWFALSE',
1409      code='\x89',
1410      arg=None,
1411      stack_before=[],
1412      stack_after=[pybool],
1413      proto=2,
1414      doc="Push False onto the stack."),
1415
1416    # Ways to spell Unicode strings.
1417
1418    I(name='UNICODE',
1419      code='V',
1420      arg=unicodestringnl,
1421      stack_before=[],
1422      stack_after=[pyunicode],
1423      proto=0,  # this may be pure-text, but it's a later addition
1424      doc="""Push a Python Unicode string object.
1425
1426      The argument is a raw-unicode-escape encoding of a Unicode string,
1427      and so may contain embedded escape sequences.  The argument extends
1428      until the next newline character.
1429      """),
1430
1431    I(name='SHORT_BINUNICODE',
1432      code='\x8c',
1433      arg=unicodestring1,
1434      stack_before=[],
1435      stack_after=[pyunicode],
1436      proto=4,
1437      doc="""Push a Python Unicode string object.
1438
1439      There are two arguments:  the first is a 1-byte little-endian signed int
1440      giving the number of bytes in the string.  The second is that many
1441      bytes, and is the UTF-8 encoding of the Unicode string.
1442      """),
1443
1444    I(name='BINUNICODE',
1445      code='X',
1446      arg=unicodestring4,
1447      stack_before=[],
1448      stack_after=[pyunicode],
1449      proto=1,
1450      doc="""Push a Python Unicode string object.
1451
1452      There are two arguments:  the first is a 4-byte little-endian unsigned int
1453      giving the number of bytes in the string.  The second is that many
1454      bytes, and is the UTF-8 encoding of the Unicode string.
1455      """),
1456
1457    I(name='BINUNICODE8',
1458      code='\x8d',
1459      arg=unicodestring8,
1460      stack_before=[],
1461      stack_after=[pyunicode],
1462      proto=4,
1463      doc="""Push a Python Unicode string object.
1464
1465      There are two arguments:  the first is an 8-byte little-endian signed int
1466      giving the number of bytes in the string.  The second is that many
1467      bytes, and is the UTF-8 encoding of the Unicode string.
1468      """),
1469
1470    # Ways to spell floats.
1471
1472    I(name='FLOAT',
1473      code='F',
1474      arg=floatnl,
1475      stack_before=[],
1476      stack_after=[pyfloat],
1477      proto=0,
1478      doc="""Newline-terminated decimal float literal.
1479
1480      The argument is repr(a_float), and in general requires 17 significant
1481      digits for roundtrip conversion to be an identity (this is so for
1482      IEEE-754 double precision values, which is what Python float maps to
1483      on most boxes).
1484
1485      In general, FLOAT cannot be used to transport infinities, NaNs, or
1486      minus zero across boxes (or even on a single box, if the platform C
1487      library can't read the strings it produces for such things -- Windows
1488      is like that), but may do less damage than BINFLOAT on boxes with
1489      greater precision or dynamic range than IEEE-754 double.
1490      """),
1491
1492    I(name='BINFLOAT',
1493      code='G',
1494      arg=float8,
1495      stack_before=[],
1496      stack_after=[pyfloat],
1497      proto=1,
1498      doc="""Float stored in binary form, with 8 bytes of data.
1499
1500      This generally requires less than half the space of FLOAT encoding.
1501      In general, BINFLOAT cannot be used to transport infinities, NaNs, or
1502      minus zero, raises an exception if the exponent exceeds the range of
1503      an IEEE-754 double, and retains no more than 53 bits of precision (if
1504      there are more than that, "add a half and chop" rounding is used to
1505      cut it back to 53 significant bits).
1506      """),
1507
1508    # Ways to build lists.
1509
1510    I(name='EMPTY_LIST',
1511      code=']',
1512      arg=None,
1513      stack_before=[],
1514      stack_after=[pylist],
1515      proto=1,
1516      doc="Push an empty list."),
1517
1518    I(name='APPEND',
1519      code='a',
1520      arg=None,
1521      stack_before=[pylist, anyobject],
1522      stack_after=[pylist],
1523      proto=0,
1524      doc="""Append an object to a list.
1525
1526      Stack before:  ... pylist anyobject
1527      Stack after:   ... pylist+[anyobject]
1528
1529      although pylist is really extended in-place.
1530      """),
1531
1532    I(name='APPENDS',
1533      code='e',
1534      arg=None,
1535      stack_before=[pylist, markobject, stackslice],
1536      stack_after=[pylist],
1537      proto=1,
1538      doc="""Extend a list by a slice of stack objects.
1539
1540      Stack before:  ... pylist markobject stackslice
1541      Stack after:   ... pylist+stackslice
1542
1543      although pylist is really extended in-place.
1544      """),
1545
1546    I(name='LIST',
1547      code='l',
1548      arg=None,
1549      stack_before=[markobject, stackslice],
1550      stack_after=[pylist],
1551      proto=0,
1552      doc="""Build a list out of the topmost stack slice, after markobject.
1553
1554      All the stack entries following the topmost markobject are placed into
1555      a single Python list, which single list object replaces all of the
1556      stack from the topmost markobject onward.  For example,
1557
1558      Stack before: ... markobject 1 2 3 'abc'
1559      Stack after:  ... [1, 2, 3, 'abc']
1560      """),
1561
1562    # Ways to build tuples.
1563
1564    I(name='EMPTY_TUPLE',
1565      code=')',
1566      arg=None,
1567      stack_before=[],
1568      stack_after=[pytuple],
1569      proto=1,
1570      doc="Push an empty tuple."),
1571
1572    I(name='TUPLE',
1573      code='t',
1574      arg=None,
1575      stack_before=[markobject, stackslice],
1576      stack_after=[pytuple],
1577      proto=0,
1578      doc="""Build a tuple out of the topmost stack slice, after markobject.
1579
1580      All the stack entries following the topmost markobject are placed into
1581      a single Python tuple, which single tuple object replaces all of the
1582      stack from the topmost markobject onward.  For example,
1583
1584      Stack before: ... markobject 1 2 3 'abc'
1585      Stack after:  ... (1, 2, 3, 'abc')
1586      """),
1587
1588    I(name='TUPLE1',
1589      code='\x85',
1590      arg=None,
1591      stack_before=[anyobject],
1592      stack_after=[pytuple],
1593      proto=2,
1594      doc="""Build a one-tuple out of the topmost item on the stack.
1595
1596      This code pops one value off the stack and pushes a tuple of
1597      length 1 whose one item is that value back onto it.  In other
1598      words:
1599
1600          stack[-1] = tuple(stack[-1:])
1601      """),
1602
1603    I(name='TUPLE2',
1604      code='\x86',
1605      arg=None,
1606      stack_before=[anyobject, anyobject],
1607      stack_after=[pytuple],
1608      proto=2,
1609      doc="""Build a two-tuple out of the top two items on the stack.
1610
1611      This code pops two values off the stack and pushes a tuple of
1612      length 2 whose items are those values back onto it.  In other
1613      words:
1614
1615          stack[-2:] = [tuple(stack[-2:])]
1616      """),
1617
1618    I(name='TUPLE3',
1619      code='\x87',
1620      arg=None,
1621      stack_before=[anyobject, anyobject, anyobject],
1622      stack_after=[pytuple],
1623      proto=2,
1624      doc="""Build a three-tuple out of the top three items on the stack.
1625
1626      This code pops three values off the stack and pushes a tuple of
1627      length 3 whose items are those values back onto it.  In other
1628      words:
1629
1630          stack[-3:] = [tuple(stack[-3:])]
1631      """),
1632
1633    # Ways to build dicts.
1634
1635    I(name='EMPTY_DICT',
1636      code='}',
1637      arg=None,
1638      stack_before=[],
1639      stack_after=[pydict],
1640      proto=1,
1641      doc="Push an empty dict."),
1642
1643    I(name='DICT',
1644      code='d',
1645      arg=None,
1646      stack_before=[markobject, stackslice],
1647      stack_after=[pydict],
1648      proto=0,
1649      doc="""Build a dict out of the topmost stack slice, after markobject.
1650
1651      All the stack entries following the topmost markobject are placed into
1652      a single Python dict, which single dict object replaces all of the
1653      stack from the topmost markobject onward.  The stack slice alternates
1654      key, value, key, value, ....  For example,
1655
1656      Stack before: ... markobject 1 2 3 'abc'
1657      Stack after:  ... {1: 2, 3: 'abc'}
1658      """),
1659
1660    I(name='SETITEM',
1661      code='s',
1662      arg=None,
1663      stack_before=[pydict, anyobject, anyobject],
1664      stack_after=[pydict],
1665      proto=0,
1666      doc="""Add a key+value pair to an existing dict.
1667
1668      Stack before:  ... pydict key value
1669      Stack after:   ... pydict
1670
1671      where pydict has been modified via pydict[key] = value.
1672      """),
1673
1674    I(name='SETITEMS',
1675      code='u',
1676      arg=None,
1677      stack_before=[pydict, markobject, stackslice],
1678      stack_after=[pydict],
1679      proto=1,
1680      doc="""Add an arbitrary number of key+value pairs to an existing dict.
1681
1682      The slice of the stack following the topmost markobject is taken as
1683      an alternating sequence of keys and values, added to the dict
1684      immediately under the topmost markobject.  Everything at and after the
1685      topmost markobject is popped, leaving the mutated dict at the top
1686      of the stack.
1687
1688      Stack before:  ... pydict markobject key_1 value_1 ... key_n value_n
1689      Stack after:   ... pydict
1690
1691      where pydict has been modified via pydict[key_i] = value_i for i in
1692      1, 2, ..., n, and in that order.
1693      """),
1694
1695    # Ways to build sets
1696
1697    I(name='EMPTY_SET',
1698      code='\x8f',
1699      arg=None,
1700      stack_before=[],
1701      stack_after=[pyset],
1702      proto=4,
1703      doc="Push an empty set."),
1704
1705    I(name='ADDITEMS',
1706      code='\x90',
1707      arg=None,
1708      stack_before=[pyset, markobject, stackslice],
1709      stack_after=[pyset],
1710      proto=4,
1711      doc="""Add an arbitrary number of items to an existing set.
1712
1713      The slice of the stack following the topmost markobject is taken as
1714      a sequence of items, added to the set immediately under the topmost
1715      markobject.  Everything at and after the topmost markobject is popped,
1716      leaving the mutated set at the top of the stack.
1717
1718      Stack before:  ... pyset markobject item_1 ... item_n
1719      Stack after:   ... pyset
1720
1721      where pyset has been modified via pyset.add(item_i) = item_i for i in
1722      1, 2, ..., n, and in that order.
1723      """),
1724
1725    # Way to build frozensets
1726
1727    I(name='FROZENSET',
1728      code='\x91',
1729      arg=None,
1730      stack_before=[markobject, stackslice],
1731      stack_after=[pyfrozenset],
1732      proto=4,
1733      doc="""Build a frozenset out of the topmost slice, after markobject.
1734
1735      All the stack entries following the topmost markobject are placed into
1736      a single Python frozenset, which single frozenset object replaces all
1737      of the stack from the topmost markobject onward.  For example,
1738
1739      Stack before: ... markobject 1 2 3
1740      Stack after:  ... frozenset({1, 2, 3})
1741      """),
1742
1743    # Stack manipulation.
1744
1745    I(name='POP',
1746      code='0',
1747      arg=None,
1748      stack_before=[anyobject],
1749      stack_after=[],
1750      proto=0,
1751      doc="Discard the top stack item, shrinking the stack by one item."),
1752
1753    I(name='DUP',
1754      code='2',
1755      arg=None,
1756      stack_before=[anyobject],
1757      stack_after=[anyobject, anyobject],
1758      proto=0,
1759      doc="Push the top stack item onto the stack again, duplicating it."),
1760
1761    I(name='MARK',
1762      code='(',
1763      arg=None,
1764      stack_before=[],
1765      stack_after=[markobject],
1766      proto=0,
1767      doc="""Push markobject onto the stack.
1768
1769      markobject is a unique object, used by other opcodes to identify a
1770      region of the stack containing a variable number of objects for them
1771      to work on.  See markobject.doc for more detail.
1772      """),
1773
1774    I(name='POP_MARK',
1775      code='1',
1776      arg=None,
1777      stack_before=[markobject, stackslice],
1778      stack_after=[],
1779      proto=1,
1780      doc="""Pop all the stack objects at and above the topmost markobject.
1781
1782      When an opcode using a variable number of stack objects is done,
1783      POP_MARK is used to remove those objects, and to remove the markobject
1784      that delimited their starting position on the stack.
1785      """),
1786
1787    # Memo manipulation.  There are really only two operations (get and put),
1788    # each in all-text, "short binary", and "long binary" flavors.
1789
1790    I(name='GET',
1791      code='g',
1792      arg=decimalnl_short,
1793      stack_before=[],
1794      stack_after=[anyobject],
1795      proto=0,
1796      doc="""Read an object from the memo and push it on the stack.
1797
1798      The index of the memo object to push is given by the newline-terminated
1799      decimal string following.  BINGET and LONG_BINGET are space-optimized
1800      versions.
1801      """),
1802
1803    I(name='BINGET',
1804      code='h',
1805      arg=uint1,
1806      stack_before=[],
1807      stack_after=[anyobject],
1808      proto=1,
1809      doc="""Read an object from the memo and push it on the stack.
1810
1811      The index of the memo object to push is given by the 1-byte unsigned
1812      integer following.
1813      """),
1814
1815    I(name='LONG_BINGET',
1816      code='j',
1817      arg=uint4,
1818      stack_before=[],
1819      stack_after=[anyobject],
1820      proto=1,
1821      doc="""Read an object from the memo and push it on the stack.
1822
1823      The index of the memo object to push is given by the 4-byte unsigned
1824      little-endian integer following.
1825      """),
1826
1827    I(name='PUT',
1828      code='p',
1829      arg=decimalnl_short,
1830      stack_before=[],
1831      stack_after=[],
1832      proto=0,
1833      doc="""Store the stack top into the memo.  The stack is not popped.
1834
1835      The index of the memo location to write into is given by the newline-
1836      terminated decimal string following.  BINPUT and LONG_BINPUT are
1837      space-optimized versions.
1838      """),
1839
1840    I(name='BINPUT',
1841      code='q',
1842      arg=uint1,
1843      stack_before=[],
1844      stack_after=[],
1845      proto=1,
1846      doc="""Store the stack top into the memo.  The stack is not popped.
1847
1848      The index of the memo location to write into is given by the 1-byte
1849      unsigned integer following.
1850      """),
1851
1852    I(name='LONG_BINPUT',
1853      code='r',
1854      arg=uint4,
1855      stack_before=[],
1856      stack_after=[],
1857      proto=1,
1858      doc="""Store the stack top into the memo.  The stack is not popped.
1859
1860      The index of the memo location to write into is given by the 4-byte
1861      unsigned little-endian integer following.
1862      """),
1863
1864    I(name='MEMOIZE',
1865      code='\x94',
1866      arg=None,
1867      stack_before=[anyobject],
1868      stack_after=[anyobject],
1869      proto=4,
1870      doc="""Store the stack top into the memo.  The stack is not popped.
1871
1872      The index of the memo location to write is the number of
1873      elements currently present in the memo.
1874      """),
1875
1876    # Access the extension registry (predefined objects).  Akin to the GET
1877    # family.
1878
1879    I(name='EXT1',
1880      code='\x82',
1881      arg=uint1,
1882      stack_before=[],
1883      stack_after=[anyobject],
1884      proto=2,
1885      doc="""Extension code.
1886
1887      This code and the similar EXT2 and EXT4 allow using a registry
1888      of popular objects that are pickled by name, typically classes.
1889      It is envisioned that through a global negotiation and
1890      registration process, third parties can set up a mapping between
1891      ints and object names.
1892
1893      In order to guarantee pickle interchangeability, the extension
1894      code registry ought to be global, although a range of codes may
1895      be reserved for private use.
1896
1897      EXT1 has a 1-byte integer argument.  This is used to index into the
1898      extension registry, and the object at that index is pushed on the stack.
1899      """),
1900
1901    I(name='EXT2',
1902      code='\x83',
1903      arg=uint2,
1904      stack_before=[],
1905      stack_after=[anyobject],
1906      proto=2,
1907      doc="""Extension code.
1908
1909      See EXT1.  EXT2 has a two-byte integer argument.
1910      """),
1911
1912    I(name='EXT4',
1913      code='\x84',
1914      arg=int4,
1915      stack_before=[],
1916      stack_after=[anyobject],
1917      proto=2,
1918      doc="""Extension code.
1919
1920      See EXT1.  EXT4 has a four-byte integer argument.
1921      """),
1922
1923    # Push a class object, or module function, on the stack, via its module
1924    # and name.
1925
1926    I(name='GLOBAL',
1927      code='c',
1928      arg=stringnl_noescape_pair,
1929      stack_before=[],
1930      stack_after=[anyobject],
1931      proto=0,
1932      doc="""Push a global object (module.attr) on the stack.
1933
1934      Two newline-terminated strings follow the GLOBAL opcode.  The first is
1935      taken as a module name, and the second as a class name.  The class
1936      object module.class is pushed on the stack.  More accurately, the
1937      object returned by self.find_class(module, class) is pushed on the
1938      stack, so unpickling subclasses can override this form of lookup.
1939      """),
1940
1941    I(name='STACK_GLOBAL',
1942      code='\x93',
1943      arg=None,
1944      stack_before=[pyunicode, pyunicode],
1945      stack_after=[anyobject],
1946      proto=4,
1947      doc="""Push a global object (module.attr) on the stack.
1948      """),
1949
1950    # Ways to build objects of classes pickle doesn't know about directly
1951    # (user-defined classes).  I despair of documenting this accurately
1952    # and comprehensibly -- you really have to read the pickle code to
1953    # find all the special cases.
1954
1955    I(name='REDUCE',
1956      code='R',
1957      arg=None,
1958      stack_before=[anyobject, anyobject],
1959      stack_after=[anyobject],
1960      proto=0,
1961      doc="""Push an object built from a callable and an argument tuple.
1962
1963      The opcode is named to remind of the __reduce__() method.
1964
1965      Stack before: ... callable pytuple
1966      Stack after:  ... callable(*pytuple)
1967
1968      The callable and the argument tuple are the first two items returned
1969      by a __reduce__ method.  Applying the callable to the argtuple is
1970      supposed to reproduce the original object, or at least get it started.
1971      If the __reduce__ method returns a 3-tuple, the last component is an
1972      argument to be passed to the object's __setstate__, and then the REDUCE
1973      opcode is followed by code to create setstate's argument, and then a
1974      BUILD opcode to apply  __setstate__ to that argument.
1975
1976      If not isinstance(callable, type), REDUCE complains unless the
1977      callable has been registered with the copyreg module's
1978      safe_constructors dict, or the callable has a magic
1979      '__safe_for_unpickling__' attribute with a true value.  I'm not sure
1980      why it does this, but I've sure seen this complaint often enough when
1981      I didn't want to <wink>.
1982      """),
1983
1984    I(name='BUILD',
1985      code='b',
1986      arg=None,
1987      stack_before=[anyobject, anyobject],
1988      stack_after=[anyobject],
1989      proto=0,
1990      doc="""Finish building an object, via __setstate__ or dict update.
1991
1992      Stack before: ... anyobject argument
1993      Stack after:  ... anyobject
1994
1995      where anyobject may have been mutated, as follows:
1996
1997      If the object has a __setstate__ method,
1998
1999          anyobject.__setstate__(argument)
2000
2001      is called.
2002
2003      Else the argument must be a dict, the object must have a __dict__, and
2004      the object is updated via
2005
2006          anyobject.__dict__.update(argument)
2007      """),
2008
2009    I(name='INST',
2010      code='i',
2011      arg=stringnl_noescape_pair,
2012      stack_before=[markobject, stackslice],
2013      stack_after=[anyobject],
2014      proto=0,
2015      doc="""Build a class instance.
2016
2017      This is the protocol 0 version of protocol 1's OBJ opcode.
2018      INST is followed by two newline-terminated strings, giving a
2019      module and class name, just as for the GLOBAL opcode (and see
2020      GLOBAL for more details about that).  self.find_class(module, name)
2021      is used to get a class object.
2022
2023      In addition, all the objects on the stack following the topmost
2024      markobject are gathered into a tuple and popped (along with the
2025      topmost markobject), just as for the TUPLE opcode.
2026
2027      Now it gets complicated.  If all of these are true:
2028
2029        + The argtuple is empty (markobject was at the top of the stack
2030          at the start).
2031
2032        + The class object does not have a __getinitargs__ attribute.
2033
2034      then we want to create an old-style class instance without invoking
2035      its __init__() method (pickle has waffled on this over the years; not
2036      calling __init__() is current wisdom).  In this case, an instance of
2037      an old-style dummy class is created, and then we try to rebind its
2038      __class__ attribute to the desired class object.  If this succeeds,
2039      the new instance object is pushed on the stack, and we're done.
2040
2041      Else (the argtuple is not empty, it's not an old-style class object,
2042      or the class object does have a __getinitargs__ attribute), the code
2043      first insists that the class object have a __safe_for_unpickling__
2044      attribute.  Unlike as for the __safe_for_unpickling__ check in REDUCE,
2045      it doesn't matter whether this attribute has a true or false value, it
2046      only matters whether it exists (XXX this is a bug).  If
2047      __safe_for_unpickling__ doesn't exist, UnpicklingError is raised.
2048
2049      Else (the class object does have a __safe_for_unpickling__ attr),
2050      the class object obtained from INST's arguments is applied to the
2051      argtuple obtained from the stack, and the resulting instance object
2052      is pushed on the stack.
2053
2054      NOTE:  checks for __safe_for_unpickling__ went away in Python 2.3.
2055      NOTE:  the distinction between old-style and new-style classes does
2056             not make sense in Python 3.
2057      """),
2058
2059    I(name='OBJ',
2060      code='o',
2061      arg=None,
2062      stack_before=[markobject, anyobject, stackslice],
2063      stack_after=[anyobject],
2064      proto=1,
2065      doc="""Build a class instance.
2066
2067      This is the protocol 1 version of protocol 0's INST opcode, and is
2068      very much like it.  The major difference is that the class object
2069      is taken off the stack, allowing it to be retrieved from the memo
2070      repeatedly if several instances of the same class are created.  This
2071      can be much more efficient (in both time and space) than repeatedly
2072      embedding the module and class names in INST opcodes.
2073
2074      Unlike INST, OBJ takes no arguments from the opcode stream.  Instead
2075      the class object is taken off the stack, immediately above the
2076      topmost markobject:
2077
2078      Stack before: ... markobject classobject stackslice
2079      Stack after:  ... new_instance_object
2080
2081      As for INST, the remainder of the stack above the markobject is
2082      gathered into an argument tuple, and then the logic seems identical,
2083      except that no __safe_for_unpickling__ check is done (XXX this is
2084      a bug).  See INST for the gory details.
2085
2086      NOTE:  In Python 2.3, INST and OBJ are identical except for how they
2087      get the class object.  That was always the intent; the implementations
2088      had diverged for accidental reasons.
2089      """),
2090
2091    I(name='NEWOBJ',
2092      code='\x81',
2093      arg=None,
2094      stack_before=[anyobject, anyobject],
2095      stack_after=[anyobject],
2096      proto=2,
2097      doc="""Build an object instance.
2098
2099      The stack before should be thought of as containing a class
2100      object followed by an argument tuple (the tuple being the stack
2101      top).  Call these cls and args.  They are popped off the stack,
2102      and the value returned by cls.__new__(cls, *args) is pushed back
2103      onto the stack.
2104      """),
2105
2106    I(name='NEWOBJ_EX',
2107      code='\x92',
2108      arg=None,
2109      stack_before=[anyobject, anyobject, anyobject],
2110      stack_after=[anyobject],
2111      proto=4,
2112      doc="""Build an object instance.
2113
2114      The stack before should be thought of as containing a class
2115      object followed by an argument tuple and by a keyword argument dict
2116      (the dict being the stack top).  Call these cls and args.  They are
2117      popped off the stack, and the value returned by
2118      cls.__new__(cls, *args, *kwargs) is  pushed back  onto the stack.
2119      """),
2120
2121    # Machine control.
2122
2123    I(name='PROTO',
2124      code='\x80',
2125      arg=uint1,
2126      stack_before=[],
2127      stack_after=[],
2128      proto=2,
2129      doc="""Protocol version indicator.
2130
2131      For protocol 2 and above, a pickle must start with this opcode.
2132      The argument is the protocol version, an int in range(2, 256).
2133      """),
2134
2135    I(name='STOP',
2136      code='.',
2137      arg=None,
2138      stack_before=[anyobject],
2139      stack_after=[],
2140      proto=0,
2141      doc="""Stop the unpickling machine.
2142
2143      Every pickle ends with this opcode.  The object at the top of the stack
2144      is popped, and that's the result of unpickling.  The stack should be
2145      empty then.
2146      """),
2147
2148    # Framing support.
2149
2150    I(name='FRAME',
2151      code='\x95',
2152      arg=uint8,
2153      stack_before=[],
2154      stack_after=[],
2155      proto=4,
2156      doc="""Indicate the beginning of a new frame.
2157
2158      The unpickler may use this opcode to safely prefetch data from its
2159      underlying stream.
2160      """),
2161
2162    # Ways to deal with persistent IDs.
2163
2164    I(name='PERSID',
2165      code='P',
2166      arg=stringnl_noescape,
2167      stack_before=[],
2168      stack_after=[anyobject],
2169      proto=0,
2170      doc="""Push an object identified by a persistent ID.
2171
2172      The pickle module doesn't define what a persistent ID means.  PERSID's
2173      argument is a newline-terminated str-style (no embedded escapes, no
2174      bracketing quote characters) string, which *is* "the persistent ID".
2175      The unpickler passes this string to self.persistent_load().  Whatever
2176      object that returns is pushed on the stack.  There is no implementation
2177      of persistent_load() in Python's unpickler:  it must be supplied by an
2178      unpickler subclass.
2179      """),
2180
2181    I(name='BINPERSID',
2182      code='Q',
2183      arg=None,
2184      stack_before=[anyobject],
2185      stack_after=[anyobject],
2186      proto=1,
2187      doc="""Push an object identified by a persistent ID.
2188
2189      Like PERSID, except the persistent ID is popped off the stack (instead
2190      of being a string embedded in the opcode bytestream).  The persistent
2191      ID is passed to self.persistent_load(), and whatever object that
2192      returns is pushed on the stack.  See PERSID for more detail.
2193      """),
2194]
2195del I
2196
2197# Verify uniqueness of .name and .code members.
2198name2i = {}
2199code2i = {}
2200
2201for i, d in enumerate(opcodes):
2202    if d.name in name2i:
2203        raise ValueError("repeated name %r at indices %d and %d" %
2204                         (d.name, name2i[d.name], i))
2205    if d.code in code2i:
2206        raise ValueError("repeated code %r at indices %d and %d" %
2207                         (d.code, code2i[d.code], i))
2208
2209    name2i[d.name] = i
2210    code2i[d.code] = i
2211
2212del name2i, code2i, i, d
2213
2214##############################################################################
2215# Build a code2op dict, mapping opcode characters to OpcodeInfo records.
2216# Also ensure we've got the same stuff as pickle.py, although the
2217# introspection here is dicey.
2218
2219code2op = {}
2220for d in opcodes:
2221    code2op[d.code] = d
2222del d
2223
2224def assure_pickle_consistency(verbose=False):
2225
2226    copy = code2op.copy()
2227    for name in pickle.__all__:
2228        if not re.match("[A-Z][A-Z0-9_]+$", name):
2229            if verbose:
2230                print("skipping %r: it doesn't look like an opcode name" % name)
2231            continue
2232        picklecode = getattr(pickle, name)
2233        if not isinstance(picklecode, bytes) or len(picklecode) != 1:
2234            if verbose:
2235                print(("skipping %r: value %r doesn't look like a pickle "
2236                       "code" % (name, picklecode)))
2237            continue
2238        picklecode = picklecode.decode("latin-1")
2239        if picklecode in copy:
2240            if verbose:
2241                print("checking name %r w/ code %r for consistency" % (
2242                      name, picklecode))
2243            d = copy[picklecode]
2244            if d.name != name:
2245                raise ValueError("for pickle code %r, pickle.py uses name %r "
2246                                 "but we're using name %r" % (picklecode,
2247                                                              name,
2248                                                              d.name))
2249            # Forget this one.  Any left over in copy at the end are a problem
2250            # of a different kind.
2251            del copy[picklecode]
2252        else:
2253            raise ValueError("pickle.py appears to have a pickle opcode with "
2254                             "name %r and code %r, but we don't" %
2255                             (name, picklecode))
2256    if copy:
2257        msg = ["we appear to have pickle opcodes that pickle.py doesn't have:"]
2258        for code, d in copy.items():
2259            msg.append("    name %r with code %r" % (d.name, code))
2260        raise ValueError("\n".join(msg))
2261
2262assure_pickle_consistency()
2263del assure_pickle_consistency
2264
2265##############################################################################
2266# A pickle opcode generator.
2267
2268def _genops(data, yield_end_pos=False):
2269    if isinstance(data, bytes_types):
2270        data = io.BytesIO(data)
2271
2272    if hasattr(data, "tell"):
2273        getpos = data.tell
2274    else:
2275        getpos = lambda: None
2276
2277    while True:
2278        pos = getpos()
2279        code = data.read(1)
2280        opcode = code2op.get(code.decode("latin-1"))
2281        if opcode is None:
2282            if code == b"":
2283                raise ValueError("pickle exhausted before seeing STOP")
2284            else:
2285                raise ValueError("at position %s, opcode %r unknown" % (
2286                                 "<unknown>" if pos is None else pos,
2287                                 code))
2288        if opcode.arg is None:
2289            arg = None
2290        else:
2291            arg = opcode.arg.reader(data)
2292        if yield_end_pos:
2293            yield opcode, arg, pos, getpos()
2294        else:
2295            yield opcode, arg, pos
2296        if code == b'.':
2297            assert opcode.name == 'STOP'
2298            break
2299
2300def genops(pickle):
2301    """Generate all the opcodes in a pickle.
2302
2303    'pickle' is a file-like object, or string, containing the pickle.
2304
2305    Each opcode in the pickle is generated, from the current pickle position,
2306    stopping after a STOP opcode is delivered.  A triple is generated for
2307    each opcode:
2308
2309        opcode, arg, pos
2310
2311    opcode is an OpcodeInfo record, describing the current opcode.
2312
2313    If the opcode has an argument embedded in the pickle, arg is its decoded
2314    value, as a Python object.  If the opcode doesn't have an argument, arg
2315    is None.
2316
2317    If the pickle has a tell() method, pos was the value of pickle.tell()
2318    before reading the current opcode.  If the pickle is a bytes object,
2319    it's wrapped in a BytesIO object, and the latter's tell() result is
2320    used.  Else (the pickle doesn't have a tell(), and it's not obvious how
2321    to query its current position) pos is None.
2322    """
2323    return _genops(pickle)
2324
2325##############################################################################
2326# A pickle optimizer.
2327
2328def optimize(p):
2329    'Optimize a pickle string by removing unused PUT opcodes'
2330    put = 'PUT'
2331    get = 'GET'
2332    oldids = set()          # set of all PUT ids
2333    newids = {}             # set of ids used by a GET opcode
2334    opcodes = []            # (op, idx) or (pos, end_pos)
2335    proto = 0
2336    protoheader = b''
2337    for opcode, arg, pos, end_pos in _genops(p, yield_end_pos=True):
2338        if 'PUT' in opcode.name:
2339            oldids.add(arg)
2340            opcodes.append((put, arg))
2341        elif opcode.name == 'MEMOIZE':
2342            idx = len(oldids)
2343            oldids.add(idx)
2344            opcodes.append((put, idx))
2345        elif 'FRAME' in opcode.name:
2346            pass
2347        elif 'GET' in opcode.name:
2348            if opcode.proto > proto:
2349                proto = opcode.proto
2350            newids[arg] = None
2351            opcodes.append((get, arg))
2352        elif opcode.name == 'PROTO':
2353            if arg > proto:
2354                proto = arg
2355            if pos == 0:
2356                protoheader = p[pos:end_pos]
2357            else:
2358                opcodes.append((pos, end_pos))
2359        else:
2360            opcodes.append((pos, end_pos))
2361    del oldids
2362
2363    # Copy the opcodes except for PUTS without a corresponding GET
2364    out = io.BytesIO()
2365    # Write the PROTO header before any framing
2366    out.write(protoheader)
2367    pickler = pickle._Pickler(out, proto)
2368    if proto >= 4:
2369        pickler.framer.start_framing()
2370    idx = 0
2371    for op, arg in opcodes:
2372        frameless = False
2373        if op is put:
2374            if arg not in newids:
2375                continue
2376            data = pickler.put(idx)
2377            newids[arg] = idx
2378            idx += 1
2379        elif op is get:
2380            data = pickler.get(newids[arg])
2381        else:
2382            data = p[op:arg]
2383            frameless = len(data) > pickler.framer._FRAME_SIZE_TARGET
2384        pickler.framer.commit_frame(force=frameless)
2385        if frameless:
2386            pickler.framer.file_write(data)
2387        else:
2388            pickler.write(data)
2389    pickler.framer.end_framing()
2390    return out.getvalue()
2391
2392##############################################################################
2393# A symbolic pickle disassembler.
2394
2395def dis(pickle, out=None, memo=None, indentlevel=4, annotate=0):
2396    """Produce a symbolic disassembly of a pickle.
2397
2398    'pickle' is a file-like object, or string, containing a (at least one)
2399    pickle.  The pickle is disassembled from the current position, through
2400    the first STOP opcode encountered.
2401
2402    Optional arg 'out' is a file-like object to which the disassembly is
2403    printed.  It defaults to sys.stdout.
2404
2405    Optional arg 'memo' is a Python dict, used as the pickle's memo.  It
2406    may be mutated by dis(), if the pickle contains PUT or BINPUT opcodes.
2407    Passing the same memo object to another dis() call then allows disassembly
2408    to proceed across multiple pickles that were all created by the same
2409    pickler with the same memo.  Ordinarily you don't need to worry about this.
2410
2411    Optional arg 'indentlevel' is the number of blanks by which to indent
2412    a new MARK level.  It defaults to 4.
2413
2414    Optional arg 'annotate' if nonzero instructs dis() to add short
2415    description of the opcode on each line of disassembled output.
2416    The value given to 'annotate' must be an integer and is used as a
2417    hint for the column where annotation should start.  The default
2418    value is 0, meaning no annotations.
2419
2420    In addition to printing the disassembly, some sanity checks are made:
2421
2422    + All embedded opcode arguments "make sense".
2423
2424    + Explicit and implicit pop operations have enough items on the stack.
2425
2426    + When an opcode implicitly refers to a markobject, a markobject is
2427      actually on the stack.
2428
2429    + A memo entry isn't referenced before it's defined.
2430
2431    + The markobject isn't stored in the memo.
2432
2433    + A memo entry isn't redefined.
2434    """
2435
2436    # Most of the hair here is for sanity checks, but most of it is needed
2437    # anyway to detect when a protocol 0 POP takes a MARK off the stack
2438    # (which in turn is needed to indent MARK blocks correctly).
2439
2440    stack = []          # crude emulation of unpickler stack
2441    if memo is None:
2442        memo = {}       # crude emulation of unpickler memo
2443    maxproto = -1       # max protocol number seen
2444    markstack = []      # bytecode positions of MARK opcodes
2445    indentchunk = ' ' * indentlevel
2446    errormsg = None
2447    annocol = annotate  # column hint for annotations
2448    for opcode, arg, pos in genops(pickle):
2449        if pos is not None:
2450            print("%5d:" % pos, end=' ', file=out)
2451
2452        line = "%-4s %s%s" % (repr(opcode.code)[1:-1],
2453                              indentchunk * len(markstack),
2454                              opcode.name)
2455
2456        maxproto = max(maxproto, opcode.proto)
2457        before = opcode.stack_before    # don't mutate
2458        after = opcode.stack_after      # don't mutate
2459        numtopop = len(before)
2460
2461        # See whether a MARK should be popped.
2462        markmsg = None
2463        if markobject in before or (opcode.name == "POP" and
2464                                    stack and
2465                                    stack[-1] is markobject):
2466            assert markobject not in after
2467            if __debug__:
2468                if markobject in before:
2469                    assert before[-1] is stackslice
2470            if markstack:
2471                markpos = markstack.pop()
2472                if markpos is None:
2473                    markmsg = "(MARK at unknown opcode offset)"
2474                else:
2475                    markmsg = "(MARK at %d)" % markpos
2476                # Pop everything at and after the topmost markobject.
2477                while stack[-1] is not markobject:
2478                    stack.pop()
2479                stack.pop()
2480                # Stop later code from popping too much.
2481                try:
2482                    numtopop = before.index(markobject)
2483                except ValueError:
2484                    assert opcode.name == "POP"
2485                    numtopop = 0
2486            else:
2487                errormsg = markmsg = "no MARK exists on stack"
2488
2489        # Check for correct memo usage.
2490        if opcode.name in ("PUT", "BINPUT", "LONG_BINPUT", "MEMOIZE"):
2491            if opcode.name == "MEMOIZE":
2492                memo_idx = len(memo)
2493                markmsg = "(as %d)" % memo_idx
2494            else:
2495                assert arg is not None
2496                memo_idx = arg
2497            if memo_idx in memo:
2498                errormsg = "memo key %r already defined" % arg
2499            elif not stack:
2500                errormsg = "stack is empty -- can't store into memo"
2501            elif stack[-1] is markobject:
2502                errormsg = "can't store markobject in the memo"
2503            else:
2504                memo[memo_idx] = stack[-1]
2505        elif opcode.name in ("GET", "BINGET", "LONG_BINGET"):
2506            if arg in memo:
2507                assert len(after) == 1
2508                after = [memo[arg]]     # for better stack emulation
2509            else:
2510                errormsg = "memo key %r has never been stored into" % arg
2511
2512        if arg is not None or markmsg:
2513            # make a mild effort to align arguments
2514            line += ' ' * (10 - len(opcode.name))
2515            if arg is not None:
2516                line += ' ' + repr(arg)
2517            if markmsg:
2518                line += ' ' + markmsg
2519        if annotate:
2520            line += ' ' * (annocol - len(line))
2521            # make a mild effort to align annotations
2522            annocol = len(line)
2523            if annocol > 50:
2524                annocol = annotate
2525            line += ' ' + opcode.doc.split('\n', 1)[0]
2526        print(line, file=out)
2527
2528        if errormsg:
2529            # Note that we delayed complaining until the offending opcode
2530            # was printed.
2531            raise ValueError(errormsg)
2532
2533        # Emulate the stack effects.
2534        if len(stack) < numtopop:
2535            raise ValueError("tries to pop %d items from stack with "
2536                             "only %d items" % (numtopop, len(stack)))
2537        if numtopop:
2538            del stack[-numtopop:]
2539        if markobject in after:
2540            assert markobject not in before
2541            markstack.append(pos)
2542
2543        stack.extend(after)
2544
2545    print("highest protocol among opcodes =", maxproto, file=out)
2546    if stack:
2547        raise ValueError("stack not empty after STOP: %r" % stack)
2548
2549# For use in the doctest, simply as an example of a class to pickle.
2550class _Example:
2551    def __init__(self, value):
2552        self.value = value
2553
2554_dis_test = r"""
2555>>> import pickle
2556>>> x = [1, 2, (3, 4), {b'abc': "def"}]
2557>>> pkl0 = pickle.dumps(x, 0)
2558>>> dis(pkl0)
2559    0: (    MARK
2560    1: l        LIST       (MARK at 0)
2561    2: p    PUT        0
2562    5: I    INT        1
2563    8: a    APPEND
2564    9: I    INT        2
2565   12: a    APPEND
2566   13: (    MARK
2567   14: I        INT        3
2568   17: I        INT        4
2569   20: t        TUPLE      (MARK at 13)
2570   21: p    PUT        1
2571   24: a    APPEND
2572   25: (    MARK
2573   26: d        DICT       (MARK at 25)
2574   27: p    PUT        2
2575   30: c    GLOBAL     '_codecs encode'
2576   46: p    PUT        3
2577   49: (    MARK
2578   50: V        UNICODE    'abc'
2579   55: p        PUT        4
2580   58: V        UNICODE    'latin1'
2581   66: p        PUT        5
2582   69: t        TUPLE      (MARK at 49)
2583   70: p    PUT        6
2584   73: R    REDUCE
2585   74: p    PUT        7
2586   77: V    UNICODE    'def'
2587   82: p    PUT        8
2588   85: s    SETITEM
2589   86: a    APPEND
2590   87: .    STOP
2591highest protocol among opcodes = 0
2592
2593Try again with a "binary" pickle.
2594
2595>>> pkl1 = pickle.dumps(x, 1)
2596>>> dis(pkl1)
2597    0: ]    EMPTY_LIST
2598    1: q    BINPUT     0
2599    3: (    MARK
2600    4: K        BININT1    1
2601    6: K        BININT1    2
2602    8: (        MARK
2603    9: K            BININT1    3
2604   11: K            BININT1    4
2605   13: t            TUPLE      (MARK at 8)
2606   14: q        BINPUT     1
2607   16: }        EMPTY_DICT
2608   17: q        BINPUT     2
2609   19: c        GLOBAL     '_codecs encode'
2610   35: q        BINPUT     3
2611   37: (        MARK
2612   38: X            BINUNICODE 'abc'
2613   46: q            BINPUT     4
2614   48: X            BINUNICODE 'latin1'
2615   59: q            BINPUT     5
2616   61: t            TUPLE      (MARK at 37)
2617   62: q        BINPUT     6
2618   64: R        REDUCE
2619   65: q        BINPUT     7
2620   67: X        BINUNICODE 'def'
2621   75: q        BINPUT     8
2622   77: s        SETITEM
2623   78: e        APPENDS    (MARK at 3)
2624   79: .    STOP
2625highest protocol among opcodes = 1
2626
2627Exercise the INST/OBJ/BUILD family.
2628
2629>>> import pickletools
2630>>> dis(pickle.dumps(pickletools.dis, 0))
2631    0: c    GLOBAL     'pickletools dis'
2632   17: p    PUT        0
2633   20: .    STOP
2634highest protocol among opcodes = 0
2635
2636>>> from pickletools import _Example
2637>>> x = [_Example(42)] * 2
2638>>> dis(pickle.dumps(x, 0))
2639    0: (    MARK
2640    1: l        LIST       (MARK at 0)
2641    2: p    PUT        0
2642    5: c    GLOBAL     'copy_reg _reconstructor'
2643   30: p    PUT        1
2644   33: (    MARK
2645   34: c        GLOBAL     'pickletools _Example'
2646   56: p        PUT        2
2647   59: c        GLOBAL     '__builtin__ object'
2648   79: p        PUT        3
2649   82: N        NONE
2650   83: t        TUPLE      (MARK at 33)
2651   84: p    PUT        4
2652   87: R    REDUCE
2653   88: p    PUT        5
2654   91: (    MARK
2655   92: d        DICT       (MARK at 91)
2656   93: p    PUT        6
2657   96: V    UNICODE    'value'
2658  103: p    PUT        7
2659  106: I    INT        42
2660  110: s    SETITEM
2661  111: b    BUILD
2662  112: a    APPEND
2663  113: g    GET        5
2664  116: a    APPEND
2665  117: .    STOP
2666highest protocol among opcodes = 0
2667
2668>>> dis(pickle.dumps(x, 1))
2669    0: ]    EMPTY_LIST
2670    1: q    BINPUT     0
2671    3: (    MARK
2672    4: c        GLOBAL     'copy_reg _reconstructor'
2673   29: q        BINPUT     1
2674   31: (        MARK
2675   32: c            GLOBAL     'pickletools _Example'
2676   54: q            BINPUT     2
2677   56: c            GLOBAL     '__builtin__ object'
2678   76: q            BINPUT     3
2679   78: N            NONE
2680   79: t            TUPLE      (MARK at 31)
2681   80: q        BINPUT     4
2682   82: R        REDUCE
2683   83: q        BINPUT     5
2684   85: }        EMPTY_DICT
2685   86: q        BINPUT     6
2686   88: X        BINUNICODE 'value'
2687   98: q        BINPUT     7
2688  100: K        BININT1    42
2689  102: s        SETITEM
2690  103: b        BUILD
2691  104: h        BINGET     5
2692  106: e        APPENDS    (MARK at 3)
2693  107: .    STOP
2694highest protocol among opcodes = 1
2695
2696Try "the canonical" recursive-object test.
2697
2698>>> L = []
2699>>> T = L,
2700>>> L.append(T)
2701>>> L[0] is T
2702True
2703>>> T[0] is L
2704True
2705>>> L[0][0] is L
2706True
2707>>> T[0][0] is T
2708True
2709>>> dis(pickle.dumps(L, 0))
2710    0: (    MARK
2711    1: l        LIST       (MARK at 0)
2712    2: p    PUT        0
2713    5: (    MARK
2714    6: g        GET        0
2715    9: t        TUPLE      (MARK at 5)
2716   10: p    PUT        1
2717   13: a    APPEND
2718   14: .    STOP
2719highest protocol among opcodes = 0
2720
2721>>> dis(pickle.dumps(L, 1))
2722    0: ]    EMPTY_LIST
2723    1: q    BINPUT     0
2724    3: (    MARK
2725    4: h        BINGET     0
2726    6: t        TUPLE      (MARK at 3)
2727    7: q    BINPUT     1
2728    9: a    APPEND
2729   10: .    STOP
2730highest protocol among opcodes = 1
2731
2732Note that, in the protocol 0 pickle of the recursive tuple, the disassembler
2733has to emulate the stack in order to realize that the POP opcode at 16 gets
2734rid of the MARK at 0.
2735
2736>>> dis(pickle.dumps(T, 0))
2737    0: (    MARK
2738    1: (        MARK
2739    2: l            LIST       (MARK at 1)
2740    3: p        PUT        0
2741    6: (        MARK
2742    7: g            GET        0
2743   10: t            TUPLE      (MARK at 6)
2744   11: p        PUT        1
2745   14: a        APPEND
2746   15: 0        POP
2747   16: 0        POP        (MARK at 0)
2748   17: g    GET        1
2749   20: .    STOP
2750highest protocol among opcodes = 0
2751
2752>>> dis(pickle.dumps(T, 1))
2753    0: (    MARK
2754    1: ]        EMPTY_LIST
2755    2: q        BINPUT     0
2756    4: (        MARK
2757    5: h            BINGET     0
2758    7: t            TUPLE      (MARK at 4)
2759    8: q        BINPUT     1
2760   10: a        APPEND
2761   11: 1        POP_MARK   (MARK at 0)
2762   12: h    BINGET     1
2763   14: .    STOP
2764highest protocol among opcodes = 1
2765
2766Try protocol 2.
2767
2768>>> dis(pickle.dumps(L, 2))
2769    0: \x80 PROTO      2
2770    2: ]    EMPTY_LIST
2771    3: q    BINPUT     0
2772    5: h    BINGET     0
2773    7: \x85 TUPLE1
2774    8: q    BINPUT     1
2775   10: a    APPEND
2776   11: .    STOP
2777highest protocol among opcodes = 2
2778
2779>>> dis(pickle.dumps(T, 2))
2780    0: \x80 PROTO      2
2781    2: ]    EMPTY_LIST
2782    3: q    BINPUT     0
2783    5: h    BINGET     0
2784    7: \x85 TUPLE1
2785    8: q    BINPUT     1
2786   10: a    APPEND
2787   11: 0    POP
2788   12: h    BINGET     1
2789   14: .    STOP
2790highest protocol among opcodes = 2
2791
2792Try protocol 3 with annotations:
2793
2794>>> dis(pickle.dumps(T, 3), annotate=1)
2795    0: \x80 PROTO      3 Protocol version indicator.
2796    2: ]    EMPTY_LIST   Push an empty list.
2797    3: q    BINPUT     0 Store the stack top into the memo.  The stack is not popped.
2798    5: h    BINGET     0 Read an object from the memo and push it on the stack.
2799    7: \x85 TUPLE1       Build a one-tuple out of the topmost item on the stack.
2800    8: q    BINPUT     1 Store the stack top into the memo.  The stack is not popped.
2801   10: a    APPEND       Append an object to a list.
2802   11: 0    POP          Discard the top stack item, shrinking the stack by one item.
2803   12: h    BINGET     1 Read an object from the memo and push it on the stack.
2804   14: .    STOP         Stop the unpickling machine.
2805highest protocol among opcodes = 2
2806
2807"""
2808
2809_memo_test = r"""
2810>>> import pickle
2811>>> import io
2812>>> f = io.BytesIO()
2813>>> p = pickle.Pickler(f, 2)
2814>>> x = [1, 2, 3]
2815>>> p.dump(x)
2816>>> p.dump(x)
2817>>> f.seek(0)
28180
2819>>> memo = {}
2820>>> dis(f, memo=memo)
2821    0: \x80 PROTO      2
2822    2: ]    EMPTY_LIST
2823    3: q    BINPUT     0
2824    5: (    MARK
2825    6: K        BININT1    1
2826    8: K        BININT1    2
2827   10: K        BININT1    3
2828   12: e        APPENDS    (MARK at 5)
2829   13: .    STOP
2830highest protocol among opcodes = 2
2831>>> dis(f, memo=memo)
2832   14: \x80 PROTO      2
2833   16: h    BINGET     0
2834   18: .    STOP
2835highest protocol among opcodes = 2
2836"""
2837
2838__test__ = {'disassembler_test': _dis_test,
2839            'disassembler_memo_test': _memo_test,
2840           }
2841
2842def _test():
2843    import doctest
2844    return doctest.testmod()
2845
2846if __name__ == "__main__":
2847    import argparse
2848    parser = argparse.ArgumentParser(
2849        description='disassemble one or more pickle files')
2850    parser.add_argument(
2851        'pickle_file', type=argparse.FileType('br'),
2852        nargs='*', help='the pickle file')
2853    parser.add_argument(
2854        '-o', '--output', default=sys.stdout, type=argparse.FileType('w'),
2855        help='the file where the output should be written')
2856    parser.add_argument(
2857        '-m', '--memo', action='store_true',
2858        help='preserve memo between disassemblies')
2859    parser.add_argument(
2860        '-l', '--indentlevel', default=4, type=int,
2861        help='the number of blanks by which to indent a new MARK level')
2862    parser.add_argument(
2863        '-a', '--annotate',  action='store_true',
2864        help='annotate each line with a short opcode description')
2865    parser.add_argument(
2866        '-p', '--preamble', default="==> {name} <==",
2867        help='if more than one pickle file is specified, print this before'
2868        ' each disassembly')
2869    parser.add_argument(
2870        '-t', '--test', action='store_true',
2871        help='run self-test suite')
2872    parser.add_argument(
2873        '-v', action='store_true',
2874        help='run verbosely; only affects self-test run')
2875    args = parser.parse_args()
2876    if args.test:
2877        _test()
2878    else:
2879        annotate = 30 if args.annotate else 0
2880        if not args.pickle_file:
2881            parser.print_help()
2882        elif len(args.pickle_file) == 1:
2883            dis(args.pickle_file[0], args.output, None,
2884                args.indentlevel, annotate)
2885        else:
2886            memo = {} if args.memo else None
2887            for f in args.pickle_file:
2888                preamble = args.preamble.format(name=f.name)
2889                args.output.write(preamble + '\n')
2890                dis(f, args.output, memo, args.indentlevel, annotate)
2891