PyXR

c:\python24\lib \ pickletools.py


0001 '''"Executable documentation" for the pickle module.
0002 
0003 Extensive comments about the pickle protocols and pickle-machine opcodes
0004 can be found here.  Some functions meant for external use:
0005 
0006 genops(pickle)
0007    Generate all the opcodes in a pickle, as (opcode, arg, position) triples.
0008 
0009 dis(pickle, out=None, memo=None, indentlevel=4)
0010    Print a symbolic disassembly of a pickle.
0011 '''
0012 
0013 # Other ideas:
0014 #
0015 # - A pickle verifier:  read a pickle and check it exhaustively for
0016 #   well-formedness.  dis() does a lot of this already.
0017 #
0018 # - A protocol identifier:  examine a pickle and return its protocol number
0019 #   (== the highest .proto attr value among all the opcodes in the pickle).
0020 #   dis() already prints this info at the end.
0021 #
0022 # - A pickle optimizer:  for example, tuple-building code is sometimes more
0023 #   elaborate than necessary, catering for the possibility that the tuple
0024 #   is recursive.  Or lots of times a PUT is generated that's never accessed
0025 #   by a later GET.
0026 
0027 
0028 """
0029 "A pickle" is a program for a virtual pickle machine (PM, but more accurately
0030 called an unpickling machine).  It's a sequence of opcodes, interpreted by the
0031 PM, building an arbitrarily complex Python object.
0032 
0033 For the most part, the PM is very simple:  there are no looping, testing, or
0034 conditional instructions, no arithmetic and no function calls.  Opcodes are
0035 executed once each, from first to last, until a STOP opcode is reached.
0036 
0037 The PM has two data areas, "the stack" and "the memo".
0038 
0039 Many opcodes push Python objects onto the stack; e.g., INT pushes a Python
0040 integer object on the stack, whose value is gotten from a decimal string
0041 literal immediately following the INT opcode in the pickle bytestream.  Other
0042 opcodes take Python objects off the stack.  The result of unpickling is
0043 whatever object is left on the stack when the final STOP opcode is executed.
0044 
0045 The memo is simply an array of objects, or it can be implemented as a dict
0046 mapping little integers to objects.  The memo serves as the PM's "long term
0047 memory", and the little integers indexing the memo are akin to variable
0048 names.  Some opcodes pop a stack object into the memo at a given index,
0049 and others push a memo object at a given index onto the stack again.
0050 
0051 At heart, that's all the PM has.  Subtleties arise for these reasons:
0052 
0053 + Object identity.  Objects can be arbitrarily complex, and subobjects
0054   may be shared (for example, the list [a, a] refers to the same object a
0055   twice).  It can be vital that unpickling recreate an isomorphic object
0056   graph, faithfully reproducing sharing.
0057 
0058 + Recursive objects.  For example, after "L = []; L.append(L)", L is a
0059   list, and L[0] is the same list.  This is related to the object identity
0060   point, and some sequences of pickle opcodes are subtle in order to
0061   get the right result in all cases.
0062 
0063 + Things pickle doesn't know everything about.  Examples of things pickle
0064   does know everything about are Python's builtin scalar and container
0065   types, like ints and tuples.  They generally have opcodes dedicated to
0066   them.  For things like module references and instances of user-defined
0067   classes, pickle's knowledge is limited.  Historically, many enhancements
0068   have been made to the pickle protocol in order to do a better (faster,
0069   and/or more compact) job on those.
0070 
0071 + Backward compatibility and micro-optimization.  As explained below,
0072   pickle opcodes never go away, not even when better ways to do a thing
0073   get invented.  The repertoire of the PM just keeps growing over time.
0074   For example, protocol 0 had two opcodes for building Python integers (INT
0075   and LONG), protocol 1 added three more for more-efficient pickling of short
0076   integers, and protocol 2 added two more for more-efficient pickling of
0077   long integers (before protocol 2, the only ways to pickle a Python long
0078   took time quadratic in the number of digits, for both pickling and
0079   unpickling).  "Opcode bloat" isn't so much a subtlety as a source of
0080   wearying complication.
0081 
0082 
0083 Pickle protocols:
0084 
0085 For compatibility, the meaning of a pickle opcode never changes.  Instead new
0086 pickle opcodes get added, and each version's unpickler can handle all the
0087 pickle opcodes in all protocol versions to date.  So old pickles continue to
0088 be readable forever.  The pickler can generally be told to restrict itself to
0089 the subset of opcodes available under previous protocol versions too, so that
0090 users can create pickles under the current version readable by older
0091 versions.  However, a pickle does not contain its version number embedded
0092 within it.  If an older unpickler tries to read a pickle using a later
0093 protocol, the result is most likely an exception due to seeing an unknown (in
0094 the older unpickler) opcode.
0095 
0096 The original pickle used what's now called "protocol 0", and what was called
0097 "text mode" before Python 2.3.  The entire pickle bytestream is made up of
0098 printable 7-bit ASCII characters, plus the newline character, in protocol 0.
0099 That's why it was called text mode.  Protocol 0 is small and elegant, but
0100 sometimes painfully inefficient.
0101 
0102 The second major set of additions is now called "protocol 1", and was called
0103 "binary mode" before Python 2.3.  This added many opcodes with arguments
0104 consisting of arbitrary bytes, including NUL bytes and unprintable "high bit"
0105 bytes.  Binary mode pickles can be substantially smaller than equivalent
0106 text mode pickles, and sometimes faster too; e.g., BININT represents a 4-byte
0107 int as 4 bytes following the opcode, which is cheaper to unpickle than the
0108 (perhaps) 11-character decimal string attached to INT.  Protocol 1 also added
0109 a number of opcodes that operate on many stack elements at once (like APPENDS
0110 and SETITEMS), and "shortcut" opcodes (like EMPTY_DICT and EMPTY_TUPLE).
0111 
0112 The third major set of additions came in Python 2.3, and is called "protocol
0113 2".  This added:
0114 
0115 - A better way to pickle instances of new-style classes (NEWOBJ).
0116 
0117 - A way for a pickle to identify its protocol (PROTO).
0118 
0119 - Time- and space- efficient pickling of long ints (LONG{1,4}).
0120 
0121 - Shortcuts for small tuples (TUPLE{1,2,3}}.
0122 
0123 - Dedicated opcodes for bools (NEWTRUE, NEWFALSE).
0124 
0125 - The "extension registry", a vector of popular objects that can be pushed
0126   efficiently by index (EXT{1,2,4}).  This is akin to the memo and GET, but
0127   the registry contents are predefined (there's nothing akin to the memo's
0128   PUT).
0129 
0130 Another independent change with Python 2.3 is the abandonment of any
0131 pretense that it might be safe to load pickles received from untrusted
0132 parties -- no sufficient security analysis has been done to guarantee
0133 this and there isn't a use case that warrants the expense of such an
0134 analysis.
0135 
0136 To this end, all tests for __safe_for_unpickling__ or for
0137 copy_reg.safe_constructors are removed from the unpickling code.
0138 References to these variables in the descriptions below are to be seen
0139 as describing unpickling in Python 2.2 and before.
0140 """
0141 
0142 # Meta-rule:  Descriptions are stored in instances of descriptor objects,
0143 # with plain constructors.  No meta-language is defined from which
0144 # descriptors could be constructed.  If you want, e.g., XML, write a little
0145 # program to generate XML from the objects.
0146 
0147 ##############################################################################
0148 # Some pickle opcodes have an argument, following the opcode in the
0149 # bytestream.  An argument is of a specific type, described by an instance
0150 # of ArgumentDescriptor.  These are not to be confused with arguments taken
0151 # off the stack -- ArgumentDescriptor applies only to arguments embedded in
0152 # the opcode stream, immediately following an opcode.
0153 
0154 # Represents the number of bytes consumed by an argument delimited by the
0155 # next newline character.
0156 UP_TO_NEWLINE = -1
0157 
0158 # Represents the number of bytes consumed by a two-argument opcode where
0159 # the first argument gives the number of bytes in the second argument.
0160 TAKEN_FROM_ARGUMENT1 = -2   # num bytes is 1-byte unsigned int
0161 TAKEN_FROM_ARGUMENT4 = -3   # num bytes is 4-byte signed little-endian int
0162 
0163 class ArgumentDescriptor(object):
0164     __slots__ = (
0165         # name of descriptor record, also a module global name; a string
0166         'name',
0167 
0168         # length of argument, in bytes; an int; UP_TO_NEWLINE and
0169         # TAKEN_FROM_ARGUMENT{1,4} are negative values for variable-length
0170         # cases
0171         'n',
0172 
0173         # a function taking a file-like object, reading this kind of argument
0174         # from the object at the current position, advancing the current
0175         # position by n bytes, and returning the value of the argument
0176         'reader',
0177 
0178         # human-readable docs for this arg descriptor; a string
0179         'doc',
0180     )
0181 
0182     def __init__(self, name, n, reader, doc):
0183         assert isinstance(name, str)
0184         self.name = name
0185 
0186         assert isinstance(n, int) and (n >= 0 or
0187                                        n in (UP_TO_NEWLINE,
0188                                              TAKEN_FROM_ARGUMENT1,
0189                                              TAKEN_FROM_ARGUMENT4))
0190         self.n = n
0191 
0192         self.reader = reader
0193 
0194         assert isinstance(doc, str)
0195         self.doc = doc
0196 
0197 from struct import unpack as _unpack
0198 
0199 def read_uint1(f):
0200     r"""
0201     >>> import StringIO
0202     >>> read_uint1(StringIO.StringIO('\xff'))
0203     255
0204     """
0205 
0206     data = f.read(1)
0207     if data:
0208         return ord(data)
0209     raise ValueError("not enough data in stream to read uint1")
0210 
0211 uint1 = ArgumentDescriptor(
0212             name='uint1',
0213             n=1,
0214             reader=read_uint1,
0215             doc="One-byte unsigned integer.")
0216 
0217 
0218 def read_uint2(f):
0219     r"""
0220     >>> import StringIO
0221     >>> read_uint2(StringIO.StringIO('\xff\x00'))
0222     255
0223     >>> read_uint2(StringIO.StringIO('\xff\xff'))
0224     65535
0225     """
0226 
0227     data = f.read(2)
0228     if len(data) == 2:
0229         return _unpack("<H", data)[0]
0230     raise ValueError("not enough data in stream to read uint2")
0231 
0232 uint2 = ArgumentDescriptor(
0233             name='uint2',
0234             n=2,
0235             reader=read_uint2,
0236             doc="Two-byte unsigned integer, little-endian.")
0237 
0238 
0239 def read_int4(f):
0240     r"""
0241     >>> import StringIO
0242     >>> read_int4(StringIO.StringIO('\xff\x00\x00\x00'))
0243     255
0244     >>> read_int4(StringIO.StringIO('\x00\x00\x00\x80')) == -(2**31)
0245     True
0246     """
0247 
0248     data = f.read(4)
0249     if len(data) == 4:
0250         return _unpack("<i", data)[0]
0251     raise ValueError("not enough data in stream to read int4")
0252 
0253 int4 = ArgumentDescriptor(
0254            name='int4',
0255            n=4,
0256            reader=read_int4,
0257            doc="Four-byte signed integer, little-endian, 2's complement.")
0258 
0259 
0260 def read_stringnl(f, decode=True, stripquotes=True):
0261     r"""
0262     >>> import StringIO
0263     >>> read_stringnl(StringIO.StringIO("'abcd'\nefg\n"))
0264     'abcd'
0265 
0266     >>> read_stringnl(StringIO.StringIO("\n"))
0267     Traceback (most recent call last):
0268     ...
0269     ValueError: no string quotes around ''
0270 
0271     >>> read_stringnl(StringIO.StringIO("\n"), stripquotes=False)
0272     ''
0273 
0274     >>> read_stringnl(StringIO.StringIO("''\n"))
0275     ''
0276 
0277     >>> read_stringnl(StringIO.StringIO('"abcd"'))
0278     Traceback (most recent call last):
0279     ...
0280     ValueError: no newline found when trying to read stringnl
0281 
0282     Embedded escapes are undone in the result.
0283     >>> read_stringnl(StringIO.StringIO(r"'a\n\\b\x00c\td'" + "\n'e'"))
0284     'a\n\\b\x00c\td'
0285     """
0286 
0287     data = f.readline()
0288     if not data.endswith('\n'):
0289         raise ValueError("no newline found when trying to read stringnl")
0290     data = data[:-1]    # lose the newline
0291 
0292     if stripquotes:
0293         for q in "'\"":
0294             if data.startswith(q):
0295                 if not data.endswith(q):
0296                     raise ValueError("strinq quote %r not found at both "
0297                                      "ends of %r" % (q, data))
0298                 data = data[1:-1]
0299                 break
0300         else:
0301             raise ValueError("no string quotes around %r" % data)
0302 
0303     # I'm not sure when 'string_escape' was added to the std codecs; it's
0304     # crazy not to use it if it's there.
0305     if decode:
0306         data = data.decode('string_escape')
0307     return data
0308 
0309 stringnl = ArgumentDescriptor(
0310                name='stringnl',
0311                n=UP_TO_NEWLINE,
0312                reader=read_stringnl,
0313                doc="""A newline-terminated string.
0314 
0315                    This is a repr-style string, with embedded escapes, and
0316                    bracketing quotes.
0317                    """)
0318 
0319 def read_stringnl_noescape(f):
0320     return read_stringnl(f, decode=False, stripquotes=False)
0321 
0322 stringnl_noescape = ArgumentDescriptor(
0323                         name='stringnl_noescape',
0324                         n=UP_TO_NEWLINE,
0325                         reader=read_stringnl_noescape,
0326                         doc="""A newline-terminated string.
0327 
0328                         This is a str-style string, without embedded escapes,
0329                         or bracketing quotes.  It should consist solely of
0330                         printable ASCII characters.
0331                         """)
0332 
0333 def read_stringnl_noescape_pair(f):
0334     r"""
0335     >>> import StringIO
0336     >>> read_stringnl_noescape_pair(StringIO.StringIO("Queue\nEmpty\njunk"))
0337     'Queue Empty'
0338     """
0339 
0340     return "%s %s" % (read_stringnl_noescape(f), read_stringnl_noescape(f))
0341 
0342 stringnl_noescape_pair = ArgumentDescriptor(
0343                              name='stringnl_noescape_pair',
0344                              n=UP_TO_NEWLINE,
0345                              reader=read_stringnl_noescape_pair,
0346                              doc="""A pair of newline-terminated strings.
0347 
0348                              These are str-style strings, without embedded
0349                              escapes, or bracketing quotes.  They should
0350                              consist solely of printable ASCII characters.
0351                              The pair is returned as a single string, with
0352                              a single blank separating the two strings.
0353                              """)
0354 
0355 def read_string4(f):
0356     r"""
0357     >>> import StringIO
0358     >>> read_string4(StringIO.StringIO("\x00\x00\x00\x00abc"))
0359     ''
0360     >>> read_string4(StringIO.StringIO("\x03\x00\x00\x00abcdef"))
0361     'abc'
0362     >>> read_string4(StringIO.StringIO("\x00\x00\x00\x03abcdef"))
0363     Traceback (most recent call last):
0364     ...
0365     ValueError: expected 50331648 bytes in a string4, but only 6 remain
0366     """
0367 
0368     n = read_int4(f)
0369     if n < 0:
0370         raise ValueError("string4 byte count < 0: %d" % n)
0371     data = f.read(n)
0372     if len(data) == n:
0373         return data
0374     raise ValueError("expected %d bytes in a string4, but only %d remain" %
0375                      (n, len(data)))
0376 
0377 string4 = ArgumentDescriptor(
0378               name="string4",
0379               n=TAKEN_FROM_ARGUMENT4,
0380               reader=read_string4,
0381               doc="""A counted string.
0382 
0383               The first argument is a 4-byte little-endian signed int giving
0384               the number of bytes in the string, and the second argument is
0385               that many bytes.
0386               """)
0387 
0388 
0389 def read_string1(f):
0390     r"""
0391     >>> import StringIO
0392     >>> read_string1(StringIO.StringIO("\x00"))
0393     ''
0394     >>> read_string1(StringIO.StringIO("\x03abcdef"))
0395     'abc'
0396     """
0397 
0398     n = read_uint1(f)
0399     assert n >= 0
0400     data = f.read(n)
0401     if len(data) == n:
0402         return data
0403     raise ValueError("expected %d bytes in a string1, but only %d remain" %
0404                      (n, len(data)))
0405 
0406 string1 = ArgumentDescriptor(
0407               name="string1",
0408               n=TAKEN_FROM_ARGUMENT1,
0409               reader=read_string1,
0410               doc="""A counted string.
0411 
0412               The first argument is a 1-byte unsigned int giving the number
0413               of bytes in the string, and the second argument is that many
0414               bytes.
0415               """)
0416 
0417 
0418 def read_unicodestringnl(f):
0419     r"""
0420     >>> import StringIO
0421     >>> read_unicodestringnl(StringIO.StringIO("abc\uabcd\njunk"))
0422     u'abc\uabcd'
0423     """
0424 
0425     data = f.readline()
0426     if not data.endswith('\n'):
0427         raise ValueError("no newline found when trying to read "
0428                          "unicodestringnl")
0429     data = data[:-1]    # lose the newline
0430     return unicode(data, 'raw-unicode-escape')
0431 
0432 unicodestringnl = ArgumentDescriptor(
0433                       name='unicodestringnl',
0434                       n=UP_TO_NEWLINE,
0435                       reader=read_unicodestringnl,
0436                       doc="""A newline-terminated Unicode string.
0437 
0438                       This is raw-unicode-escape encoded, so consists of
0439                       printable ASCII characters, and may contain embedded
0440                       escape sequences.
0441                       """)
0442 
0443 def read_unicodestring4(f):
0444     r"""
0445     >>> import StringIO
0446     >>> s = u'abcd\uabcd'
0447     >>> enc = s.encode('utf-8')
0448     >>> enc
0449     'abcd\xea\xaf\x8d'
0450     >>> n = chr(len(enc)) + chr(0) * 3  # little-endian 4-byte length
0451     >>> t = read_unicodestring4(StringIO.StringIO(n + enc + 'junk'))
0452     >>> s == t
0453     True
0454 
0455     >>> read_unicodestring4(StringIO.StringIO(n + enc[:-1]))
0456     Traceback (most recent call last):
0457     ...
0458     ValueError: expected 7 bytes in a unicodestring4, but only 6 remain
0459     """
0460 
0461     n = read_int4(f)
0462     if n < 0:
0463         raise ValueError("unicodestring4 byte count < 0: %d" % n)
0464     data = f.read(n)
0465     if len(data) == n:
0466         return unicode(data, 'utf-8')
0467     raise ValueError("expected %d bytes in a unicodestring4, but only %d "
0468                      "remain" % (n, len(data)))
0469 
0470 unicodestring4 = ArgumentDescriptor(
0471                     name="unicodestring4",
0472                     n=TAKEN_FROM_ARGUMENT4,
0473                     reader=read_unicodestring4,
0474                     doc="""A counted Unicode string.
0475 
0476                     The first argument is a 4-byte little-endian signed int
0477                     giving the number of bytes in the string, and the second
0478                     argument-- the UTF-8 encoding of the Unicode string --
0479                     contains that many bytes.
0480                     """)
0481 
0482 
0483 def read_decimalnl_short(f):
0484     r"""
0485     >>> import StringIO
0486     >>> read_decimalnl_short(StringIO.StringIO("1234\n56"))
0487     1234
0488 
0489     >>> read_decimalnl_short(StringIO.StringIO("1234L\n56"))
0490     Traceback (most recent call last):
0491     ...
0492     ValueError: trailing 'L' not allowed in '1234L'
0493     """
0494 
0495     s = read_stringnl(f, decode=False, stripquotes=False)
0496     if s.endswith("L"):
0497         raise ValueError("trailing 'L' not allowed in %r" % s)
0498 
0499     # It's not necessarily true that the result fits in a Python short int:
0500     # the pickle may have been written on a 64-bit box.  There's also a hack
0501     # for True and False here.
0502     if s == "00":
0503         return False
0504     elif s == "01":
0505         return True
0506 
0507     try:
0508         return int(s)
0509     except OverflowError:
0510         return long(s)
0511 
0512 def read_decimalnl_long(f):
0513     r"""
0514     >>> import StringIO
0515 
0516     >>> read_decimalnl_long(StringIO.StringIO("1234\n56"))
0517     Traceback (most recent call last):
0518     ...
0519     ValueError: trailing 'L' required in '1234'
0520 
0521     Someday the trailing 'L' will probably go away from this output.
0522 
0523     >>> read_decimalnl_long(StringIO.StringIO("1234L\n56"))
0524     1234L
0525 
0526     >>> read_decimalnl_long(StringIO.StringIO("123456789012345678901234L\n6"))
0527     123456789012345678901234L
0528     """
0529 
0530     s = read_stringnl(f, decode=False, stripquotes=False)
0531     if not s.endswith("L"):
0532         raise ValueError("trailing 'L' required in %r" % s)
0533     return long(s)
0534 
0535 
0536 decimalnl_short = ArgumentDescriptor(
0537                       name='decimalnl_short',
0538                       n=UP_TO_NEWLINE,
0539                       reader=read_decimalnl_short,
0540                       doc="""A newline-terminated decimal integer literal.
0541 
0542                           This never has a trailing 'L', and the integer fit
0543                           in a short Python int on the box where the pickle
0544                           was written -- but there's no guarantee it will fit
0545                           in a short Python int on the box where the pickle
0546                           is read.
0547                           """)
0548 
0549 decimalnl_long = ArgumentDescriptor(
0550                      name='decimalnl_long',
0551                      n=UP_TO_NEWLINE,
0552                      reader=read_decimalnl_long,
0553                      doc="""A newline-terminated decimal integer literal.
0554 
0555                          This has a trailing 'L', and can represent integers
0556                          of any size.
0557                          """)
0558 
0559 
0560 def read_floatnl(f):
0561     r"""
0562     >>> import StringIO
0563     >>> read_floatnl(StringIO.StringIO("-1.25\n6"))
0564     -1.25
0565     """
0566     s = read_stringnl(f, decode=False, stripquotes=False)
0567     return float(s)
0568 
0569 floatnl = ArgumentDescriptor(
0570               name='floatnl',
0571               n=UP_TO_NEWLINE,
0572               reader=read_floatnl,
0573               doc="""A newline-terminated decimal floating literal.
0574 
0575               In general this requires 17 significant digits for roundtrip
0576               identity, and pickling then unpickling infinities, NaNs, and
0577               minus zero doesn't work across boxes, or on some boxes even
0578               on itself (e.g., Windows can't read the strings it produces
0579               for infinities or NaNs).
0580               """)
0581 
0582 def read_float8(f):
0583     r"""
0584     >>> import StringIO, struct
0585     >>> raw = struct.pack(">d", -1.25)
0586     >>> raw
0587     '\xbf\xf4\x00\x00\x00\x00\x00\x00'
0588     >>> read_float8(StringIO.StringIO(raw + "\n"))
0589     -1.25
0590     """
0591 
0592     data = f.read(8)
0593     if len(data) == 8:
0594         return _unpack(">d", data)[0]
0595     raise ValueError("not enough data in stream to read float8")
0596 
0597 
0598 float8 = ArgumentDescriptor(
0599              name='float8',
0600              n=8,
0601              reader=read_float8,
0602              doc="""An 8-byte binary representation of a float, big-endian.
0603 
0604              The format is unique to Python, and shared with the struct
0605              module (format string '>d') "in theory" (the struct and cPickle
0606              implementations don't share the code -- they should).  It's
0607              strongly related to the IEEE-754 double format, and, in normal
0608              cases, is in fact identical to the big-endian 754 double format.
0609              On other boxes the dynamic range is limited to that of a 754
0610              double, and "add a half and chop" rounding is used to reduce
0611              the precision to 53 bits.  However, even on a 754 box,
0612              infinities, NaNs, and minus zero may not be handled correctly
0613              (may not survive roundtrip pickling intact).
0614              """)
0615 
0616 # Protocol 2 formats
0617 
0618 from pickle import decode_long
0619 
0620 def read_long1(f):
0621     r"""
0622     >>> import StringIO
0623     >>> read_long1(StringIO.StringIO("\x00"))
0624     0L
0625     >>> read_long1(StringIO.StringIO("\x02\xff\x00"))
0626     255L
0627     >>> read_long1(StringIO.StringIO("\x02\xff\x7f"))
0628     32767L
0629     >>> read_long1(StringIO.StringIO("\x02\x00\xff"))
0630     -256L
0631     >>> read_long1(StringIO.StringIO("\x02\x00\x80"))
0632     -32768L
0633     """
0634 
0635     n = read_uint1(f)
0636     data = f.read(n)
0637     if len(data) != n:
0638         raise ValueError("not enough data in stream to read long1")
0639     return decode_long(data)
0640 
0641 long1 = ArgumentDescriptor(
0642     name="long1",
0643     n=TAKEN_FROM_ARGUMENT1,
0644     reader=read_long1,
0645     doc="""A binary long, little-endian, using 1-byte size.
0646 
0647     This first reads one byte as an unsigned size, then reads that
0648     many bytes and interprets them as a little-endian 2's-complement long.
0649     If the size is 0, that's taken as a shortcut for the long 0L.
0650     """)
0651 
0652 def read_long4(f):
0653     r"""
0654     >>> import StringIO
0655     >>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\xff\x00"))
0656     255L
0657     >>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\xff\x7f"))
0658     32767L
0659     >>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\x00\xff"))
0660     -256L
0661     >>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\x00\x80"))
0662     -32768L
0663     >>> read_long1(StringIO.StringIO("\x00\x00\x00\x00"))
0664     0L
0665     """
0666 
0667     n = read_int4(f)
0668     if n < 0:
0669         raise ValueError("long4 byte count < 0: %d" % n)
0670     data = f.read(n)
0671     if len(data) != n:
0672         raise ValueError("not enough data in stream to read long4")
0673     return decode_long(data)
0674 
0675 long4 = ArgumentDescriptor(
0676     name="long4",
0677     n=TAKEN_FROM_ARGUMENT4,
0678     reader=read_long4,
0679     doc="""A binary representation of a long, little-endian.
0680 
0681     This first reads four bytes as a signed size (but requires the
0682     size to be >= 0), then reads that many bytes and interprets them
0683     as a little-endian 2's-complement long.  If the size is 0, that's taken
0684     as a shortcut for the long 0L, although LONG1 should really be used
0685     then instead (and in any case where # of bytes < 256).
0686     """)
0687 
0688 
0689 ##############################################################################
0690 # Object descriptors.  The stack used by the pickle machine holds objects,
0691 # and in the stack_before and stack_after attributes of OpcodeInfo
0692 # descriptors we need names to describe the various types of objects that can
0693 # appear on the stack.
0694 
0695 class StackObject(object):
0696     __slots__ = (
0697         # name of descriptor record, for info only
0698         'name',
0699 
0700         # type of object, or tuple of type objects (meaning the object can
0701         # be of any type in the tuple)
0702         'obtype',
0703 
0704         # human-readable docs for this kind of stack object; a string
0705         'doc',
0706     )
0707 
0708     def __init__(self, name, obtype, doc):
0709         assert isinstance(name, str)
0710         self.name = name
0711 
0712         assert isinstance(obtype, type) or isinstance(obtype, tuple)
0713         if isinstance(obtype, tuple):
0714             for contained in obtype:
0715                 assert isinstance(contained, type)
0716         self.obtype = obtype
0717 
0718         assert isinstance(doc, str)
0719         self.doc = doc
0720 
0721     def __repr__(self):
0722         return self.name
0723 
0724 
0725 pyint = StackObject(
0726             name='int',
0727             obtype=int,
0728             doc="A short (as opposed to long) Python integer object.")
0729 
0730 pylong = StackObject(
0731              name='long',
0732              obtype=long,
0733              doc="A long (as opposed to short) Python integer object.")
0734 
0735 pyinteger_or_bool = StackObject(
0736                         name='int_or_bool',
0737                         obtype=(int, long, bool),
0738                         doc="A Python integer object (short or long), or "
0739                             "a Python bool.")
0740 
0741 pybool = StackObject(
0742              name='bool',
0743              obtype=(bool,),
0744              doc="A Python bool object.")
0745 
0746 pyfloat = StackObject(
0747               name='float',
0748               obtype=float,
0749               doc="A Python float object.")
0750 
0751 pystring = StackObject(
0752                name='str',
0753                obtype=str,
0754                doc="A Python string object.")
0755 
0756 pyunicode = StackObject(
0757                 name='unicode',
0758                 obtype=unicode,
0759                 doc="A Python Unicode string object.")
0760 
0761 pynone = StackObject(
0762              name="None",
0763              obtype=type(None),
0764              doc="The Python None object.")
0765 
0766 pytuple = StackObject(
0767               name="tuple",
0768               obtype=tuple,
0769               doc="A Python tuple object.")
0770 
0771 pylist = StackObject(
0772              name="list",
0773              obtype=list,
0774              doc="A Python list object.")
0775 
0776 pydict = StackObject(
0777              name="dict",
0778              obtype=dict,
0779              doc="A Python dict object.")
0780 
0781 anyobject = StackObject(
0782                 name='any',
0783                 obtype=object,
0784                 doc="Any kind of object whatsoever.")
0785 
0786 markobject = StackObject(
0787                  name="mark",
0788                  obtype=StackObject,
0789                  doc="""'The mark' is a unique object.
0790 
0791                  Opcodes that operate on a variable number of objects
0792                  generally don't embed the count of objects in the opcode,
0793                  or pull it off the stack.  Instead the MARK opcode is used
0794                  to push a special marker object on the stack, and then
0795                  some other opcodes grab all the objects from the top of
0796                  the stack down to (but not including) the topmost marker
0797                  object.
0798                  """)
0799 
0800 stackslice = StackObject(
0801                  name="stackslice",
0802                  obtype=StackObject,
0803                  doc="""An object representing a contiguous slice of the stack.
0804 
0805                  This is used in conjuction with markobject, to represent all
0806                  of the stack following the topmost markobject.  For example,
0807                  the POP_MARK opcode changes the stack from
0808 
0809                      [..., markobject, stackslice]
0810                  to
0811                      [...]
0812 
0813                  No matter how many object are on the stack after the topmost
0814                  markobject, POP_MARK gets rid of all of them (including the
0815                  topmost markobject too).
0816                  """)
0817 
0818 ##############################################################################
0819 # Descriptors for pickle opcodes.
0820 
0821 class OpcodeInfo(object):
0822 
0823     __slots__ = (
0824         # symbolic name of opcode; a string
0825         'name',
0826 
0827         # the code used in a bytestream to represent the opcode; a
0828         # one-character string
0829         'code',
0830 
0831         # If the opcode has an argument embedded in the byte string, an
0832         # instance of ArgumentDescriptor specifying its type.  Note that
0833         # arg.reader(s) can be used to read and decode the argument from
0834         # the bytestream s, and arg.doc documents the format of the raw
0835         # argument bytes.  If the opcode doesn't have an argument embedded
0836         # in the bytestream, arg should be None.
0837         'arg',
0838 
0839         # what the stack looks like before this opcode runs; a list
0840         'stack_before',
0841 
0842         # what the stack looks like after this opcode runs; a list
0843         'stack_after',
0844 
0845         # the protocol number in which this opcode was introduced; an int
0846         'proto',
0847 
0848         # human-readable docs for this opcode; a string
0849         'doc',
0850     )
0851 
0852     def __init__(self, name, code, arg,
0853                  stack_before, stack_after, proto, doc):
0854         assert isinstance(name, str)
0855         self.name = name
0856 
0857         assert isinstance(code, str)
0858         assert len(code) == 1
0859         self.code = code
0860 
0861         assert arg is None or isinstance(arg, ArgumentDescriptor)
0862         self.arg = arg
0863 
0864         assert isinstance(stack_before, list)
0865         for x in stack_before:
0866             assert isinstance(x, StackObject)
0867         self.stack_before = stack_before
0868 
0869         assert isinstance(stack_after, list)
0870         for x in stack_after:
0871             assert isinstance(x, StackObject)
0872         self.stack_after = stack_after
0873 
0874         assert isinstance(proto, int) and 0 <= proto <= 2
0875         self.proto = proto
0876 
0877         assert isinstance(doc, str)
0878         self.doc = doc
0879 
0880 I = OpcodeInfo
0881 opcodes = [
0882 
0883     # Ways to spell integers.
0884 
0885     I(name='INT',
0886       code='I',
0887       arg=decimalnl_short,
0888       stack_before=[],
0889       stack_after=[pyinteger_or_bool],
0890       proto=0,
0891       doc="""Push an integer or bool.
0892 
0893       The argument is a newline-terminated decimal literal string.
0894 
0895       The intent may have been that this always fit in a short Python int,
0896       but INT can be generated in pickles written on a 64-bit box that
0897       require a Python long on a 32-bit box.  The difference between this
0898       and LONG then is that INT skips a trailing 'L', and produces a short
0899       int whenever possible.
0900 
0901       Another difference is due to that, when bool was introduced as a
0902       distinct type in 2.3, builtin names True and False were also added to
0903       2.2.2, mapping to ints 1 and 0.  For compatibility in both directions,
0904       True gets pickled as INT + "I01\\n", and False as INT + "I00\\n".
0905       Leading zeroes are never produced for a genuine integer.  The 2.3
0906       (and later) unpicklers special-case these and return bool instead;
0907       earlier unpicklers ignore the leading "0" and return the int.
0908       """),
0909 
0910     I(name='BININT',
0911       code='J',
0912       arg=int4,
0913       stack_before=[],
0914       stack_after=[pyint],
0915       proto=1,
0916       doc="""Push a four-byte signed integer.
0917 
0918       This handles the full range of Python (short) integers on a 32-bit
0919       box, directly as binary bytes (1 for the opcode and 4 for the integer).
0920       If the integer is non-negative and fits in 1 or 2 bytes, pickling via
0921       BININT1 or BININT2 saves space.
0922       """),
0923 
0924     I(name='BININT1',
0925       code='K',
0926       arg=uint1,
0927       stack_before=[],
0928       stack_after=[pyint],
0929       proto=1,
0930       doc="""Push a one-byte unsigned integer.
0931 
0932       This is a space optimization for pickling very small non-negative ints,
0933       in range(256).
0934       """),
0935 
0936     I(name='BININT2',
0937       code='M',
0938       arg=uint2,
0939       stack_before=[],
0940       stack_after=[pyint],
0941       proto=1,
0942       doc="""Push a two-byte unsigned integer.
0943 
0944       This is a space optimization for pickling small positive ints, in
0945       range(256, 2**16).  Integers in range(256) can also be pickled via
0946       BININT2, but BININT1 instead saves a byte.
0947       """),
0948 
0949     I(name='LONG',
0950       code='L',
0951       arg=decimalnl_long,
0952       stack_before=[],
0953       stack_after=[pylong],
0954       proto=0,
0955       doc="""Push a long integer.
0956 
0957       The same as INT, except that the literal ends with 'L', and always
0958       unpickles to a Python long.  There doesn't seem a real purpose to the
0959       trailing 'L'.
0960 
0961       Note that LONG takes time quadratic in the number of digits when
0962       unpickling (this is simply due to the nature of decimal->binary
0963       conversion).  Proto 2 added linear-time (in C; still quadratic-time
0964       in Python) LONG1 and LONG4 opcodes.
0965       """),
0966 
0967     I(name="LONG1",
0968       code='\x8a',
0969       arg=long1,
0970       stack_before=[],
0971       stack_after=[pylong],
0972       proto=2,
0973       doc="""Long integer using one-byte length.
0974 
0975       A more efficient encoding of a Python long; the long1 encoding
0976       says it all."""),
0977 
0978     I(name="LONG4",
0979       code='\x8b',
0980       arg=long4,
0981       stack_before=[],
0982       stack_after=[pylong],
0983       proto=2,
0984       doc="""Long integer using found-byte length.
0985 
0986       A more efficient encoding of a Python long; the long4 encoding
0987       says it all."""),
0988 
0989     # Ways to spell strings (8-bit, not Unicode).
0990 
0991     I(name='STRING',
0992       code='S',
0993       arg=stringnl,
0994       stack_before=[],
0995       stack_after=[pystring],
0996       proto=0,
0997       doc="""Push a Python string object.
0998 
0999       The argument is a repr-style string, with bracketing quote characters,
1000       and perhaps embedded escapes.  The argument extends until the next
1001       newline character.
1002       """),
1003 
1004     I(name='BINSTRING',
1005       code='T',
1006       arg=string4,
1007       stack_before=[],
1008       stack_after=[pystring],
1009       proto=1,
1010       doc="""Push a Python string object.
1011 
1012       There are two arguments:  the first is a 4-byte little-endian signed int
1013       giving the number of bytes in the string, and the second is that many
1014       bytes, which are taken literally as the string content.
1015       """),
1016 
1017     I(name='SHORT_BINSTRING',
1018       code='U',
1019       arg=string1,
1020       stack_before=[],
1021       stack_after=[pystring],
1022       proto=1,
1023       doc="""Push a Python string object.
1024 
1025       There are two arguments:  the first is a 1-byte unsigned int giving
1026       the number of bytes in the string, and the second is that many bytes,
1027       which are taken literally as the string content.
1028       """),
1029 
1030     # Ways to spell None.
1031 
1032     I(name='NONE',
1033       code='N',
1034       arg=None,
1035       stack_before=[],
1036       stack_after=[pynone],
1037       proto=0,
1038       doc="Push None on the stack."),
1039 
1040     # Ways to spell bools, starting with proto 2.  See INT for how this was
1041     # done before proto 2.
1042 
1043     I(name='NEWTRUE',
1044       code='\x88',
1045       arg=None,
1046       stack_before=[],
1047       stack_after=[pybool],
1048       proto=2,
1049       doc="""True.
1050 
1051       Push True onto the stack."""),
1052 
1053     I(name='NEWFALSE',
1054       code='\x89',
1055       arg=None,
1056       stack_before=[],
1057       stack_after=[pybool],
1058       proto=2,
1059       doc="""True.
1060 
1061       Push False onto the stack."""),
1062 
1063     # Ways to spell Unicode strings.
1064 
1065     I(name='UNICODE',
1066       code='V',
1067       arg=unicodestringnl,
1068       stack_before=[],
1069       stack_after=[pyunicode],
1070       proto=0,  # this may be pure-text, but it's a later addition
1071       doc="""Push a Python Unicode string object.
1072 
1073       The argument is a raw-unicode-escape encoding of a Unicode string,
1074       and so may contain embedded escape sequences.  The argument extends
1075       until the next newline character.
1076       """),
1077 
1078     I(name='BINUNICODE',
1079       code='X',
1080       arg=unicodestring4,
1081       stack_before=[],
1082       stack_after=[pyunicode],
1083       proto=1,
1084       doc="""Push a Python Unicode string object.
1085 
1086       There are two arguments:  the first is a 4-byte little-endian signed int
1087       giving the number of bytes in the string.  The second is that many
1088       bytes, and is the UTF-8 encoding of the Unicode string.
1089       """),
1090 
1091     # Ways to spell floats.
1092 
1093     I(name='FLOAT',
1094       code='F',
1095       arg=floatnl,
1096       stack_before=[],
1097       stack_after=[pyfloat],
1098       proto=0,
1099       doc="""Newline-terminated decimal float literal.
1100 
1101       The argument is repr(a_float), and in general requires 17 significant
1102       digits for roundtrip conversion to be an identity (this is so for
1103       IEEE-754 double precision values, which is what Python float maps to
1104       on most boxes).
1105 
1106       In general, FLOAT cannot be used to transport infinities, NaNs, or
1107       minus zero across boxes (or even on a single box, if the platform C
1108       library can't read the strings it produces for such things -- Windows
1109       is like that), but may do less damage than BINFLOAT on boxes with
1110       greater precision or dynamic range than IEEE-754 double.
1111       """),
1112 
1113     I(name='BINFLOAT',
1114       code='G',
1115       arg=float8,
1116       stack_before=[],
1117       stack_after=[pyfloat],
1118       proto=1,
1119       doc="""Float stored in binary form, with 8 bytes of data.
1120 
1121       This generally requires less than half the space of FLOAT encoding.
1122       In general, BINFLOAT cannot be used to transport infinities, NaNs, or
1123       minus zero, raises an exception if the exponent exceeds the range of
1124       an IEEE-754 double, and retains no more than 53 bits of precision (if
1125       there are more than that, "add a half and chop" rounding is used to
1126       cut it back to 53 significant bits).
1127       """),
1128 
1129     # Ways to build lists.
1130 
1131     I(name='EMPTY_LIST',
1132       code=']',
1133       arg=None,
1134       stack_before=[],
1135       stack_after=[pylist],
1136       proto=1,
1137       doc="Push an empty list."),
1138 
1139     I(name='APPEND',
1140       code='a',
1141       arg=None,
1142       stack_before=[pylist, anyobject],
1143       stack_after=[pylist],
1144       proto=0,
1145       doc="""Append an object to a list.
1146 
1147       Stack before:  ... pylist anyobject
1148       Stack after:   ... pylist+[anyobject]
1149 
1150       although pylist is really extended in-place.
1151       """),
1152 
1153     I(name='APPENDS',
1154       code='e',
1155       arg=None,
1156       stack_before=[pylist, markobject, stackslice],
1157       stack_after=[pylist],
1158       proto=1,
1159       doc="""Extend a list by a slice of stack objects.
1160 
1161       Stack before:  ... pylist markobject stackslice
1162       Stack after:   ... pylist+stackslice
1163 
1164       although pylist is really extended in-place.
1165       """),
1166 
1167     I(name='LIST',
1168       code='l',
1169       arg=None,
1170       stack_before=[markobject, stackslice],
1171       stack_after=[pylist],
1172       proto=0,
1173       doc="""Build a list out of the topmost stack slice, after markobject.
1174 
1175       All the stack entries following the topmost markobject are placed into
1176       a single Python list, which single list object replaces all of the
1177       stack from the topmost markobject onward.  For example,
1178 
1179       Stack before: ... markobject 1 2 3 'abc'
1180       Stack after:  ... [1, 2, 3, 'abc']
1181       """),
1182 
1183     # Ways to build tuples.
1184 
1185     I(name='EMPTY_TUPLE',
1186       code=')',
1187       arg=None,
1188       stack_before=[],
1189       stack_after=[pytuple],
1190       proto=1,
1191       doc="Push an empty tuple."),
1192 
1193     I(name='TUPLE',
1194       code='t',
1195       arg=None,
1196       stack_before=[markobject, stackslice],
1197       stack_after=[pytuple],
1198       proto=0,
1199       doc="""Build a tuple out of the topmost stack slice, after markobject.
1200 
1201       All the stack entries following the topmost markobject are placed into
1202       a single Python tuple, which single tuple object replaces all of the
1203       stack from the topmost markobject onward.  For example,
1204 
1205       Stack before: ... markobject 1 2 3 'abc'
1206       Stack after:  ... (1, 2, 3, 'abc')
1207       """),
1208 
1209     I(name='TUPLE1',
1210       code='\x85',
1211       arg=None,
1212       stack_before=[anyobject],
1213       stack_after=[pytuple],
1214       proto=2,
1215       doc="""One-tuple.
1216 
1217       This code pops one value off the stack and pushes a tuple of
1218       length 1 whose one item is that value back onto it.  IOW:
1219 
1220           stack[-1] = tuple(stack[-1:])
1221       """),
1222 
1223     I(name='TUPLE2',
1224       code='\x86',
1225       arg=None,
1226       stack_before=[anyobject, anyobject],
1227       stack_after=[pytuple],
1228       proto=2,
1229       doc="""One-tuple.
1230 
1231       This code pops two values off the stack and pushes a tuple
1232       of length 2 whose items are those values back onto it.  IOW:
1233 
1234           stack[-2:] = [tuple(stack[-2:])]
1235       """),
1236 
1237     I(name='TUPLE3',
1238       code='\x87',
1239       arg=None,
1240       stack_before=[anyobject, anyobject, anyobject],
1241       stack_after=[pytuple],
1242       proto=2,
1243       doc="""One-tuple.
1244 
1245       This code pops three values off the stack and pushes a tuple
1246       of length 3 whose items are those values back onto it.  IOW:
1247 
1248           stack[-3:] = [tuple(stack[-3:])]
1249       """),
1250 
1251     # Ways to build dicts.
1252 
1253     I(name='EMPTY_DICT',
1254       code='}',
1255       arg=None,
1256       stack_before=[],
1257       stack_after=[pydict],
1258       proto=1,
1259       doc="Push an empty dict."),
1260 
1261     I(name='DICT',
1262       code='d',
1263       arg=None,
1264       stack_before=[markobject, stackslice],
1265       stack_after=[pydict],
1266       proto=0,
1267       doc="""Build a dict out of the topmost stack slice, after markobject.
1268 
1269       All the stack entries following the topmost markobject are placed into
1270       a single Python dict, which single dict object replaces all of the
1271       stack from the topmost markobject onward.  The stack slice alternates
1272       key, value, key, value, ....  For example,
1273 
1274       Stack before: ... markobject 1 2 3 'abc'
1275       Stack after:  ... {1: 2, 3: 'abc'}
1276       """),
1277 
1278     I(name='SETITEM',
1279       code='s',
1280       arg=None,
1281       stack_before=[pydict, anyobject, anyobject],
1282       stack_after=[pydict],
1283       proto=0,
1284       doc="""Add a key+value pair to an existing dict.
1285 
1286       Stack before:  ... pydict key value
1287       Stack after:   ... pydict
1288 
1289       where pydict has been modified via pydict[key] = value.
1290       """),
1291 
1292     I(name='SETITEMS',
1293       code='u',
1294       arg=None,
1295       stack_before=[pydict, markobject, stackslice],
1296       stack_after=[pydict],
1297       proto=1,
1298       doc="""Add an arbitrary number of key+value pairs to an existing dict.
1299 
1300       The slice of the stack following the topmost markobject is taken as
1301       an alternating sequence of keys and values, added to the dict
1302       immediately under the topmost markobject.  Everything at and after the
1303       topmost markobject is popped, leaving the mutated dict at the top
1304       of the stack.
1305 
1306       Stack before:  ... pydict markobject key_1 value_1 ... key_n value_n
1307       Stack after:   ... pydict
1308 
1309       where pydict has been modified via pydict[key_i] = value_i for i in
1310       1, 2, ..., n, and in that order.
1311       """),
1312 
1313     # Stack manipulation.
1314 
1315     I(name='POP',
1316       code='0',
1317       arg=None,
1318       stack_before=[anyobject],
1319       stack_after=[],
1320       proto=0,
1321       doc="Discard the top stack item, shrinking the stack by one item."),
1322 
1323     I(name='DUP',
1324       code='2',
1325       arg=None,
1326       stack_before=[anyobject],
1327       stack_after=[anyobject, anyobject],
1328       proto=0,
1329       doc="Push the top stack item onto the stack again, duplicating it."),
1330 
1331     I(name='MARK',
1332       code='(',
1333       arg=None,
1334       stack_before=[],
1335       stack_after=[markobject],
1336       proto=0,
1337       doc="""Push markobject onto the stack.
1338 
1339       markobject is a unique object, used by other opcodes to identify a
1340       region of the stack containing a variable number of objects for them
1341       to work on.  See markobject.doc for more detail.
1342       """),
1343 
1344     I(name='POP_MARK',
1345       code='1',
1346       arg=None,
1347       stack_before=[markobject, stackslice],
1348       stack_after=[],
1349       proto=0,
1350       doc="""Pop all the stack objects at and above the topmost markobject.
1351 
1352       When an opcode using a variable number of stack objects is done,
1353       POP_MARK is used to remove those objects, and to remove the markobject
1354       that delimited their starting position on the stack.
1355       """),
1356 
1357     # Memo manipulation.  There are really only two operations (get and put),
1358     # each in all-text, "short binary", and "long binary" flavors.
1359 
1360     I(name='GET',
1361       code='g',
1362       arg=decimalnl_short,
1363       stack_before=[],
1364       stack_after=[anyobject],
1365       proto=0,
1366       doc="""Read an object from the memo and push it on the stack.
1367 
1368       The index of the memo object to push is given by the newline-teriminated
1369       decimal string following.  BINGET and LONG_BINGET are space-optimized
1370       versions.
1371       """),
1372 
1373     I(name='BINGET',
1374       code='h',
1375       arg=uint1,
1376       stack_before=[],
1377       stack_after=[anyobject],
1378       proto=1,
1379       doc="""Read an object from the memo and push it on the stack.
1380 
1381       The index of the memo object to push is given by the 1-byte unsigned
1382       integer following.
1383       """),
1384 
1385     I(name='LONG_BINGET',
1386       code='j',
1387       arg=int4,
1388       stack_before=[],
1389       stack_after=[anyobject],
1390       proto=1,
1391       doc="""Read an object from the memo and push it on the stack.
1392 
1393       The index of the memo object to push is given by the 4-byte signed
1394       little-endian integer following.
1395       """),
1396 
1397     I(name='PUT',
1398       code='p',
1399       arg=decimalnl_short,
1400       stack_before=[],
1401       stack_after=[],
1402       proto=0,
1403       doc="""Store the stack top into the memo.  The stack is not popped.
1404 
1405       The index of the memo location to write into is given by the newline-
1406       terminated decimal string following.  BINPUT and LONG_BINPUT are
1407       space-optimized versions.
1408       """),
1409 
1410     I(name='BINPUT',
1411       code='q',
1412       arg=uint1,
1413       stack_before=[],
1414       stack_after=[],
1415       proto=1,
1416       doc="""Store the stack top into the memo.  The stack is not popped.
1417 
1418       The index of the memo location to write into is given by the 1-byte
1419       unsigned integer following.
1420       """),
1421 
1422     I(name='LONG_BINPUT',
1423       code='r',
1424       arg=int4,
1425       stack_before=[],
1426       stack_after=[],
1427       proto=1,
1428       doc="""Store the stack top into the memo.  The stack is not popped.
1429 
1430       The index of the memo location to write into is given by the 4-byte
1431       signed little-endian integer following.
1432       """),
1433 
1434     # Access the extension registry (predefined objects).  Akin to the GET
1435     # family.
1436 
1437     I(name='EXT1',
1438       code='\x82',
1439       arg=uint1,
1440       stack_before=[],
1441       stack_after=[anyobject],
1442       proto=2,
1443       doc="""Extension code.
1444 
1445       This code and the similar EXT2 and EXT4 allow using a registry
1446       of popular objects that are pickled by name, typically classes.
1447       It is envisioned that through a global negotiation and
1448       registration process, third parties can set up a mapping between
1449       ints and object names.
1450 
1451       In order to guarantee pickle interchangeability, the extension
1452       code registry ought to be global, although a range of codes may
1453       be reserved for private use.
1454 
1455       EXT1 has a 1-byte integer argument.  This is used to index into the
1456       extension registry, and the object at that index is pushed on the stack.
1457       """),
1458 
1459     I(name='EXT2',
1460       code='\x83',
1461       arg=uint2,
1462       stack_before=[],
1463       stack_after=[anyobject],
1464       proto=2,
1465       doc="""Extension code.
1466 
1467       See EXT1.  EXT2 has a two-byte integer argument.
1468       """),
1469 
1470     I(name='EXT4',
1471       code='\x84',
1472       arg=int4,
1473       stack_before=[],
1474       stack_after=[anyobject],
1475       proto=2,
1476       doc="""Extension code.
1477 
1478       See EXT1.  EXT4 has a four-byte integer argument.
1479       """),
1480 
1481     # Push a class object, or module function, on the stack, via its module
1482     # and name.
1483 
1484     I(name='GLOBAL',
1485       code='c',
1486       arg=stringnl_noescape_pair,
1487       stack_before=[],
1488       stack_after=[anyobject],
1489       proto=0,
1490       doc="""Push a global object (module.attr) on the stack.
1491 
1492       Two newline-terminated strings follow the GLOBAL opcode.  The first is
1493       taken as a module name, and the second as a class name.  The class
1494       object module.class is pushed on the stack.  More accurately, the
1495       object returned by self.find_class(module, class) is pushed on the
1496       stack, so unpickling subclasses can override this form of lookup.
1497       """),
1498 
1499     # Ways to build objects of classes pickle doesn't know about directly
1500     # (user-defined classes).  I despair of documenting this accurately
1501     # and comprehensibly -- you really have to read the pickle code to
1502     # find all the special cases.
1503 
1504     I(name='REDUCE',
1505       code='R',
1506       arg=None,
1507       stack_before=[anyobject, anyobject],
1508       stack_after=[anyobject],
1509       proto=0,
1510       doc="""Push an object built from a callable and an argument tuple.
1511 
1512       The opcode is named to remind of the __reduce__() method.
1513 
1514       Stack before: ... callable pytuple
1515       Stack after:  ... callable(*pytuple)
1516 
1517       The callable and the argument tuple are the first two items returned
1518       by a __reduce__ method.  Applying the callable to the argtuple is
1519       supposed to reproduce the original object, or at least get it started.
1520       If the __reduce__ method returns a 3-tuple, the last component is an
1521       argument to be passed to the object's __setstate__, and then the REDUCE
1522       opcode is followed by code to create setstate's argument, and then a
1523       BUILD opcode to apply  __setstate__ to that argument.
1524 
1525       There are lots of special cases here.  The argtuple can be None, in
1526       which case callable.__basicnew__() is called instead to produce the
1527       object to be pushed on the stack.  This appears to be a trick unique
1528       to ExtensionClasses, and is deprecated regardless.
1529 
1530       If type(callable) is not ClassType, REDUCE complains unless the
1531       callable has been registered with the copy_reg module's
1532       safe_constructors dict, or the callable has a magic
1533       '__safe_for_unpickling__' attribute with a true value.  I'm not sure
1534       why it does this, but I've sure seen this complaint often enough when
1535       I didn't want to <wink>.
1536       """),
1537 
1538     I(name='BUILD',
1539       code='b',
1540       arg=None,
1541       stack_before=[anyobject, anyobject],
1542       stack_after=[anyobject],
1543       proto=0,
1544       doc="""Finish building an object, via __setstate__ or dict update.
1545 
1546       Stack before: ... anyobject argument
1547       Stack after:  ... anyobject
1548 
1549       where anyobject may have been mutated, as follows:
1550 
1551       If the object has a __setstate__ method,
1552 
1553           anyobject.__setstate__(argument)
1554 
1555       is called.
1556 
1557       Else the argument must be a dict, the object must have a __dict__, and
1558       the object is updated via
1559 
1560           anyobject.__dict__.update(argument)
1561 
1562       This may raise RuntimeError in restricted execution mode (which
1563       disallows access to __dict__ directly); in that case, the object
1564       is updated instead via
1565 
1566           for k, v in argument.items():
1567               anyobject[k] = v
1568       """),
1569 
1570     I(name='INST',
1571       code='i',
1572       arg=stringnl_noescape_pair,
1573       stack_before=[markobject, stackslice],
1574       stack_after=[anyobject],
1575       proto=0,
1576       doc="""Build a class instance.
1577 
1578       This is the protocol 0 version of protocol 1's OBJ opcode.
1579       INST is followed by two newline-terminated strings, giving a
1580       module and class name, just as for the GLOBAL opcode (and see
1581       GLOBAL for more details about that).  self.find_class(module, name)
1582       is used to get a class object.
1583 
1584       In addition, all the objects on the stack following the topmost
1585       markobject are gathered into a tuple and popped (along with the
1586       topmost markobject), just as for the TUPLE opcode.
1587 
1588       Now it gets complicated.  If all of these are true:
1589 
1590         + The argtuple is empty (markobject was at the top of the stack
1591           at the start).
1592 
1593         + It's an old-style class object (the type of the class object is
1594           ClassType).
1595 
1596         + The class object does not have a __getinitargs__ attribute.
1597 
1598       then we want to create an old-style class instance without invoking
1599       its __init__() method (pickle has waffled on this over the years; not
1600       calling __init__() is current wisdom).  In this case, an instance of
1601       an old-style dummy class is created, and then we try to rebind its
1602       __class__ attribute to the desired class object.  If this succeeds,
1603       the new instance object is pushed on the stack, and we're done.  In
1604       restricted execution mode it can fail (assignment to __class__ is
1605       disallowed), and I'm not really sure what happens then -- it looks
1606       like the code ends up calling the class object's __init__ anyway,
1607       via falling into the next case.
1608 
1609       Else (the argtuple is not empty, it's not an old-style class object,
1610       or the class object does have a __getinitargs__ attribute), the code
1611       first insists that the class object have a __safe_for_unpickling__
1612       attribute.  Unlike as for the __safe_for_unpickling__ check in REDUCE,
1613       it doesn't matter whether this attribute has a true or false value, it
1614       only matters whether it exists (XXX this is a bug; cPickle
1615       requires the attribute to be true).  If __safe_for_unpickling__
1616       doesn't exist, UnpicklingError is raised.
1617 
1618       Else (the class object does have a __safe_for_unpickling__ attr),
1619       the class object obtained from INST's arguments is applied to the
1620       argtuple obtained from the stack, and the resulting instance object
1621       is pushed on the stack.
1622 
1623       NOTE:  checks for __safe_for_unpickling__ went away in Python 2.3.
1624       """),
1625 
1626     I(name='OBJ',
1627       code='o',
1628       arg=None,
1629       stack_before=[markobject, anyobject, stackslice],
1630       stack_after=[anyobject],
1631       proto=1,
1632       doc="""Build a class instance.
1633 
1634       This is the protocol 1 version of protocol 0's INST opcode, and is
1635       very much like it.  The major difference is that the class object
1636       is taken off the stack, allowing it to be retrieved from the memo
1637       repeatedly if several instances of the same class are created.  This
1638       can be much more efficient (in both time and space) than repeatedly
1639       embedding the module and class names in INST opcodes.
1640 
1641       Unlike INST, OBJ takes no arguments from the opcode stream.  Instead
1642       the class object is taken off the stack, immediately above the
1643       topmost markobject:
1644 
1645       Stack before: ... markobject classobject stackslice
1646       Stack after:  ... new_instance_object
1647 
1648       As for INST, the remainder of the stack above the markobject is
1649       gathered into an argument tuple, and then the logic seems identical,
1650       except that no __safe_for_unpickling__ check is done (XXX this is
1651       a bug; cPickle does test __safe_for_unpickling__).  See INST for
1652       the gory details.
1653 
1654       NOTE:  In Python 2.3, INST and OBJ are identical except for how they
1655       get the class object.  That was always the intent; the implementations
1656       had diverged for accidental reasons.
1657       """),
1658 
1659     I(name='NEWOBJ',
1660       code='\x81',
1661       arg=None,
1662       stack_before=[anyobject, anyobject],
1663       stack_after=[anyobject],
1664       proto=2,
1665       doc="""Build an object instance.
1666 
1667       The stack before should be thought of as containing a class
1668       object followed by an argument tuple (the tuple being the stack
1669       top).  Call these cls and args.  They are popped off the stack,
1670       and the value returned by cls.__new__(cls, *args) is pushed back
1671       onto the stack.
1672       """),
1673 
1674     # Machine control.
1675 
1676     I(name='PROTO',
1677       code='\x80',
1678       arg=uint1,
1679       stack_before=[],
1680       stack_after=[],
1681       proto=2,
1682       doc="""Protocol version indicator.
1683 
1684       For protocol 2 and above, a pickle must start with this opcode.
1685       The argument is the protocol version, an int in range(2, 256).
1686       """),
1687 
1688     I(name='STOP',
1689       code='.',
1690       arg=None,
1691       stack_before=[anyobject],
1692       stack_after=[],
1693       proto=0,
1694       doc="""Stop the unpickling machine.
1695 
1696       Every pickle ends with this opcode.  The object at the top of the stack
1697       is popped, and that's the result of unpickling.  The stack should be
1698       empty then.
1699       """),
1700 
1701     # Ways to deal with persistent IDs.
1702 
1703     I(name='PERSID',
1704       code='P',
1705       arg=stringnl_noescape,
1706       stack_before=[],
1707       stack_after=[anyobject],
1708       proto=0,
1709       doc="""Push an object identified by a persistent ID.
1710 
1711       The pickle module doesn't define what a persistent ID means.  PERSID's
1712       argument is a newline-terminated str-style (no embedded escapes, no
1713       bracketing quote characters) string, which *is* "the persistent ID".
1714       The unpickler passes this string to self.persistent_load().  Whatever
1715       object that returns is pushed on the stack.  There is no implementation
1716       of persistent_load() in Python's unpickler:  it must be supplied by an
1717       unpickler subclass.
1718       """),
1719 
1720     I(name='BINPERSID',
1721       code='Q',
1722       arg=None,
1723       stack_before=[anyobject],
1724       stack_after=[anyobject],
1725       proto=1,
1726       doc="""Push an object identified by a persistent ID.
1727 
1728       Like PERSID, except the persistent ID is popped off the stack (instead
1729       of being a string embedded in the opcode bytestream).  The persistent
1730       ID is passed to self.persistent_load(), and whatever object that
1731       returns is pushed on the stack.  See PERSID for more detail.
1732       """),
1733 ]
1734 del I
1735 
1736 # Verify uniqueness of .name and .code members.
1737 name2i = {}
1738 code2i = {}
1739 
1740 for i, d in enumerate(opcodes):
1741     if d.name in name2i:
1742         raise ValueError("repeated name %r at indices %d and %d" %
1743                          (d.name, name2i[d.name], i))
1744     if d.code in code2i:
1745         raise ValueError("repeated code %r at indices %d and %d" %
1746                          (d.code, code2i[d.code], i))
1747 
1748     name2i[d.name] = i
1749     code2i[d.code] = i
1750 
1751 del name2i, code2i, i, d
1752 
1753 ##############################################################################
1754 # Build a code2op dict, mapping opcode characters to OpcodeInfo records.
1755 # Also ensure we've got the same stuff as pickle.py, although the
1756 # introspection here is dicey.
1757 
1758 code2op = {}
1759 for d in opcodes:
1760     code2op[d.code] = d
1761 del d
1762 
1763 def assure_pickle_consistency(verbose=False):
1764     import pickle, re
1765 
1766     copy = code2op.copy()
1767     for name in pickle.__all__:
1768         if not re.match("[A-Z][A-Z0-9_]+$", name):
1769             if verbose:
1770                 print "skipping %r: it doesn't look like an opcode name" % name
1771             continue
1772         picklecode = getattr(pickle, name)
1773         if not isinstance(picklecode, str) or len(picklecode) != 1:
1774             if verbose:
1775                 print ("skipping %r: value %r doesn't look like a pickle "
1776                        "code" % (name, picklecode))
1777             continue
1778         if picklecode in copy:
1779             if verbose:
1780                 print "checking name %r w/ code %r for consistency" % (
1781                       name, picklecode)
1782             d = copy[picklecode]
1783             if d.name != name:
1784                 raise ValueError("for pickle code %r, pickle.py uses name %r "
1785                                  "but we're using name %r" % (picklecode,
1786                                                               name,
1787                                                               d.name))
1788             # Forget this one.  Any left over in copy at the end are a problem
1789             # of a different kind.
1790             del copy[picklecode]
1791         else:
1792             raise ValueError("pickle.py appears to have a pickle opcode with "
1793                              "name %r and code %r, but we don't" %
1794                              (name, picklecode))
1795     if copy:
1796         msg = ["we appear to have pickle opcodes that pickle.py doesn't have:"]
1797         for code, d in copy.items():
1798             msg.append("    name %r with code %r" % (d.name, code))
1799         raise ValueError("\n".join(msg))
1800 
1801 assure_pickle_consistency()
1802 del assure_pickle_consistency
1803 
1804 ##############################################################################
1805 # A pickle opcode generator.
1806 
1807 def genops(pickle):
1808     """Generate all the opcodes in a pickle.
1809 
1810     'pickle' is a file-like object, or string, containing the pickle.
1811 
1812     Each opcode in the pickle is generated, from the current pickle position,
1813     stopping after a STOP opcode is delivered.  A triple is generated for
1814     each opcode:
1815 
1816         opcode, arg, pos
1817 
1818     opcode is an OpcodeInfo record, describing the current opcode.
1819 
1820     If the opcode has an argument embedded in the pickle, arg is its decoded
1821     value, as a Python object.  If the opcode doesn't have an argument, arg
1822     is None.
1823 
1824     If the pickle has a tell() method, pos was the value of pickle.tell()
1825     before reading the current opcode.  If the pickle is a string object,
1826     it's wrapped in a StringIO object, and the latter's tell() result is
1827     used.  Else (the pickle doesn't have a tell(), and it's not obvious how
1828     to query its current position) pos is None.
1829     """
1830 
1831     import cStringIO as StringIO
1832 
1833     if isinstance(pickle, str):
1834         pickle = StringIO.StringIO(pickle)
1835 
1836     if hasattr(pickle, "tell"):
1837         getpos = pickle.tell
1838     else:
1839         getpos = lambda: None
1840 
1841     while True:
1842         pos = getpos()
1843         code = pickle.read(1)
1844         opcode = code2op.get(code)
1845         if opcode is None:
1846             if code == "":
1847                 raise ValueError("pickle exhausted before seeing STOP")
1848             else:
1849                 raise ValueError("at position %s, opcode %r unknown" % (
1850                                  pos is None and "<unknown>" or pos,
1851                                  code))
1852         if opcode.arg is None:
1853             arg = None
1854         else:
1855             arg = opcode.arg.reader(pickle)
1856         yield opcode, arg, pos
1857         if code == '.':
1858             assert opcode.name == 'STOP'
1859             break
1860 
1861 ##############################################################################
1862 # A symbolic pickle disassembler.
1863 
1864 def dis(pickle, out=None, memo=None, indentlevel=4):
1865     """Produce a symbolic disassembly of a pickle.
1866 
1867     'pickle' is a file-like object, or string, containing a (at least one)
1868     pickle.  The pickle is disassembled from the current position, through
1869     the first STOP opcode encountered.
1870 
1871     Optional arg 'out' is a file-like object to which the disassembly is
1872     printed.  It defaults to sys.stdout.
1873 
1874     Optional arg 'memo' is a Python dict, used as the pickle's memo.  It
1875     may be mutated by dis(), if the pickle contains PUT or BINPUT opcodes.
1876     Passing the same memo object to another dis() call then allows disassembly
1877     to proceed across multiple pickles that were all created by the same
1878     pickler with the same memo.  Ordinarily you don't need to worry about this.
1879 
1880     Optional arg indentlevel is the number of blanks by which to indent
1881     a new MARK level.  It defaults to 4.
1882 
1883     In addition to printing the disassembly, some sanity checks are made:
1884 
1885     + All embedded opcode arguments "make sense".
1886 
1887     + Explicit and implicit pop operations have enough items on the stack.
1888 
1889     + When an opcode implicitly refers to a markobject, a markobject is
1890       actually on the stack.
1891 
1892     + A memo entry isn't referenced before it's defined.
1893 
1894     + The markobject isn't stored in the memo.
1895 
1896     + A memo entry isn't redefined.
1897     """
1898 
1899     # Most of the hair here is for sanity checks, but most of it is needed
1900     # anyway to detect when a protocol 0 POP takes a MARK off the stack
1901     # (which in turn is needed to indent MARK blocks correctly).
1902 
1903     stack = []          # crude emulation of unpickler stack
1904     if memo is None:
1905         memo = {}       # crude emulation of unpicker memo
1906     maxproto = -1       # max protocol number seen
1907     markstack = []      # bytecode positions of MARK opcodes
1908     indentchunk = ' ' * indentlevel
1909     errormsg = None
1910     for opcode, arg, pos in genops(pickle):
1911         if pos is not None:
1912             print >> out, "%5d:" % pos,
1913 
1914         line = "%-4s %s%s" % (repr(opcode.code)[1:-1],
1915                               indentchunk * len(markstack),
1916                               opcode.name)
1917 
1918         maxproto = max(maxproto, opcode.proto)
1919         before = opcode.stack_before    # don't mutate
1920         after = opcode.stack_after      # don't mutate
1921         numtopop = len(before)
1922 
1923         # See whether a MARK should be popped.
1924         markmsg = None
1925         if markobject in before or (opcode.name == "POP" and
1926                                     stack and
1927                                     stack[-1] is markobject):
1928             assert markobject not in after
1929             if __debug__:
1930                 if markobject in before:
1931                     assert before[-1] is stackslice
1932             if markstack:
1933                 markpos = markstack.pop()
1934                 if markpos is None:
1935                     markmsg = "(MARK at unknown opcode offset)"
1936                 else:
1937                     markmsg = "(MARK at %d)" % markpos
1938                 # Pop everything at and after the topmost markobject.
1939                 while stack[-1] is not markobject:
1940                     stack.pop()
1941                 stack.pop()
1942                 # Stop later code from popping too much.
1943                 try:
1944                     numtopop = before.index(markobject)
1945                 except ValueError:
1946                     assert opcode.name == "POP"
1947                     numtopop = 0
1948             else:
1949                 errormsg = markmsg = "no MARK exists on stack"
1950 
1951         # Check for correct memo usage.
1952         if opcode.name in ("PUT", "BINPUT", "LONG_BINPUT"):
1953             assert arg is not None
1954             if arg in memo:
1955                 errormsg = "memo key %r already defined" % arg
1956             elif not stack:
1957                 errormsg = "stack is empty -- can't store into memo"
1958             elif stack[-1] is markobject:
1959                 errormsg = "can't store markobject in the memo"
1960             else:
1961                 memo[arg] = stack[-1]
1962 
1963         elif opcode.name in ("GET", "BINGET", "LONG_BINGET"):
1964             if arg in memo:
1965                 assert len(after) == 1
1966                 after = [memo[arg]]     # for better stack emulation
1967             else:
1968                 errormsg = "memo key %r has never been stored into" % arg
1969 
1970         if arg is not None or markmsg:
1971             # make a mild effort to align arguments
1972             line += ' ' * (10 - len(opcode.name))
1973             if arg is not None:
1974                 line += ' ' + repr(arg)
1975             if markmsg:
1976                 line += ' ' + markmsg
1977         print >> out, line
1978 
1979         if errormsg:
1980             # Note that we delayed complaining until the offending opcode
1981             # was printed.
1982             raise ValueError(errormsg)
1983 
1984         # Emulate the stack effects.
1985         if len(stack) < numtopop:
1986             raise ValueError("tries to pop %d items from stack with "
1987                              "only %d items" % (numtopop, len(stack)))
1988         if numtopop:
1989             del stack[-numtopop:]
1990         if markobject in after:
1991             assert markobject not in before
1992             markstack.append(pos)
1993 
1994         stack.extend(after)
1995 
1996     print >> out, "highest protocol among opcodes =", maxproto
1997     if stack:
1998         raise ValueError("stack not empty after STOP: %r" % stack)
1999 
2000 _dis_test = r"""
2001 >>> import pickle
2002 >>> x = [1, 2, (3, 4), {'abc': u"def"}]
2003 >>> pkl = pickle.dumps(x, 0)
2004 >>> dis(pkl)
2005     0: (    MARK
2006     1: l        LIST       (MARK at 0)
2007     2: p    PUT        0
2008     5: I    INT        1
2009     8: a    APPEND
2010     9: I    INT        2
2011    12: a    APPEND
2012    13: (    MARK
2013    14: I        INT        3
2014    17: I        INT        4
2015    20: t        TUPLE      (MARK at 13)
2016    21: p    PUT        1
2017    24: a    APPEND
2018    25: (    MARK
2019    26: d        DICT       (MARK at 25)
2020    27: p    PUT        2
2021    30: S    STRING     'abc'
2022    37: p    PUT        3
2023    40: V    UNICODE    u'def'
2024    45: p    PUT        4
2025    48: s    SETITEM
2026    49: a    APPEND
2027    50: .    STOP
2028 highest protocol among opcodes = 0
2029 
2030 Try again with a "binary" pickle.
2031 
2032 >>> pkl = pickle.dumps(x, 1)
2033 >>> dis(pkl)
2034     0: ]    EMPTY_LIST
2035     1: q    BINPUT     0
2036     3: (    MARK
2037     4: K        BININT1    1
2038     6: K        BININT1    2
2039     8: (        MARK
2040     9: K            BININT1    3
2041    11: K            BININT1    4
2042    13: t            TUPLE      (MARK at 8)
2043    14: q        BINPUT     1
2044    16: }        EMPTY_DICT
2045    17: q        BINPUT     2
2046    19: U        SHORT_BINSTRING 'abc'
2047    24: q        BINPUT     3
2048    26: X        BINUNICODE u'def'
2049    34: q        BINPUT     4
2050    36: s        SETITEM
2051    37: e        APPENDS    (MARK at 3)
2052    38: .    STOP
2053 highest protocol among opcodes = 1
2054 
2055 Exercise the INST/OBJ/BUILD family.
2056 
2057 >>> import random
2058 >>> dis(pickle.dumps(random.random, 0))
2059     0: c    GLOBAL     'random random'
2060    15: p    PUT        0
2061    18: .    STOP
2062 highest protocol among opcodes = 0
2063 
2064 >>> x = [pickle.PicklingError()] * 2
2065 >>> dis(pickle.dumps(x, 0))
2066     0: (    MARK
2067     1: l        LIST       (MARK at 0)
2068     2: p    PUT        0
2069     5: (    MARK
2070     6: i        INST       'pickle PicklingError' (MARK at 5)
2071    28: p    PUT        1
2072    31: (    MARK
2073    32: d        DICT       (MARK at 31)
2074    33: p    PUT        2
2075    36: S    STRING     'args'
2076    44: p    PUT        3
2077    47: (    MARK
2078    48: t        TUPLE      (MARK at 47)
2079    49: s    SETITEM
2080    50: b    BUILD
2081    51: a    APPEND
2082    52: g    GET        1
2083    55: a    APPEND
2084    56: .    STOP
2085 highest protocol among opcodes = 0
2086 
2087 >>> dis(pickle.dumps(x, 1))
2088     0: ]    EMPTY_LIST
2089     1: q    BINPUT     0
2090     3: (    MARK
2091     4: (        MARK
2092     5: c            GLOBAL     'pickle PicklingError'
2093    27: q            BINPUT     1
2094    29: o            OBJ        (MARK at 4)
2095    30: q        BINPUT     2
2096    32: }        EMPTY_DICT
2097    33: q        BINPUT     3
2098    35: U        SHORT_BINSTRING 'args'
2099    41: q        BINPUT     4
2100    43: )        EMPTY_TUPLE
2101    44: s        SETITEM
2102    45: b        BUILD
2103    46: h        BINGET     2
2104    48: e        APPENDS    (MARK at 3)
2105    49: .    STOP
2106 highest protocol among opcodes = 1
2107 
2108 Try "the canonical" recursive-object test.
2109 
2110 >>> L = []
2111 >>> T = L,
2112 >>> L.append(T)
2113 >>> L[0] is T
2114 True
2115 >>> T[0] is L
2116 True
2117 >>> L[0][0] is L
2118 True
2119 >>> T[0][0] is T
2120 True
2121 >>> dis(pickle.dumps(L, 0))
2122     0: (    MARK
2123     1: l        LIST       (MARK at 0)
2124     2: p    PUT        0
2125     5: (    MARK
2126     6: g        GET        0
2127     9: t        TUPLE      (MARK at 5)
2128    10: p    PUT        1
2129    13: a    APPEND
2130    14: .    STOP
2131 highest protocol among opcodes = 0
2132 
2133 >>> dis(pickle.dumps(L, 1))
2134     0: ]    EMPTY_LIST
2135     1: q    BINPUT     0
2136     3: (    MARK
2137     4: h        BINGET     0
2138     6: t        TUPLE      (MARK at 3)
2139     7: q    BINPUT     1
2140     9: a    APPEND
2141    10: .    STOP
2142 highest protocol among opcodes = 1
2143 
2144 Note that, in the protocol 0 pickle of the recursive tuple, the disassembler
2145 has to emulate the stack in order to realize that the POP opcode at 16 gets
2146 rid of the MARK at 0.
2147 
2148 >>> dis(pickle.dumps(T, 0))
2149     0: (    MARK
2150     1: (        MARK
2151     2: l            LIST       (MARK at 1)
2152     3: p        PUT        0
2153     6: (        MARK
2154     7: g            GET        0
2155    10: t            TUPLE      (MARK at 6)
2156    11: p        PUT        1
2157    14: a        APPEND
2158    15: 0        POP
2159    16: 0        POP        (MARK at 0)
2160    17: g    GET        1
2161    20: .    STOP
2162 highest protocol among opcodes = 0
2163 
2164 >>> dis(pickle.dumps(T, 1))
2165     0: (    MARK
2166     1: ]        EMPTY_LIST
2167     2: q        BINPUT     0
2168     4: (        MARK
2169     5: h            BINGET     0
2170     7: t            TUPLE      (MARK at 4)
2171     8: q        BINPUT     1
2172    10: a        APPEND
2173    11: 1        POP_MARK   (MARK at 0)
2174    12: h    BINGET     1
2175    14: .    STOP
2176 highest protocol among opcodes = 1
2177 
2178 Try protocol 2.
2179 
2180 >>> dis(pickle.dumps(L, 2))
2181     0: \x80 PROTO      2
2182     2: ]    EMPTY_LIST
2183     3: q    BINPUT     0
2184     5: h    BINGET     0
2185     7: \x85 TUPLE1
2186     8: q    BINPUT     1
2187    10: a    APPEND
2188    11: .    STOP
2189 highest protocol among opcodes = 2
2190 
2191 >>> dis(pickle.dumps(T, 2))
2192     0: \x80 PROTO      2
2193     2: ]    EMPTY_LIST
2194     3: q    BINPUT     0
2195     5: h    BINGET     0
2196     7: \x85 TUPLE1
2197     8: q    BINPUT     1
2198    10: a    APPEND
2199    11: 0    POP
2200    12: h    BINGET     1
2201    14: .    STOP
2202 highest protocol among opcodes = 2
2203 """
2204 
2205 _memo_test = r"""
2206 >>> import pickle
2207 >>> from StringIO import StringIO
2208 >>> f = StringIO()
2209 >>> p = pickle.Pickler(f, 2)
2210 >>> x = [1, 2, 3]
2211 >>> p.dump(x)
2212 >>> p.dump(x)
2213 >>> f.seek(0)
2214 >>> memo = {}
2215 >>> dis(f, memo=memo)
2216     0: \x80 PROTO      2
2217     2: ]    EMPTY_LIST
2218     3: q    BINPUT     0
2219     5: (    MARK
2220     6: K        BININT1    1
2221     8: K        BININT1    2
2222    10: K        BININT1    3
2223    12: e        APPENDS    (MARK at 5)
2224    13: .    STOP
2225 highest protocol among opcodes = 2
2226 >>> dis(f, memo=memo)
2227    14: \x80 PROTO      2
2228    16: h    BINGET     0
2229    18: .    STOP
2230 highest protocol among opcodes = 2
2231 """
2232 
2233 __test__ = {'disassembler_test': _dis_test,
2234             'disassembler_memo_test': _memo_test,
2235            }
2236 
2237 def _test():
2238     import doctest
2239     return doctest.testmod()
2240 
2241 if __name__ == "__main__":
2242     _test()
2243
Generated by PyXR 0.9.4