Python pickle protocol choice?


Question

I am using Python 2.7 and trying to pickle an object. I am wondering what the real difference is between the pickle protocols.

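from __future__ import print_function  # print() as a function on Python 2.7

import numpy as np
import pickle

class Data(object):
    def __init__(self):
        self.a = np.zeros((100, 37000, 3), dtype=np.float32)

d = Data()
print("data size: ", d.a.nbytes / 1000000.0)
print("highest protocol: ", pickle.HIGHEST_PROTOCOL)

# Open every file in binary mode and close it once the dump is done.
with open("noProt", "wb") as f:
    pickle.dump(d, f)  # no protocol given
with open("prot0", "wb") as f:
    pickle.dump(d, f, protocol=0)
with open("prot1", "wb") as f:
    pickle.dump(d, f, protocol=1)
with open("prot2", "wb") as f:
    pickle.dump(d, f, protocol=2)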


out >> data size:  44.4
out >> highest protocol:  2

Then I found that the saved files have different sizes on disk:

  • noProt: 177.6MB
  • prot0: 177.6MB
  • prot1: 44.4MB
  • prot2: 44.4MB

I know that prot0 is a human-readable text file, so I don't want to use it. I guess protocol 0 is the one used by default.

What is the difference between protocols 1 and 2? Is there a reason why I should choose one over the other?

Which is better to use, pickle or cPickle?

Recommended answer

Use the newest protocol that is still supported by the oldest Python version that has to read the data. Newer protocol versions support new language features and include optimisations.

pickle模块数据格式文档:

From the pickle module data format documentation:

There are currently 6 different protocols which can be used for pickling. The higher the protocol used, the more recent the version of Python needed to read the pickle produced.

  • Protocol version 0 is the original "human-readable" protocol and is backwards compatible with earlier versions of Python.
  • Protocol version 1 is an old binary format which is also compatible with earlier versions of Python.
  • Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes. Refer to PEP 307 for information about improvements brought by protocol 2.
  • Protocol version 3 was added in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x. This was the default protocol in Python 3.0–3.7.
  • Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. It is the default protocol starting with Python 3.8. Refer to PEP 3154 for information about improvements brought by protocol 4.
  • Protocol version 5 was added in Python 3.8. It adds support for out-of-band data and speedup for in-band data. Refer to PEP 574 for information about improvements brought by protocol 5.
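The size gap between the text protocol and the binary ones can be reproduced without NumPy. The following is only a sketch, written for Python 3 and using a made-up list of floats (`values`) rather than the array from the question; exact byte counts will vary by interpreter version.

```python
import pickle

# Sample payload: 1000 floats whose text representation is long.
values = [i / 7.0 for i in range(1, 1001)]

# Serialize with every protocol this interpreter supports and record the size.
sizes = {
    protocol: len(pickle.dumps(values, protocol=protocol))
    for protocol in range(pickle.HIGHEST_PROTOCOL + 1)
}

for protocol, size in sorted(sizes.items()):
    print("protocol %d: %d bytes" % (protocol, size))
```

Protocol 0 writes each float as text, while the binary protocols store it in a fixed 8 bytes plus an opcode, which is why protocol 0 should come out noticeably larger here.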

In Python 2, if a protocol is not specified, protocol 0 is used. If protocol is specified as a negative value or HIGHEST_PROTOCOL, the highest protocol version available will be used.
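The sentence about protocol 0 being the default describes Python 2; on Python 3 the default is exposed as `pickle.DEFAULT_PROTOCOL`. A quick way to see which protocol a call actually used, assuming Python 3, is to inspect the stream header. The helper name `protocol_of` is made up for this illustration.

```python
import pickle

payload = {"answer": 42}

def protocol_of(data):
    # Protocols 2 and later start with the PROTO opcode (0x80) followed by
    # one byte holding the protocol number; older protocols have no header.
    return data[1] if data[:1] == b"\x80" else None

default_blob = pickle.dumps(payload)                # no protocol given
negative_blob = pickle.dumps(payload, protocol=-1)  # negative value
highest_blob = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)

print("default:", protocol_of(default_blob), "expected", pickle.DEFAULT_PROTOCOL)
print("protocol=-1:", protocol_of(negative_blob))
print("HIGHEST_PROTOCOL:", protocol_of(highest_blob))
```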

So when you want to support loading the pickled data with Python 3.4 or newer, pick protocol 4. If you still need to support Python 2.7, pick protocol 2, especially if you are using custom classes derived from object (new-style classes), which any modern code does these days.
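That rule of thumb can be written down as a small lookup. This is a sketch only: `protocol_for` and `_INTRODUCED_IN` are hypothetical names, and the version table is copied from the documentation list quoted above.

```python
import pickle

# (protocol, first Python version that can read it), newest first.
_INTRODUCED_IN = [
    (5, (3, 8)),
    (4, (3, 4)),
    (3, (3, 0)),
    (2, (2, 3)),
]

def protocol_for(oldest_reader):
    """Return the newest protocol readable by `oldest_reader`, e.g. (2, 7)."""
    for protocol, introduced in _INTRODUCED_IN:
        if oldest_reader >= introduced:
            # Never exceed what the writing interpreter itself supports.
            return min(protocol, pickle.HIGHEST_PROTOCOL)
    return 1  # protocols 0 and 1 predate Python 2.3

print(protocol_for((2, 7)))
print(protocol_for((3, 6)))
```

The result can be passed straight to the dump call, for example `pickle.dumps(obj, protocol=protocol_for((2, 7)))`.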

However, unless you are exchanging pickled data with older Python versions or otherwise need to maintain backwards compatibility with them, it's easiest to just stick with the highest protocol version you can lay your hands on:

with open("prot2", 'wb') as pfile:
    pickle.dump(d, pfile, protocol=pickle.HIGHEST_PROTOCOL)

pickle.HIGHEST_PROTOCOL will always be the right version for the current Python version. Because this is a binary format, make sure to use 'wb' as the file mode!
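Reading the file back needs binary mode as well. A minimal round trip, assuming Python 3 and using a throwaway temporary directory and a made-up `record` dictionary, might look like this:

```python
import os
import pickle
import tempfile

record = {"name": "example", "values": [1.5, 2.5, 3.5]}

# Write into a temporary directory so the sketch leaves no stray files behind.
path = os.path.join(tempfile.mkdtemp(), "record.pkl")

with open(path, "wb") as pfile:  # binary mode for writing
    pickle.dump(record, pfile, protocol=pickle.HIGHEST_PROTOCOL)

with open(path, "rb") as pfile:  # binary mode for reading
    restored = pickle.load(pfile)

print(restored == record)
```

The protocol does not have to be passed to `pickle.load`; it is detected from the data.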

Python 3 no longer distinguishes between cPickle and pickle; always use pickle when using Python 3. It uses a compiled C extension under the hood.

If you are still using Python 2, then cPickle and pickle are mostly compatible; the differences lie in the API offered. For most use cases, just stick with cPickle; it is faster. Quoting the documentation again:

First, cPickle can be up to 1000 times faster than pickle because the former is implemented in C. Second, in the cPickle module the callables Pickler() and Unpickler() are functions, not classes. This means that you cannot use them to derive custom pickling and unpickling subclasses. Most applications have no need for this functionality and should benefit from the greatly improved performance of the cPickle module.
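For code that has to run on both Python 2 and Python 3, a common idiom is to try the C module first and fall back to the standard one. This is a sketch of that idiom, not something taken from the quoted documentation:

```python
try:
    import cPickle as pickle  # Python 2: C implementation
except ImportError:
    import pickle  # Python 3: pickle already uses the C accelerator

# Protocol 2 is the newest protocol that both Python 2 and Python 3 can read.
data = pickle.dumps([1, 2, 3], protocol=2)
print(pickle.loads(data))
```

Code that needs to subclass Pickler or Unpickler on Python 2 would still have to import the pure-Python pickle module, for the reason given in the quote.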
