Pandas msgpack vs pickle

Problem description

msgpack is meant to be a replacement for pickle.

According to the pandas documentation on msgpack:

This is a lightweight portable binary format, similar to binary JSON, that is highly space efficient, and provides good performance both on the writing (serialization), and reading (deserialization).
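To make "similar to binary JSON" concrete, here is a minimal sketch of a MessagePack encoder for a small subset of the format (nil, booleans, integers, float64, strings, arrays, maps). It is hand-written for illustration only: `packb` is a hypothetical helper that merely mirrors the name used by the msgpack package, and the record being encoded is made up. It compares the encoded size with compact JSON for that record.

```python
import json
import struct


def packb(obj) -> bytes:
    """Encode a subset of MessagePack (illustrative sketch, not the msgpack library)."""
    if obj is None:
        return b"\xc0"                                   # nil
    if obj is True:
        return b"\xc3"                                   # true
    if obj is False:
        return b"\xc2"                                   # false
    if isinstance(obj, int):
        if 0 <= obj <= 0x7F:
            return bytes([obj])                          # positive fixint: 1 byte
        if -32 <= obj < 0:
            return struct.pack(">b", obj)                # negative fixint (0xe0-0xff)
        if -(2**63) <= obj < 2**63:
            return b"\xd3" + struct.pack(">q", obj)      # int64 (real encoders pick the narrowest width)
        raise OverflowError("integer outside the int64 range")
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)          # float64: always 9 bytes
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        if len(data) < 32:
            return bytes([0xA0 | len(data)]) + data      # fixstr: length in the tag byte
        if len(data) < 2**8:
            return b"\xd9" + struct.pack(">B", len(data)) + data   # str8
        return b"\xdb" + struct.pack(">I", len(data)) + data       # str32
    if isinstance(obj, (list, tuple)):
        head = (bytes([0x90 | len(obj)]) if len(obj) < 16          # fixarray
                else b"\xdd" + struct.pack(">I", len(obj)))        # array32
        return head + b"".join(packb(item) for item in obj)
    if isinstance(obj, dict):
        head = (bytes([0x80 | len(obj)]) if len(obj) < 16          # fixmap
                else b"\xdf" + struct.pack(">I", len(obj)))        # map32
        return head + b"".join(packb(k) + packb(v) for k, v in obj.items())
    raise TypeError(f"unsupported type: {type(obj).__name__}")


# Hypothetical record, encoded both ways.
record = {"id": 7, "price": 101.25, "tags": ["a", "b"], "ok": True}
packed = packb(record)
as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")
print(len(packed), len(as_json))  # → 34 50
```

The saving here comes mostly from length-prefixed strings and one-byte small integers. It depends on the data: a float always costs 9 bytes in MessagePack, more than JSON needs for a short literal such as `1.5`.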

I find, however, that its performance does not appear to stack up against pickle.

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randn(10000, 100))

>>> %timeit df.to_pickle('test.p')
10 loops, best of 3: 22.4 ms per loop

>>> %timeit df.to_msgpack('test.msg')
10 loops, best of 3: 36.4 ms per loop

>>> %timeit pd.read_pickle('test.p')
100 loops, best of 3: 10.5 ms per loop

>>> %timeit pd.read_msgpack('test.msg')
10 loops, best of 3: 24.6 ms per loop
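`%timeit` is an IPython magic; in a plain Python script the same "best of 3" style measurement can be taken with `timeit.repeat`, which accepts a callable. Note that `DataFrame.to_msgpack` and `pd.read_msgpack` were deprecated in pandas 0.25 and removed in pandas 1.0, so this sketch does not reproduce the benchmark above. Instead it times pickle against the standard-library `json` module (a stand-in for a cross-language format) on a plain list of floats. The helper `best_of` and the workload are hypothetical, and absolute numbers will vary by machine and Python version.

```python
import json
import pickle
import random
import timeit


def best_of(func, number=5, repeat=3):
    """Best per-call time in seconds, in the spirit of %timeit's "best of 3"."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


# Hypothetical workload: 20,000 random floats, much smaller than the 10000x100 frame.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(20_000)]

pickled = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
as_json = json.dumps(data)

# Everything stays in memory, so disk I/O does not blur the comparison.
results = {
    "pickle dump": best_of(lambda: pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)),
    "pickle load": best_of(lambda: pickle.loads(pickled)),
    "json dump": best_of(lambda: json.dumps(data)),
    "json load": best_of(lambda: json.loads(as_json)),
}
for name, seconds in results.items():
    print(f"{name:<12} {seconds * 1e3:8.3f} ms")
```

Serializing to bytes in memory, rather than to files as in the session above, isolates encoding cost; timing `to_pickle`/`read_pickle` on files also measures the filesystem.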

Question: Aside from potential security issues with pickle, what are the benefits of msgpack over pickle? Is pickle still the preferred method of serializing data, or do better alternatives currently exist?

Recommended answer

Pickle is better at:

  1. Numerical data, or anything that uses the buffer protocol (numpy arrays), though only if you use a reasonably recent protocol= value
  2. Python-specific objects such as classes, functions, etc. (although for these you should look at cloudpickle)
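Both pickle points can be checked with the standard library alone. This sketch uses `array.array` as a stand-in for a numpy array (both expose the buffer protocol; numpy itself is not used, and its exact behavior is assumed to be analogous). It compares protocol 0 with `pickle.HIGHEST_PROTOCOL`: protocol 0 writes every float separately as text, while from protocol 3 on an `array.array` is pickled as its raw bytes. It then round-trips some Python-specific objects that JSON cannot encode and shows plain pickle refusing a lambda, the case cloudpickle is meant for. The record and its values are made up.

```python
import datetime
import json
import math
import pickle
import random
from array import array
from decimal import Decimal
from fractions import Fraction

# 1. Buffer-backed numeric data (array.array standing in for a numpy array).
random.seed(0)
values = array("d", (random.random() for _ in range(10_000)))

old = pickle.dumps(values, protocol=0)                        # text protocol: one float at a time
new = pickle.dumps(values, protocol=pickle.HIGHEST_PROTOCOL)  # raw 8-byte doubles in one chunk
print(f"protocol 0: {len(old):,} bytes, "
      f"protocol {pickle.HIGHEST_PROTOCOL}: {len(new):,} bytes")
assert pickle.loads(old) == pickle.loads(new) == values

# 2. Python-specific objects: pickle round-trips them, JSON has no encoding for them.
record = {"day": datetime.date(2020, 1, 1), "amount": Decimal("1.10"),
          "ratio": Fraction(1, 3), "fn": math.sqrt}
restored = pickle.loads(pickle.dumps(record))
assert restored == record and restored["fn"] is math.sqrt  # functions are stored by name

try:
    json.dumps(record)
except TypeError as exc:
    print("json:", exc)

# 3. Only importable names work: a lambda has no name that pickle can look up later.
try:
    pickle.dumps(lambda x: x * 2)
except (pickle.PicklingError, AttributeError) as exc:
    print("pickle:", type(exc).__name__)  # cloudpickle targets cases like this
```

Because pickle stores functions and classes as references to importable names, unpickling requires the same code to be importable on the reading side; that is part of why pickle works best as a Python-to-Python format.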

MsgPack is better at:

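  1. Cross-language interoperability. It's an alternative to JSON with some improvements.
  2. Performance on text data and Python objects. It's faster than pickle at this by a decent factor in any setting.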
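Cross-language interoperability means the byte layout itself is the contract: bytes written by a MessagePack library in Go, Java, JavaScript or any other language decode to the same values. A pickle stream, by contrast, is a sequence of opcodes for Python's pickle machine that names Python modules and classes, so it has no natural reader outside Python. The sketch below is a hypothetical, hand-written decoder for a subset of the spec (fixint, fixstr/str8, fixarray, fixmap, nil, booleans, float64 and the fixed-width integers); `unpackb` and its helpers are illustrative names, and real code would use a maintained library such as the msgpack package's `msgpack.unpackb`.

```python
import struct


def unpackb(data: bytes):
    """Decode one MessagePack value (subset only; illustrative sketch)."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing bytes after MessagePack value")
    return value


def _decode(data, pos):
    """Return (value, position just past it) for the value starting at pos."""
    tag = data[pos]
    pos += 1
    if tag <= 0x7F:                                    # positive fixint
        return tag, pos
    if tag >= 0xE0:                                    # negative fixint
        return tag - 0x100, pos
    if 0xA0 <= tag <= 0xBF:                            # fixstr
        n = tag & 0x1F
        return data[pos:pos + n].decode("utf-8"), pos + n
    if 0x90 <= tag <= 0x9F:                            # fixarray
        return _decode_array(data, pos, tag & 0x0F)
    if 0x80 <= tag <= 0x8F:                            # fixmap
        return _decode_map(data, pos, tag & 0x0F)
    if tag == 0xC0:                                    # nil
        return None, pos
    if tag in (0xC2, 0xC3):                            # false / true
        return tag == 0xC3, pos
    if tag == 0xCB:                                    # float64, big-endian
        return struct.unpack_from(">d", data, pos)[0], pos + 8
    fixed = {0xCC: ">B", 0xCD: ">H", 0xCE: ">I", 0xCF: ">Q",   # uint8..uint64
             0xD0: ">b", 0xD1: ">h", 0xD2: ">i", 0xD3: ">q"}   # int8..int64
    if tag in fixed:
        fmt = fixed[tag]
        return struct.unpack_from(fmt, data, pos)[0], pos + struct.calcsize(fmt)
    if tag == 0xD9:                                    # str8: 1-byte length prefix
        n = data[pos]
        return data[pos + 1:pos + 1 + n].decode("utf-8"), pos + 1 + n
    raise ValueError(f"unsupported MessagePack tag 0x{tag:02x}")


def _decode_array(data, pos, n):
    items = []
    for _ in range(n):
        item, pos = _decode(data, pos)
        items.append(item)
    return items, pos


def _decode_map(data, pos, n):
    result = {}
    for _ in range(n):
        key, pos = _decode(data, pos)
        value, pos = _decode(data, pos)
        result[key] = value
    return result, pos


# {"compact": true, "schema": 0}, hand-encoded per the MessagePack spec.
wire = bytes.fromhex("82a7636f6d70616374c3a6736368656d6100")
print(unpackb(wire))  # → {'compact': True, 'schema': 0}
```

The same 18 bytes would be produced by any conforming encoder for that map, whatever language it is written in, which is what makes the format usable as a shared wire format.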

As @Jeff noted above, this blog post may be of interest.
