What are the different use cases of joblib versus pickle?

Question

Background: I'm just getting started with scikit-learn, and at the bottom of the page I read about joblib versus pickle.

it may be more interesting to use joblib’s replacement of pickle (joblib.dump & joblib.load), which is more efficient on big data, but can only pickle to the disk and not to a string

I read this Q&A on pickle, "Common use-cases for pickle in Python", and wonder whether the community here can share the differences between joblib and pickle. When should one use one over the other?

Recommended answer

joblib is usually significantly faster on large numpy arrays because it has special handling for the array buffers of the numpy data structure. To find out about the implementation details, you can have a look at the source code. It can also compress that data on the fly while pickling, using zlib or lz4.
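
As a rough illustration of the on-the-fly compression mentioned above, here is a minimal sketch that dumps the same array with and without zlib compression and compares the file sizes. The helper `dump_sizes`, the file names, and the all-zeros test array are assumptions made for this example; it requires joblib and numpy to be installed. Using lz4 instead of zlib would additionally need the optional lz4 package.

```python
import os
import tempfile

import joblib
import numpy as np


def dump_sizes(array, directory):
    """Dump `array` uncompressed and zlib-compressed; return both file sizes in bytes.

    Hypothetical helper written for this example, not part of joblib.
    """
    raw_path = os.path.join(directory, "array_raw.joblib")
    zlib_path = os.path.join(directory, "array_zlib.joblib")
    joblib.dump(array, raw_path)                          # no compression
    joblib.dump(array, zlib_path, compress=("zlib", 3))   # zlib, level 3
    return os.path.getsize(raw_path), os.path.getsize(zlib_path)


workdir = tempfile.mkdtemp()
data = np.zeros((1000, 1000))  # highly compressible on purpose

raw_size, zlib_size = dump_sizes(data, workdir)

# joblib.load detects the compression from the file itself.
restored = joblib.load(os.path.join(workdir, "array_zlib.joblib"))
```

Real data usually compresses far less than an all-zeros array, so the size difference here is an upper bound rather than a typical result.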

joblib also makes it possible to memory-map the data buffer of an uncompressed joblib-pickled numpy array when loading it, which makes it possible to share memory between processes.

Note that if you don't pickle large numpy arrays, then regular pickle can be significantly faster, especially on large collections of small Python objects (e.g. a large dict of str objects), because the pickle module of the standard library is implemented in C while joblib is pure Python.
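
For the "many small Python objects" case described above, a minimal sketch with the standard library alone might look like this. The dictionary contents are made up for the example. It also shows that pickle can serialize to an in-memory bytes object rather than to a file.

```python
import pickle

# A large dict of small str objects: the case where plain pickle tends to do well.
records = {f"key-{i}": f"value-{i}" for i in range(100_000)}

# Serialize to an in-memory bytes object, no file involved.
payload = pickle.dumps(records, protocol=pickle.HIGHEST_PROTOCOL)

# Deserialize from the same bytes object.
restored = pickle.loads(payload)
```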

Note that since PEP 574 (pickle protocol 5) was merged in Python 3.8, it is now much more efficient (memory-wise and CPU-wise) to pickle large numpy arrays using the standard library. Large arrays in this context mean 4GB or more.
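
To give an idea of what protocol 5 adds, the sketch below uses only the standard library: a `pickle.PickleBuffer` wrapped around a `bytearray` stands in for the data buffer of a large array (that stand-in is an assumption of this example, chosen so that it runs without numpy). With a `buffer_callback`, the payload is handed over out-of-band instead of being copied into the pickle stream.

```python
import pickle

# Stand-in for a large array buffer (assumption: 1 MB of bytes instead of a numpy array).
big = bytearray(b"x" * 1_000_000)
wrapped = pickle.PickleBuffer(big)

# In-band: the whole payload is copied into the pickle stream.
in_band = pickle.dumps(wrapped, protocol=5)

# Out-of-band: the stream only holds a placeholder; the buffer itself is
# passed to the callback and collected in `buffers` without being copied.
buffers = []
out_of_band = pickle.dumps(wrapped, protocol=5, buffer_callback=buffers.append)

# Loading needs the same buffers, supplied in the same order.
restored = pickle.loads(out_of_band, buffers=buffers)
restored_bytes = bytes(memoryview(restored))
```

How the out-of-band buffers are transported (shared memory, a socket, a separate file) is left to the caller; the sketch keeps them in a list in the same process.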

But joblib can still be useful with Python 3.8 to load objects that have nested numpy arrays in memory-mapped mode with mmap_mode="r".
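
A minimal sketch of that memory-mapped loading, assuming joblib and numpy are installed: the dictionary `model`, its keys, and the file name are hypothetical and only serve the example. The dump must be uncompressed for the arrays to be memory-mapped.

```python
import os
import tempfile

import joblib
import numpy as np

# Hypothetical object with a nested numpy array.
model = {"weights": np.arange(1_000_000, dtype=np.float64), "name": "demo"}

path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, path)  # uncompressed, so the array can be memory-mapped

# Read-only memory mapping: the array data stays on disk and is paged in on access.
loaded = joblib.load(path, mmap_mode="r")

is_memmap = isinstance(loaded["weights"], np.memmap)
tenth_value = float(loaded["weights"][10])
```

Several processes that load the same file this way can read the array through the operating system's page cache instead of each holding a private copy.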
