What are the different use cases of joblib versus pickle?
Question
Background: I'm just getting started with scikit-learn, and read about joblib versus pickle at the bottom of the page.
it may be more interesting to use joblib’s replacement of pickle (joblib.dump & joblib.load), which is more efficient on big data, but can only pickle to the disk and not to a string
I read this Q&A on pickle, "Common use-cases for pickle in Python", and wonder whether the community here can share the differences between joblib and pickle. When should one be used over the other?
Recommended answer
- joblib is usually significantly faster on large numpy arrays because it has special handling for the array buffers of the numpy data structure. To find out about the implementation details, you can have a look at the source code. It can also compress that data on the fly while pickling, using zlib or lz4.
- joblib can also memory-map the data buffer of an uncompressed joblib-pickled numpy array when loading it, which makes it possible to share memory between processes.
- If you don't pickle large numpy arrays, regular pickle can be significantly faster, especially on large collections of small Python objects (e.g. a large dict of str objects), because the pickle module of the standard library is implemented in C while joblib is pure Python.
- Since PEP 574 (pickle protocol 5) was merged in Python 3.8, it is now much more efficient (memory-wise and CPU-wise) to pickle large numpy arrays using the standard library. Large arrays in this context means 4GB or more.
- But joblib can still be useful with Python 3.8 to load objects that have nested numpy arrays in memory-mapped mode with mmap_mode="r".
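The points above can be sketched in a few lines. This is only an illustration, assuming joblib and numpy are installed and Python 3.8 or later is used; the object name model_like and the file names are made up for the example. It dumps an object holding a nested numpy array both uncompressed and zlib-compressed, reloads the uncompressed file with mmap_mode="r", and then pickles the same array with the standard library using protocol 5 and an out-of-band buffer.

```python
import os
import pickle
import tempfile

import joblib
import numpy as np

# Hypothetical object with a numpy array nested inside a plain Python dict.
model_like = {"weights": np.arange(1_000_000, dtype=np.float64), "name": "demo"}

with tempfile.TemporaryDirectory() as tmp_dir:
    raw_path = os.path.join(tmp_dir, "model.joblib")
    packed_path = os.path.join(tmp_dir, "model.joblib.z")

    # Uncompressed dump: needed if the arrays are to be memory mapped later.
    joblib.dump(model_like, raw_path)
    # Compressed dump (zlib, level 3): smaller file, but no memory mapping.
    joblib.dump(model_like, packed_path, compress=("zlib", 3))

    # mmap_mode="r" maps the nested array from disk instead of copying it.
    mapped = joblib.load(raw_path, mmap_mode="r")
    is_memmap = isinstance(mapped["weights"], np.memmap)
    first_values = mapped["weights"][:3].tolist()

    # The compressed file is read back as an ordinary in-memory array.
    unpacked = joblib.load(packed_path)
    round_trip_ok = np.array_equal(unpacked["weights"], model_like["weights"])

    # Release the memory map before the temporary directory is removed.
    del mapped

# Standard-library alternative: pickle protocol 5 (PEP 574) can hand the
# array's data buffer to a callback instead of copying it into the stream.
buffers = []
payload = pickle.dumps(
    model_like["weights"], protocol=5, buffer_callback=buffers.append
)
restored = pickle.loads(payload, buffers=buffers)
```

In this sketch the uncompressed file is the one to keep if several processes should share the array through memory mapping, while the compressed file trades that ability for a smaller size on disk.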