What are the different use cases of joblib versus pickle?
Question
Background: I'm just getting started with scikit-learn, and read at the bottom of the page about joblib versus pickle:
it may be more interesting to use joblib's replacement of pickle (joblib.dump & joblib.load), which is more efficient on big data, but can only pickle to the disk and not to a string
I read this Q&A on pickle, Common use-cases for pickle in Python, and wonder if the community here can share the differences between joblib and pickle. When should one use one over the other?
Answer
joblib is usually significantly faster on large numpy arrays because it has special handling for the array buffers of the numpy data structure. To find out about the implementation details you can have a look at the source code. It can also compress that data on the fly while pickling, using zlib or lz4.
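A minimal sketch of the dump/load round trip with on-the-fly compression (the filename `arr.joblib` is an arbitrary choice; `compress=("zlib", 3)` selects the codec and compression level, while a bare `compress=3` defaults to zlib):

```python
import numpy as np
import joblib

# A large numpy array: joblib serializes the raw buffer directly.
arr = np.arange(1_000_000, dtype=np.float64)

# Compress with zlib at level 3 while writing to disk.
joblib.dump(arr, "arr.joblib", compress=("zlib", 3))

restored = joblib.load("arr.joblib")
assert np.array_equal(arr, restored)
```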
joblib also makes it possible to memory-map the data buffer of an uncompressed joblib-pickled numpy array when loading it, which makes it possible to share memory between processes.
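A short sketch of the memory-mapped load (the filename is again arbitrary; note the file must have been dumped without compression for mapping to work):

```python
import numpy as np
import joblib

big = np.arange(10_000, dtype=np.int64)
joblib.dump(big, "big.joblib")  # uncompressed dump, required for mmap

# mmap_mode="r" maps the file read-only instead of copying it into RAM;
# several processes loading the same file share the physical pages.
view = joblib.load("big.joblib", mmap_mode="r")
assert isinstance(view, np.memmap)
assert int(view[123]) == 123
```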
Note that if you don't pickle large numpy arrays, then regular pickle can be significantly faster, especially on large collections of small Python objects (e.g. a large dict of str objects), because the pickle module of the standard library is implemented in C while joblib is pure Python.
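For that kind of workload, plain stdlib pickle is all you need, as in this sketch (the dict size is arbitrary, just large enough to be representative):

```python
import pickle

# Many small Python objects: the C-implemented pickle module shines here.
data = {f"key_{i}": f"value_{i}" for i in range(100_000)}

blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
assert pickle.loads(blob) == data
```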
Note that since PEP 574 (pickle protocol 5) was merged in Python 3.8, it is now much more efficient (memory-wise and cpu-wise) to pickle large numpy arrays using the standard library. Large arrays in this context mean 4GB or more.
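A sketch of the protocol-5 out-of-band buffer mechanism (requires Python 3.8+ and a numpy version that supports pickle protocol 5; the array size here is just illustrative):

```python
import pickle
import numpy as np

arr = np.arange(1_000_000, dtype=np.uint8)

# With protocol 5, large buffers can be handed out of band via
# buffer_callback instead of being copied into the pickle stream.
buffers = []
payload = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)

# The pickled payload holds only metadata; the raw data stays in the buffers.
assert len(payload) < arr.nbytes
restored = pickle.loads(payload, buffers=buffers)
assert np.array_equal(arr, restored)
```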
But joblib can still be useful with Python 3.8 to load objects that have nested numpy arrays in memory-mapped mode with mmap_mode="r".