What are the different use cases of joblib versus pickle?
Question
Background: I'm just getting started with scikit-learn, and read at the bottom of the page about joblib versus pickle:
it may be more interesting to use joblib's replacement of pickle (joblib.dump & joblib.load), which is more efficient on big data, but can only pickle to the disk and not to a string
I read this Q&A on pickle, Common use-cases for pickle in Python, and wonder if the community here can share the differences between joblib and pickle. When should one use one over the other?
Answer
joblib is usually significantly faster on large numpy arrays because it has special handling for the array buffers of the numpy data structure. To find out about the implementation details you can have a look at the source code. It can also compress that data on the fly while pickling, using zlib or lz4.
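A minimal sketch of the dump/load round trip with on-the-fly compression (the filename `arr.joblib` is an arbitrary choice; `compress=("zlib", 3)` selects the codec and compression level, while a bare `compress=3` defaults to zlib):

```python
import numpy as np
import joblib

# A large numpy array: joblib serializes the raw buffer directly.
arr = np.arange(1_000_000, dtype=np.float64)

# Compress with zlib at level 3 while writing to disk.
joblib.dump(arr, "arr.joblib", compress=("zlib", 3))

restored = joblib.load("arr.joblib")
assert np.array_equal(arr, restored)
```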
joblib also makes it possible to memory-map the data buffer of an uncompressed joblib-pickled numpy array when loading it, which makes it possible to share memory between processes.
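A short sketch of the memory-mapped load (the filename is again arbitrary; note the file must have been dumped without compression for mapping to work):

```python
import numpy as np
import joblib

big = np.arange(10_000, dtype=np.int64)
joblib.dump(big, "big.joblib")  # uncompressed dump, required for mmap

# mmap_mode="r" maps the file read-only instead of copying it into RAM;
# several processes loading the same file share the physical pages.
view = joblib.load("big.joblib", mmap_mode="r")
assert isinstance(view, np.memmap)
assert int(view[123]) == 123
```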
Note that if you don't pickle large numpy arrays, then regular pickle can be significantly faster, especially on large collections of small Python objects (e.g. a large dict of str objects), because the pickle module of the standard library is implemented in C while joblib is pure Python.
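For that kind of workload, plain stdlib pickle is all you need, as in this sketch (the dict size is arbitrary, just large enough to be representative):

```python
import pickle

# Many small Python objects: the C-implemented pickle module shines here.
data = {f"key_{i}": f"value_{i}" for i in range(100_000)}

blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
assert pickle.loads(blob) == data
```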
Note that since PEP 574 (pickle protocol 5) was merged in Python 3.8, it is now much more efficient (memory-wise and cpu-wise) to pickle large numpy arrays using the standard library. Large arrays in this context mean 4GB or more.
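A sketch of the protocol-5 out-of-band buffer mechanism (requires Python 3.8+ and a numpy version that supports pickle protocol 5; the array size here is just illustrative):

```python
import pickle
import numpy as np

arr = np.arange(1_000_000, dtype=np.uint8)

# With protocol 5, large buffers can be handed out of band via
# buffer_callback instead of being copied into the pickle stream.
buffers = []
payload = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)

# The pickled payload holds only metadata; the raw data stays in the buffers.
assert len(payload) < arr.nbytes
restored = pickle.loads(payload, buffers=buffers)
assert np.array_equal(arr, restored)
```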
But joblib can still be useful with Python 3.8 to load objects that have nested numpy arrays in memory-mapped mode with mmap_mode="r".