为什么泡菜比np.save花费更长的时间? [英] Why does pickle take so much longer than np.save?
问题描述
我要保存一个dict
或数组.
我同时尝试了np.save
和pickle
,发现前者总是花费更少的时间.
I try both with np.save
and with pickle
and see that the former always take much less time.
我的实际数据要大得多,但在这里我只展示一小段用于演示目的:
My actual data is much bigger but I just present a small piece here for demonstration purposes:
import numpy as np
#import numpy.array as array
import time
import pickle
b = {0: [np.array([0, 0, 0, 0])], 1: [np.array([1, 0, 0, 0]), np.array([0, 1, 0, 0]), np.array([0, 0, 1, 0]), np.array([0, 0, 0, 1]), np.array([-1, 0, 0, 0]), np.array([ 0, -1, 0, 0]), np.array([ 0, 0, -1, 0]), np.array([ 0, 0, 0, -1])], 2: [np.array([2, 0, 0, 0]), np.array([1, 1, 0, 0]), np.array([1, 0, 1, 0]), np.array([1, 0, 0, 1]), np.array([ 1, -1, 0, 0]), np.array([ 1, 0, -1, 0]), np.array([ 1, 0, 0, -1])], 3: [np.array([1, 0, 0, 0]), np.array([0, 1, 0, 0]), np.array([0, 0, 1, 0]), np.array([0, 0, 0, 1]), np.array([-1, 0, 0, 0]), np.array([ 0, -1, 0, 0]), np.array([ 0, 0, -1, 0]), np.array([ 0, 0, 0, -1])], 4: [np.array([2, 0, 0, 0]), np.array([1, 1, 0, 0]), np.array([1, 0, 1, 0]), np.array([1, 0, 0, 1]), np.array([ 1, -1, 0, 0]), np.array([ 1, 0, -1, 0]), np.array([ 1, 0, 0, -1])], 5: [np.array([0, 0, 0, 0])], 6: [np.array([1, 0, 0, 0]), np.array([0, 1, 0, 0]), np.array([0, 0, 1, 0]), np.array([0, 0, 0, 1]), np.array([-1, 0, 0, 0]), np.array([ 0, -1, 0, 0]), np.array([ 0, 0, -1, 0]), np.array([ 0, 0, 0, -1])], 2: [np.array([2, 0, 0, 0]), np.array([1, 1, 0, 0]), np.array([1, 0, 1, 0]), np.array([1, 0, 0, 1]), np.array([ 1, -1, 0, 0]), np.array([ 1, 0, -1, 0]), np.array([ 1, 0, 0, -1])], 7: [np.array([1, 0, 0, 0]), np.array([0, 1, 0, 0]), np.array([0, 0, 1, 0]), np.array([0, 0, 0, 1]), np.array([-1, 0, 0, 0]), np.array([ 0, -1, 0, 0]), np.array([ 0, 0, -1, 0]), np.array([ 0, 0, 0, -1])], 8: [np.array([2, 0, 0, 0]), np.array([1, 1, 0, 0]), np.array([1, 0, 1, 0]), np.array([1, 0, 0, 1]), np.array([ 1, -1, 0, 0]), np.array([ 1, 0, -1, 0]), np.array([ 1, 0, 0, -1])]}
start_time = time.time()
with open('testpickle', 'wb') as myfile:
pickle.dump(b, myfile)
print("--- Time to save with pickle: %s milliseconds ---" % (1000*time.time() - 1000*start_time))
start_time = time.time()
np.save('numpy', b)
print("--- Time to save with numpy: %s milliseconds ---" % (1000*time.time() - 1000*start_time))
start_time = time.time()
with open('testpickle', 'rb') as myfile:
g1 = pickle.load(myfile)
print("--- Time to load with pickle: %s milliseconds ---" % (1000*time.time() - 1000*start_time))
start_time = time.time()
g2 = np.load('numpy.npy')
print("--- Time to load with numpy: %s milliseconds ---" % (1000*time.time() - 1000*start_time))
给出输出:
--- Time to save with pickle: 4.0 milliseconds ---
--- Time to save with numpy: 1.0 milliseconds ---
--- Time to load with pickle: 2.0 milliseconds ---
--- Time to load with numpy: 1.0 milliseconds ---
与我的实际大小(字典中的100,000个键)相比,时差更加明显.
The time difference is even more pronounced with my actual size (~100,000 keys in the dict).
为什么在保存和加载时,泡菜要比np.save花费更长的时间?
Why does pickle take longer than np.save, both for saving and for loading?
我什么时候应该使用pickle
?
推荐答案
因为只要书面对象不包含Python数据,
Because as long as the written object contains no Python data,
- numpy对象在内存中的表示方式比Python对象简单得多
- numpy.save用C语言编写
- numpy.save以需要进行最少处理的超简单格式写入
同时
- Python对象有很多开销
- pickle用Python编写
- pickle将数据从内存中的基本表示转换为要写入磁盘的字节
请注意,如果一个numpy数组确实包含Python对象,那么numpy只会腌制该数组,所有的胜利都将出局.
Note that if a numpy array does contain Python objects, then numpy just pickles the array, and all the win goes out the window.
这篇关于为什么泡菜比np.save花费更长的时间?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!