Why does pickle.dump(obj) produce a different size than sys.getsizeof(obj)? How do I save a variable to a file?


Question

I am using the random forest classifier from Python's scikit-learn library for an exercise. The result changes on every run, so I ran it 1000 times and took the average result.

I save the rf object to files with pickle.dump() so that I can use it for prediction later, and each file is about 4 MB. However, sys.getsizeof(rf) reports just 36 bytes.

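import pickle
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=50)
rf.fit(matX, vecY)
with open('var.sav', 'wb') as f:  # pickle.dump() needs a binary file object, not a path
    pickle.dump(rf, f)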

My questions:

  • sys.getsizeof() seems to report the wrong size for a RandomForestClassifier object, doesn't it? Why?
  • How can I save the object in a zip file so that it takes up less space?

Recommended answer

sys.getsizeof() gives you the memory footprint of just the object itself, not of any other values that the object references. To get the total, you would need to recurse over the object and add up the sizes of all its attributes, anything those attributes hold, and so on.
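The recursion described above can be sketched roughly as follows. This is an approximation only: the helper name total_size and the Holder class are made up for illustration, and the walk covers only built-in containers and instance __dict__ attributes, so it will miss memory held through __slots__ or inside C extension objects.

```python
import sys

def total_size(obj, seen=None):
    """Roughly sum sys.getsizeof over obj and the objects reachable from it."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # count shared objects (and cycles) only once
        return 0
    seen.add(id(obj))

    size = sys.getsizeof(obj)    # shallow size of this object alone
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += total_size(key, seen) + total_size(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += total_size(item, seen)
    if hasattr(obj, "__dict__"):  # follow instance attributes
        size += total_size(vars(obj), seen)
    return size

class Holder:
    """Hypothetical stand-in for an object that references a lot of data."""
    def __init__(self):
        self.payload = list(range(10000))

h = Holder()
shallow = sys.getsizeof(h)
deep = total_size(h)
print("shallow:", shallow, "bytes; recursive:", deep, "bytes")
```

The shallow figure stays small no matter how much data payload holds, which is the same effect as the tiny number reported for rf in the question.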

Pickle is a serialization format. Serialization has to store metadata as well as the contents of the object, so memory size and pickle size are only roughly correlated.

Pickles are byte streams; if you need a more compact byte stream, use compression.
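One way to apply compression is to write the pickle through the standard library's gzip module. The sketch below uses a plain dictionary as a stand-in for the fitted classifier and a temporary directory for the output file; the file name var.sav.gz is an assumption, not something from the question.

```python
import gzip
import os
import pickle
import tempfile

# Stand-in for the object to save; a fitted model would be handled the same way.
data = {"weights": [0.5] * 100000, "labels": ["a", "b"] * 1000}

raw = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
packed = gzip.compress(raw)
print("pickle:", len(raw), "bytes; gzip-compressed pickle:", len(packed), "bytes")

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "var.sav.gz")  # hypothetical file name

    # gzip.open returns a file object, so pickle can write straight into it.
    with gzip.open(path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

    # Reading back mirrors the write.
    with gzip.open(path, "rb") as f:
        restored = pickle.load(f)
```

How much is saved depends entirely on how repetitive the pickled data is; the stand-in here is deliberately very repetitive.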

If you store your pickles in a ZIP file, the data will already be compressed. Compressing the pickle before adding it to the ZIP will not help in that case: already-compressed data risks getting bigger after an additional round of ZIP compression, because of metadata overhead and the lack of repeated data in typical compressed output.
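To check this on your own data, you can compare the two archive sizes in memory. The following is a minimal sketch, assuming a list of random floats as the payload and a hypothetical helper zip_size; the exact numbers will vary with the data.

```python
import gzip
import io
import pickle
import random
import zipfile

random.seed(0)
data = [random.random() for _ in range(20000)]  # stand-in payload
raw = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

def zip_size(name, payload):
    """Size in bytes of an in-memory ZIP archive holding a single member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, payload)
    return len(buf.getvalue())

plain_in_zip = zip_size("var.sav", raw)                 # compressed once, by ZIP
gz_in_zip = zip_size("var.sav.gz", gzip.compress(raw))  # compressed twice

print("raw pickle:          ", len(raw), "bytes")
print("pickle in ZIP:       ", plain_in_zip, "bytes")
print("gzipped pickle in ZIP:", gz_in_zip, "bytes")
```

If the two archive sizes come out close to each other, the extra gzip step is doing no useful work and can be dropped.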
