Pickle Lazy Learners


Question

Does pickle save the training data for lazy learners such as KNeighborsClassifier from scikit-learn? If so, can we access this data from the pickled object? (I am asking because of data privacy concerns.)

For example:

import pickle

knn.fit(Xtrain, Ytrain)
saved_model = pickle.dumps(knn)
knn_from_pickle = pickle.loads(saved_model)

# predict works directly on the unpickled object and gives correct, sensible output
knn_from_pickle.predict(Xtest)

Do the knn_from_pickle or saved_model variables contain the Xtrain data? KNN is a lazy learner, so it needs the training data Xtrain to compute distances whenever new data arrives. Yet when I printed knn_from_pickle, it only displayed the hyperparameters passed to KNeighborsClassifier.

I tested this with a 65 KB data file (Xtrain), after all data transformations and using the entire file for training. When the KNN model was fit and serialized like so:

saved_model = pickle.dumps(knn)
sys.getsizeof(saved_model) 

the space occupied was 238744 bytes.

Whereas for pickled objects of other algorithms, such as Gaussian Naive Bayes:

saved_model = pickle.dumps(gnb)
sys.getsizeof(saved_model)

the space occupied was 6074 bytes, and for heavier algorithms like Random Forest:

saved_model = pickle.dumps(rf)
sys.getsizeof(saved_model)

the space occupied was 48863 bytes.

Given such a large size difference between the pickled KNN object and those of the other algorithms, pickle must be storing the KNN training data somehow. If so, how can it be accessed? If not, how is KNN stored in the pickle, and how can the unpickled object (knn_from_pickle) call predict without fit and still give correct answers?

Recommended answer

Yes, the data is saved in a private attribute. But since Python does not actually enforce private methods or attributes, you need to keep privacy concerns in mind before publishing the fitted model.
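To illustrate why a pickled lazy learner carries its training data, here is a minimal sketch using only the standard library. ToyLazyLearner and its attribute names (_stored_X, _stored_y) are hypothetical and are not part of scikit-learn; the point is only that pickle serializes instance attributes regardless of a leading underscore, and that the underscore does not prevent access afterwards.

```python
import pickle


class ToyLazyLearner:
    """Hypothetical 1-nearest-neighbour model on scalars, for illustration only."""

    def fit(self, X, y):
        # A lazy learner does no real training: it simply keeps the data.
        self._stored_X = list(X)
        self._stored_y = list(y)
        return self

    def predict(self, X):
        # For each query, return the label of the closest stored point.
        predictions = []
        for query in X:
            distances = [abs(query - stored) for stored in self._stored_X]
            predictions.append(self._stored_y[distances.index(min(distances))])
        return predictions


# Made-up data, for illustration only.
Xtrain = [1.0, 2.0, 10.0, 11.0]
Ytrain = ["low", "low", "high", "high"]

model = ToyLazyLearner().fit(Xtrain, Ytrain)
restored = pickle.loads(pickle.dumps(model))

# The leading underscore is only a naming convention; nothing blocks access.
recovered = restored._stored_X
toy_predictions = restored.predict([1.4, 9.0])
```

Anyone who receives the pickled bytes can read the stored points back in the same way, which is the privacy concern raised in the question.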

For KNeighborsClassifier, the attribute is _fit_X (at the time of writing; since it is a private attribute, the developers would not think twice about changing it).
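A short sketch of reading that attribute back from an unpickled model, assuming a scikit-learn version in which _fit_X still exists. The training arrays are made up for illustration, and the lookup uses getattr so the code does not fail outright if the private name has changed.

```python
import pickle

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up training data, for illustration only.
Xtrain = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
Ytrain = np.array([0, 0, 1, 1])

knn = KNeighborsClassifier(n_neighbors=1).fit(Xtrain, Ytrain)
knn_from_pickle = pickle.loads(pickle.dumps(knn))

# _fit_X is private and may be renamed in other scikit-learn versions,
# so look it up defensively instead of assuming it is there.
stored = getattr(knn_from_pickle, "_fit_X", None)
if stored is not None:
    # Compare the recovered array with the original training data.
    print(np.array_equal(stored, Xtrain))
```

If the comparison holds, the pickled model exposes the full training matrix, so the file should be treated as being as sensitive as the data itself.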
