Save MinMaxScaler model in sklearn


Problem description


I'm using the MinMaxScaler model in sklearn to normalize the features of a model.

training_set = np.random.rand(4,4)*10
training_set

       [[ 6.01144787,  0.59753007,  2.0014852 ,  3.45433657],
       [ 6.03041646,  5.15589559,  6.64992437,  2.63440202],
       [ 2.27733136,  9.29927394,  0.03718093,  7.7679183 ],
       [ 9.86934288,  7.59003904,  6.02363739,  2.78294206]]


scaler = MinMaxScaler()
scaler.fit(training_set)    
scaler.transform(training_set)


   [[ 0.49184811,  0.        ,  0.29704831,  0.15972182],
   [ 0.4943466 ,  0.52384506,  1.        ,  0.        ],
   [ 0.        ,  1.        ,  0.        ,  1.        ],
   [ 1.        ,  0.80357559,  0.9052909 ,  0.02893534]]

Now I want to use the same scaler to normalize the test set:

   [[ 8.31263467,  7.99782295,  0.02031658,  9.43249727],
   [ 1.03761228,  9.53173021,  5.99539478,  4.81456067],
   [ 0.19715961,  5.97702519,  0.53347403,  5.58747666],
   [ 9.67505429,  2.76225253,  7.39944931,  8.46746594]]

But I don't want to use scaler.fit() with the training data all the time. Is there a way to save the scaler and load it later from a different file?

Solution

So I'm actually not an expert with this but from a bit of research and a few helpful links, I think pickle and sklearn.externals.joblib are going to be your friends here.

The package pickle lets you save models or "dump" models to a file.
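As a minimal sketch of the pickle route (the file name `scaler.pkl` is just illustrative):

```python
import pickle
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# fit a scaler on some sample data (shapes mirror the question)
scaler = MinMaxScaler()
scaler.fit(np.random.rand(4, 4) * 10)

# serialize the fitted scaler to a file
with open('scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)

# later -- e.g. in another script -- restore it without refitting
with open('scaler.pkl', 'rb') as f:
    restored = pickle.load(f)
```

The restored object carries the learned min/max statistics, so `restored.transform()` scales new data exactly as the original would.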

I think this link is also helpful. It talks about creating a persistence model. Something that you're going to want to try is:

# could use: import pickle... however let's do something else
import joblib  # sklearn.externals.joblib is deprecated; use the standalone joblib package

# joblib is more efficient than pickle for objects holding
# large NumPy arrays, which sklearn models often do

# then just 'dump' your fitted scaler to a file
joblib.dump(scaler, 'my_dope_model.pkl')

# ...and load it back later, e.g. from a different script
scaler = joblib.load('my_dope_model.pkl')

Here is where you can learn more about the sklearn externals.

Let me know if that doesn't help or I'm not understanding something about your model.

Note: sklearn.externals.joblib is deprecated; install and use the standalone joblib package instead.
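Putting it all together with the standalone joblib package, a full fit/save/load roundtrip for the question's scenario might look like this (the file name is illustrative):

```python
import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler

training_set = np.random.rand(4, 4) * 10
test_set = np.random.rand(4, 4) * 10

# fit on the training data only
scaler = MinMaxScaler().fit(training_set)

# persist the fitted scaler
joblib.dump(scaler, 'minmax_scaler.pkl')

# in another script/session: load and reuse without refitting
loaded = joblib.load('minmax_scaler.pkl')
scaled_test = loaded.transform(test_set)
```

Because the loaded scaler keeps the min/max learned from the training set, the test set is scaled on the same basis without ever calling fit() again.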
