Save MinMaxScaler model in sklearn
Problem description
I'm using the MinMaxScaler model in sklearn to normalize the features of a model.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

training_set = np.random.rand(4,4)*10
training_set
[[ 6.01144787, 0.59753007, 2.0014852 , 3.45433657],
[ 6.03041646, 5.15589559, 6.64992437, 2.63440202],
[ 2.27733136, 9.29927394, 0.03718093, 7.7679183 ],
[ 9.86934288, 7.59003904, 6.02363739, 2.78294206]]
scaler = MinMaxScaler()
scaler.fit(training_set)
scaler.transform(training_set)
[[ 0.49184811, 0. , 0.29704831, 0.15972182],
[ 0.4943466 , 0.52384506, 1. , 0. ],
[ 0. , 1. , 0. , 1. ],
[ 1. , 0.80357559, 0.9052909 , 0.02893534]]
Now I want to use the same scaler to normalize the test set:
[[ 8.31263467, 7.99782295, 0.02031658, 9.43249727],
[ 1.03761228, 9.53173021, 5.99539478, 4.81456067],
[ 0.19715961, 5.97702519, 0.53347403, 5.58747666],
[ 9.67505429, 2.76225253, 7.39944931, 8.46746594]]
But I don't want to use scaler.fit() with the training data all the time. Is there a way to save the scaler and load it later from a different file?
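To make the goal concrete, here is a minimal sketch (with made-up numbers) of the behaviour being relied on: transform reuses the per-column min/max learned from the training data rather than refitting on the new data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# assumption: tiny made-up data, just to illustrate the behaviour
training_set = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
scaler = MinMaxScaler().fit(training_set)

# transform maps x -> (x - data_min_) / (data_max_ - data_min_), column-wise
test_set = np.array([[2.5, 15.0]])
print(scaler.transform(test_set))  # [[0.25 0.25]] -- training min/max, not refit
```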
So I'm actually not an expert with this, but from a bit of research and a few helpful links, I think pickle and sklearn.externals.joblib are going to be your friends here. The pickle package lets you save models, or "dump" models, to a file. I think this link is also helpful; it talks about creating a persistence model. Something that you're going to want to try is:

# could use: import pickle... however let's do something else
from sklearn.externals import joblib
# this is more efficient than pickle for things like large NumPy arrays
# ... which sklearn models often have.
# then just 'dump' your file
joblib.dump(clf, 'my_dope_model.pkl')
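To complete the round trip, the dumped scaler can be loaded back from a different file or process and applied to the test set. A minimal sketch, using the standalone joblib package (since sklearn.externals.joblib is deprecated) and a hypothetical filename:

```python
import numpy as np
import joblib  # standalone package: pip install joblib
from sklearn.preprocessing import MinMaxScaler

# fit and save the scaler (e.g. in your training script)
training_set = np.random.rand(4, 4) * 10
scaler = MinMaxScaler().fit(training_set)
joblib.dump(scaler, 'scaler.pkl')

# ... later, in a different file: load and reuse without refitting
restored = joblib.load('scaler.pkl')
test_set = np.random.rand(4, 4) * 10
scaled = restored.transform(test_set)  # uses the training set's min/max
```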
Here is where you can learn more about the sklearn externals. Let me know if that doesn't help or I'm not understanding something about your model.

Note: sklearn.externals.joblib is deprecated. Install and use the pure joblib instead.
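For completeness, the standard-library pickle module mentioned in the answer works the same way; a sketch with a hypothetical filename:

```python
import pickle
import numpy as np
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler().fit(np.random.rand(4, 4) * 10)

# save the fitted scaler to disk
with open('scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)

# load it back in another file or session
with open('scaler.pkl', 'rb') as f:
    loaded = pickle.load(f)

# the loaded scaler carries the same fitted statistics as the original
```

joblib is generally preferred for objects holding large NumPy arrays, but for a small scaler like this either approach works.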