Subclassing sklearn LinearSVC for use as estimator with sklearn GridSearchCV
Question
I am trying to create a subclass of sklearn.svm.LinearSVC for use as an estimator with sklearn.model_selection.GridSearchCV. The child class has an extra method which in this example doesn't do anything. However, when I run this I end up with an error which I just can't seem to debug. If you copy-paste the code and run it, it should reproduce the full error, which ends with ValueError: Input contains NaN, infinity or a value too large for dtype('float64')
Once I get this working, I hope to add more functionality to the method transform_this().
Can someone please tell me where I have gone wrong? Based on this, I first thought it was due to some issue with my data. However, since I've reproduced it using an sklearn built-in dataset, that seems not to be the case. Also, I believe I'm subclassing this properly, based on the response I got for my previous question here. I also learnt that GridSearchCV doesn't seem to initialise the estimator in a different way (somehow it first uses default arguments, as I see from this post).
from sklearn.datasets import load_breast_cancer
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV

RANDOM_STATE = 123


class LinearSVCSub(LinearSVC):

    def __init__(self, penalty='l2', loss='squared_hinge', additional_parameter1=1, additional_parameter2=100,
                 dual=True, tol=0.0001, C=1.0, multi_class='ovr', fit_intercept=True, intercept_scaling=1,
                 class_weight=None, verbose=0, random_state=None, max_iter=1000):
        super(LinearSVCSub, self).__init__(penalty=penalty, loss=loss, dual=dual, tol=tol,
                                           C=C, multi_class=multi_class, fit_intercept=fit_intercept,
                                           intercept_scaling=intercept_scaling, class_weight=class_weight,
                                           verbose=verbose, random_state=random_state, max_iter=max_iter)
        self.additional_parameter1 = additional_parameter1
        self.additional_parameter2 = additional_parameter2

    def fit(self, X, y, sample_weight=None):
        X = self.transform_this(X)
        super(LinearSVCSub, self).fit(X, y, sample_weight)

    def predict(self, X):
        X = self.transform_this(X)
        super(LinearSVCSub, self).predict(X)

    def score(self, X, y, sample_weight=None):
        X = self.transform_this(X)
        super(LinearSVCSub, self).score(X, y, sample_weight)

    def decision_function(self, X):
        X = self.transform_this(X)
        super(LinearSVCSub, self).decision_function(X)

    def transform_this(self, X):
        return X


if __name__ == '__main__':
    data = load_breast_cancer()
    X, y = data.data, data.target

    # Parameter tuning with custom LinearSVC
    param_grid = {'C': [0.00001, 0.0001, 0.0005],
                  'dual': (True, False), 'random_state': [RANDOM_STATE],
                  'additional_parameter1': [0.90, 0.80, 0.60, 0.30],
                  'additional_parameter2': [20, 30]}
    gs_model = GridSearchCV(estimator=LinearSVCSub(), verbose=1, param_grid=param_grid,
                            scoring='roc_auc', n_jobs=-1)
    gs_model.fit(X, y)
Recommended answer
You have a couple of problems:
- The methods you defined have no return statements
- LinearSVC does not converge on the dataset you chose
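To illustrate the first point, here is a minimal sketch in plain Python. The classes `Base`, `BrokenChild` and `FixedChild` are hypothetical stand-ins, not part of sklearn; `Base` plays the role of the parent estimator. An overriding method that calls the parent without `return` hands `None` back to the caller, which is presumably what the scorer in GridSearchCV ends up receiving instead of an array of predictions.

```python
class Base:
    """Hypothetical stand-in for a parent estimator such as LinearSVC."""

    def predict(self, X):
        # Return one dummy label per sample.
        return [0 for _ in X]


class BrokenChild(Base):
    def predict(self, X):
        # Calls the parent but drops its result, so the method returns None.
        super().predict(X)


class FixedChild(Base):
    def predict(self, X):
        # Forwards the parent's result to the caller.
        return super().predict(X)


broken_result = BrokenChild().predict([1, 2, 3])
fixed_result = FixedChild().predict([1, 2, 3])
print(broken_result)  # None
print(fixed_result)   # [0, 0, 0]
```

The same applies to `fit`, which by sklearn convention should return `self`.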
Once you correct those, you are good to go:
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV

RANDOM_STATE = 123


class LinearSVCSub(LinearSVC):

    def __init__(self, penalty='l2', loss='squared_hinge', additional_parameter1=1, additional_parameter2=100,
                 dual=True, tol=0.0001, C=1.0, multi_class='ovr', fit_intercept=True, intercept_scaling=1,
                 class_weight=None, verbose=0, random_state=None, max_iter=100000):
        super(LinearSVCSub, self).__init__(penalty=penalty, loss=loss, dual=dual, tol=tol,
                                           C=C, multi_class=multi_class, fit_intercept=fit_intercept,
                                           intercept_scaling=intercept_scaling, class_weight=class_weight,
                                           verbose=verbose, random_state=random_state, max_iter=max_iter)
        self.additional_parameter1 = additional_parameter1
        self.additional_parameter2 = additional_parameter2

    def fit(self, X, y, sample_weight=None):
        X = self.transform_this(X)
        super(LinearSVCSub, self).fit(X, y, sample_weight)
        # Return the fitted estimator, as sklearn estimators are expected to
        return self

    def predict(self, X):
        X = self.transform_this(X)
        # Pass the parent's result back to the caller
        return super(LinearSVCSub, self).predict(X)

    def score(self, X, y, sample_weight=None):
        X = self.transform_this(X)
        return super(LinearSVCSub, self).score(X, y, sample_weight)

    def decision_function(self, X):
        X = self.transform_this(X)
        return super(LinearSVCSub, self).decision_function(X)

    def transform_this(self, X):
        return X


# Synthetic dataset used here in place of load_breast_cancer()
X, y = make_classification()

# Parameter tuning with custom LinearSVC
param_grid = {'C': [0.00001, 0.0001, 0.0005],
              'dual': (True, False), 'random_state': [RANDOM_STATE],
              'additional_parameter1': [0.90, 0.80, 0.60, 0.30],
              'additional_parameter2': [20, 30]
              }
gs_model = GridSearchCV(estimator=LinearSVCSub(), verbose=1, param_grid=param_grid,
                        scoring='roc_auc', n_jobs=1)
gs_model.fit(X, y)
Fitting 5 folds for each of 48 candidates, totalling 240 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 240 out of 240 | elapsed: 0.9s finished
GridSearchCV(estimator=LinearSVCSub(), n_jobs=1,
             param_grid={'C': [1e-05, 0.0001, 0.0005],
                         'additional_parameter1': [0.9, 0.8, 0.6, 0.3],
                         'additional_parameter2': [20, 30],
                         'dual': (True, False), 'random_state': [123]},
             scoring='roc_auc', verbose=1)
gs_model.predict(X)
array([0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1,
       1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1,
       1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0,
       0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1])
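If you would rather keep the breast cancer dataset than switch to synthetic data, one option that often helps a linear SVM converge is to standardise the features first. The sketch below is not part of the code above; it swaps in a plain LinearSVC inside a pipeline with StandardScaler, and the `max_iter` value and variable names are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)

# Standardise each feature to zero mean and unit variance before the SVM;
# max_iter=10000 is an assumed value, not a tuned one.
scaled_svc = make_pipeline(
    StandardScaler(),
    LinearSVC(random_state=123, max_iter=10000),
)
scaled_svc.fit(X, y)

# Accuracy on the training data, only as a sanity check that fitting worked.
train_accuracy = scaled_svc.score(X, y)
print(train_accuracy)
```

The same pipeline could in principle wrap LinearSVCSub and be passed to GridSearchCV, in which case the grid keys would need the step-name prefix that make_pipeline generates.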