How to write a custom estimator in sklearn and use cross-validation on it?
Problem description

I would like to check the prediction error of a new method through cross-validation. I would like to know whether I can pass my method to sklearn's cross-validation function and, if so, how.

I would like something like `sklearn.cross_validation(cv=10).mymethod`.

I also need to know how to define `mymethod`: should it be a function, and what should its inputs and outputs be?

For example, we can consider `mymethod` to be an implementation of the least-squares estimator (of course not one of those in sklearn).

I found this tutorial link, but it is not very clear to me.

In the documentation they use:

```python
>>> import numpy as np
>>> from sklearn import cross_validation
>>> from sklearn import datasets
>>> from sklearn import svm

>>> iris = datasets.load_iris()
>>> iris.data.shape, iris.target.shape
((150, 4), (150,))

>>> clf = svm.SVC(kernel='linear', C=1)
>>> scores = cross_validation.cross_val_score(
...     clf, iris.data, iris.target, cv=5)
...
>>> scores
```

The problem is that the estimator they use, `clf`, is obtained from a function built into sklearn. How should I define my own estimator so that I can pass it to the `cross_validation.cross_val_score` function?

For example, suppose a simple estimator that uses a linear model $y=x\beta$, where beta is estimated as `X[1,:]+alpha` and alpha is a parameter. How should I complete the code?

```python
class my_estimator():
    def fit(X,y):
        beta=X[1,:]+alpha #where can I pass alpha to the function?
        return beta
    def scorer(estimator, X, y) #what should the scorer function compute?
        return ?????
```

With the following code I received an error:

```python
class my_estimator():
    def fit(X, y, **kwargs):
        #alpha = kwargs['alpha']
        beta=X[1,:]#+alpha
        return beta
```

```
>>> cv=cross_validation.cross_val_score(my_estimator,x,y,scoring="mean_squared_error")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\cross_validation.py", line 1152, in cross_val_score
    for train, test in cv)
  File "C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\externals\joblib\parallel.py", line 516, in __call__
    for function, args, kwargs in iterable:
  File "C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\cross_validation.py", line 1152, in <genexpr>
    for train, test in cv)
  File "C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\base.py", line 43, in clone
    % (repr(estimator), type(estimator)))
TypeError: Cannot clone object '<class __main__.my_estimator at 0x05ACACA8>' (type <type 'classobj'>): it does not seem to be a scikit-learn estimator a it does not implement a 'get_params' methods.
>>>
```
The answer also lies in sklearn's documentation.

You need to define two things:

- an estimator that implements the `fit(X, y)` function, with `X` being the matrix of inputs and `y` being the vector of outputs
- a scorer function, or callable object, that can be called as `scorer(estimator, X, y)` and returns the score of the given model

Referring to your example: first of all, `scorer` shouldn't be a method of the estimator; it is a different notion. Just create a callable:

```python
def scorer(estimator, X, y):
    return ?????  # compute whatever you want, it's up to you to define
                  # what it means for the given estimator to be "good" or "bad"
```

An even simpler solution: you can pass a string such as `'mean_squared_error'` or `'accuracy'` (full list available in this part of the documentation) to the `cross_val_score` function to use a predefined scorer.

Another possibility is to use the `make_scorer` factory function.

As for the second thing, you can pass parameters to your model through the `fit_params` dict parameter of the `cross_val_score` function (as mentioned in the documentation). These parameters will be passed to the `fit` function.

```python
class my_estimator():
    def fit(self, X, y, **kwargs):
        # alpha arrives through the fit_params dict given to cross_val_score
        alpha = kwargs['alpha']
        self.beta = X[1, :] + alpha
        return self
```
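As a rough illustration of the two pieces described above (an estimator with `fit` and a separate scorer callable), here is a minimal sketch. It assumes a recent scikit-learn in which `cross_val_score` is imported from `sklearn.model_selection`, and it inherits from `sklearn.base.BaseEstimator` so that `get_params` and `set_params` come for free. The names `MyEstimator` and `neg_mse_scorer`, the `predict` method, passing `alpha` through the constructor instead of `fit_params`, and the synthetic data are all assumptions made for the example, not part of the original answer.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import cross_val_score


class MyEstimator(BaseEstimator):
    """Toy model y = X @ beta with beta = X[1, :] + alpha, as in the question."""

    def __init__(self, alpha=0.0):
        # Hyperparameters are stored unchanged so get_params()/clone() can find them.
        self.alpha = alpha

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        # The question's rule: take the second training row and shift it by alpha.
        self.beta_ = X[1, :] + self.alpha
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.beta_


def neg_mse_scorer(estimator, X, y):
    """Scorer callable: higher means better, so return the negated mean squared error."""
    residuals = np.asarray(y, dtype=float) - estimator.predict(X)
    return -float(np.mean(residuals ** 2))


# Synthetic data, made up purely for illustration.
rng = np.random.RandomState(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5])

# Pass an instance (not the class) so that cross_val_score can clone it.
scores = cross_val_score(MyEstimator(alpha=0.1), X, y, cv=5, scoring=neg_mse_scorer)
print(scores)
```

Passing the class itself instead of an instance, as in the question's call, is one way to end up with the "Cannot clone object" error shown in the traceback.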
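After reading all the error messages, which give a fairly clear idea of what is missing, here is a simple example:

```python
import numpy as np
# cross_val_score lives in sklearn.cross_validation in scikit-learn 0.14 (the
# version in the traceback); later releases provide it in sklearn.model_selection.
from sklearn.cross_validation import cross_val_score

class RegularizedRegressor:
    def __init__(self, l=0.01):
        self.l = l

    def combine(self, inputs):
        # Weighted sum of one sample's features (no bias term is fitted).
        return sum(i * w for (i, w) in zip(inputs, self.weights))

    def predict(self, X):
        return [self.combine(x) for x in np.asarray(X)]

    def classify(self, inputs):
        return np.sign(self.predict(inputs))

    def fit(self, X, y, **kwargs):
        # 'l' arrives through the fit_params dict passed to cross_val_score.
        self.l = kwargs['l']
        X = np.matrix(X)
        y = np.matrix(y)
        # Regularized least squares: (X'X + l*I)^-1 X'y.
        W = (X.transpose() * X + self.l * np.eye(X.shape[1])).getI() * X.transpose() * y
        self.weights = [w[0] for w in W.tolist()]
        return self

    def get_params(self, deep=False):
        # clone() inside cross_val_score calls this method.
        return {'l': self.l}

X = np.matrix([[0, 0], [1, 0], [0, 1], [1, 1]])
y = np.matrix([0, 1, 1, 0]).transpose()

print(cross_val_score(RegularizedRegressor(),
                      X,
                      y,
                      fit_params={'l': 0.1},
                      scoring='mean_squared_error'))
```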
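For readers on a recent scikit-learn, the same idea might be sketched as follows. This is an illustration under assumptions rather than the answer's own code: the class name `RidgeSketch`, the synthetic data, and moving `l` from `fit_params` to the constructor are all choices made for the example. It uses `BaseEstimator` and `RegressorMixin` from `sklearn.base`, the predefined scorer string `'neg_mean_squared_error'`, and the `make_scorer` factory mentioned earlier, and checks that the last two give the same scores.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import cross_val_score


class RidgeSketch(RegressorMixin, BaseEstimator):
    """Hypothetical closed-form ridge regressor without an intercept."""

    def __init__(self, l=0.01):
        # Regularization strength, exposed through get_params() by BaseEstimator.
        self.l = l

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Solve (X^T X + l I) w = X^T y; the l*I term keeps the system well-posed.
        gram = X.T @ X + self.l * np.eye(X.shape[1])
        self.weights_ = np.linalg.solve(gram, X.T @ y)
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.weights_


# Synthetic regression data, made up purely for illustration.
rng = np.random.RandomState(0)
X = rng.normal(size=(60, 2))
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=60)

# Predefined scorer selected by name.
by_name = cross_val_score(RidgeSketch(l=0.1), X, y, cv=5,
                          scoring="neg_mean_squared_error")

# Equivalent scorer built with make_scorer; greater_is_better=False negates the loss.
by_factory = cross_val_score(
    RidgeSketch(l=0.1), X, y, cv=5,
    scoring=make_scorer(mean_squared_error, greater_is_better=False))

print(by_name)
print(by_factory)
```

Keeping `l` as a constructor argument means the cloned estimators used in each fold carry the same setting, so nothing extra has to be passed to `fit`.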