Is it possible to tune parameters with grid search for custom kernels in scikit-learn?
Question
I have a custom kernel function, and I am using GridSearchCV with SVC(kernel=my_kernel).
The my_kernel function takes a parameter k that needs tuning, so I was wondering whether the param_grid option can be configured to tune the parameter of my custom kernel function.
For example, the gamma parameter of the RBF kernel can be tuned as follows. Can I provide a param_grid=dict(k=k_range) kind of option for my custom kernel?
gamma_range = 10. ** np.arange(-5, 4)
param_grid = dict(gamma=gamma_range)
grid = GridSearchCV(SVC(), param_grid=param_grid, cv=StratifiedKFold(y=Y, k=5))
Recommended answer
One way to do this is to use a Pipeline with SVC(kernel='precomputed') and to wrap your custom kernel function as a sklearn estimator (a subclass of BaseEstimator and TransformerMixin). The wrapper turns the raw features into a kernel matrix, and its constructor arguments become ordinary hyperparameters that GridSearchCV can search over.
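To make the role of kernel='precomputed' concrete, here is a minimal sketch of what SVC expects in that mode: a square kernel matrix between training samples at fit time, and a (test x train) kernel matrix at predict time. The split sizes and the values gamma=0.01 and C=10.0 are illustrative assumptions, and load_digits(return_X_y=True) assumes a reasonably recent scikit-learn.

```python
from sklearn.datasets import load_digits
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

# Illustrative split: first 1000 digits for training, the rest for testing.
X, y = load_digits(return_X_y=True)
X_train, y_train = X[:1000], y[:1000]
X_test, y_test = X[1000:], y[1000:]

# Kernel matrix between training samples: shape (n_train, n_train).
K_train = chi2_kernel(X_train, X_train, gamma=0.01)
# Kernel matrix between test and training samples: shape (n_test, n_train).
K_test = chi2_kernel(X_test, X_train, gamma=0.01)

clf = SVC(kernel='precomputed', C=10.0)
clf.fit(K_train, y_train)
y_pred = clf.predict(K_test)
```

A transformer placed in front of the SVC inside a Pipeline produces exactly these two matrices: it stores the training data in fit and evaluates the kernel against it in transform.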
For example, sklearn contains a kernel function chi2_kernel(X, Y=None, gamma=1.0), which computes the kernel matrix between the feature vectors X and Y. This function takes a parameter gamma, which should preferably be set using cross-validation. We can do a grid search over the parameters of this function as follows:
from __future__ import division, print_function

import sys

import numpy as np
import sklearn
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

try:
    # scikit-learn >= 0.18
    from sklearn.model_selection import GridSearchCV, train_test_split
except ImportError:
    # Older releases such as 0.16.1
    from sklearn.cross_validation import train_test_split
    from sklearn.grid_search import GridSearchCV


# Wrapper class for the custom kernel chi2_kernel
class Chi2Kernel(BaseEstimator, TransformerMixin):
    def __init__(self, gamma=1.0):
        super(Chi2Kernel, self).__init__()
        self.gamma = gamma

    def fit(self, X, y=None, **fit_params):
        # Remember the training samples; the kernel is always
        # evaluated against them.
        self.X_train_ = X
        return self

    def transform(self, X):
        # Returns a kernel matrix of shape (n_samples, n_train_samples).
        return chi2_kernel(X, self.X_train_, gamma=self.gamma)


def main():
    print('python: {}'.format(sys.version))
    print('numpy: {}'.format(np.__version__))
    print('sklearn: {}'.format(sklearn.__version__))
    np.random.seed(0)

    # Get some data to evaluate
    dataset = load_digits()
    X = dataset.data
    y = dataset.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

    # Create a pipeline where our custom kernel wrapper Chi2Kernel
    # is run before SVC.
    pipe = Pipeline([
        ('chi2', Chi2Kernel()),
        ('svm', SVC()),
    ])

    # Set the parameter 'gamma' of our custom kernel by
    # using the 'estimator__param' syntax.
    cv_params = dict([
        ('chi2__gamma', 10.0**np.arange(-9, 4)),
        ('svm__kernel', ['precomputed']),
        ('svm__C', 10.0**np.arange(-2, 9)),
    ])

    # Do grid search to get the best parameter values of 'gamma' and 'C'.
    model = GridSearchCV(pipe, cv_params, cv=5, verbose=1, n_jobs=-1)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc_test = accuracy_score(y_test, y_pred)
    print("Test accuracy: {}".format(acc_test))
    print("Best params:")
    print(model.best_params_)


if __name__ == '__main__':
    main()
Output:
python: 2.7.3 (default, Dec 18 2014, 19:10:20)
[GCC 4.6.3]
numpy: 1.8.0
sklearn: 0.16.1
Fitting 5 folds for each of 143 candidates, totalling 715 fits
[Parallel(n_jobs=-1)]: Done 1 jobs | elapsed: 0.4s
[Parallel(n_jobs=-1)]: Done 50 jobs | elapsed: 2.7s
[Parallel(n_jobs=-1)]: Done 200 jobs | elapsed: 9.8s
[Parallel(n_jobs=-1)]: Done 450 jobs | elapsed: 21.6s
[Parallel(n_jobs=-1)]: Done 701 out of 715 | elapsed: 34.8s remaining: 0.7s
[Parallel(n_jobs=-1)]: Done 715 out of 715 | elapsed: 35.4s finished
Test accuracy: 0.989898989899
Best params:
{'chi2__gamma': 0.01, 'svm__C': 10.0, 'svm__kernel': 'precomputed'}
In your case, just replace chi2_kernel with your own function that computes the kernel matrix, and expose its parameter k as a constructor argument of the wrapper class.
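As a rough sketch of that substitution, the example below wraps a stand-in kernel with a tunable parameter k. The body of my_kernel (a polynomial-style kernel), the class name MyKernel, the iris dataset, and the grid values are all assumptions made for illustration, since the question does not show the actual kernel. The imports assume scikit-learn 0.18 or later.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def my_kernel(X, Y, k=1.0):
    """Hypothetical custom kernel: (1 + <x, y>) ** k for every pair of rows."""
    return (1.0 + np.dot(X, Y.T)) ** k


class MyKernel(BaseEstimator, TransformerMixin):
    """Exposes my_kernel's parameter k as a tunable hyperparameter."""

    def __init__(self, k=1.0):
        self.k = k

    def fit(self, X, y=None):
        # Keep the training samples so transform can compare against them.
        self.X_train_ = X
        return self

    def transform(self, X):
        # Kernel matrix of shape (n_samples, n_train_samples).
        return my_kernel(X, self.X_train_, k=self.k)


X, y = load_iris(return_X_y=True)
pipe = Pipeline([
    ('kernel', MyKernel()),
    ('svm', SVC(kernel='precomputed')),
])
# 'kernel__k' addresses the k argument of the MyKernel step.
param_grid = {'kernel__k': [1, 2, 3], 'svm__C': [0.1, 1.0, 10.0]}
grid = GridSearchCV(pipe, param_grid, cv=3)
grid.fit(X, y)
print(grid.best_params_)
```

Because the kernel matrix is computed inside the pipeline, each cross-validation fold is split on the raw feature rows and the kernel is recomputed per fold against that fold's training samples only.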