Probabilistic SVM, regression

Question

I've currently implemented a probabilistic SVM (at least I think so) for binary classes. Now I want to extend this approach to regression, and I'm trying to use it on the Boston dataset. Unfortunately, my algorithm seems to be stuck; the code I'm currently running looks like this:

from sklearn import decomposition
from sklearn import svm
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings("ignore")
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston

boston = load_boston()

X = boston.data
y = boston.target
inputs_train, inputs_test, targets_train, targets_test = train_test_split(X, y, test_size=0.33, random_state=42)

def plotting():
    param_C = [0.01, 0.1]
    param_grid = {'C': param_C, 'kernel': ['poly', 'rbf'], 'gamma': [0.1, 0.01]}
    clf = GridSearchCV(svm.SVR(), cv = 5, param_grid= param_grid)
    clf.fit(inputs_train, targets_train)
    clf = SVR(C=clf.best_params_['C'], cache_size=200, class_weight=None, coef0=0.0,
              decision_function_shape='ovr', degree=5, gamma=clf.best_params_['gamma'],
              kernel=clf.best_params_['kernel'],
              max_iter=-1, probability=True, random_state=None, shrinking=True,
              tol=0.001, verbose=False)
    clf.fit(inputs_train, targets_train)
    a = clf.predict(inputs_test[0])
    print(a)


plotting()

Can someone tell me what is wrong with this approach? It's not that I get some error message (I know, I've suppressed them above), but the code never stops running. Any suggestions are hugely appreciated.

Answer

There are several issues with your code.

  • To start with, what is taking forever is the first clf.fit (i.e. the grid search one), and that's why you didn't see any change when you set max_iter and tol in your second clf.fit.

Second, the clf=SVR() part will not work, because:

  • You have to import it; otherwise SVR is not recognized
  • You have a bunch of illegal arguments in there (decision_function_shape, probability, random_state etc) - check the docs for the admissible SVR arguments.
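As a quick sanity check (a sketch added here, not part of the original answer), the admissible SVR constructor arguments can be listed with `get_params()`, which makes it easy to spot classifier-only parameters that SVR will reject:

```python
from sklearn.svm import SVR

# SVR accepts only regression-relevant parameters; classifier-only
# arguments such as probability, decision_function_shape or
# class_weight are not among them and raise a TypeError if passed.
params = sorted(SVR().get_params().keys())
print(params)
# e.g. C, cache_size, coef0, degree, epsilon, gamma, kernel,
# max_iter, shrinking, tol, verbose
```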

Third, you don't need to explicitly fit again with the best parameters; you should simply ask for refit=True in your GridSearchCV definition and subsequently use clf.best_estimator_ for your predictions (EDIT after comment: simply clf.predict will also work).

So, moving the stuff outside of any function definition, here is a working version of your code:

from sklearn.svm import SVR
# other imports as-is

# data loading & splitting as-is

param_C = [0.01, 0.1]
param_grid = {'C': param_C, 'kernel': ['poly', 'rbf'], 'gamma': [0.1, 0.01]}
clf = GridSearchCV(SVR(degree=5, max_iter=10000), cv=5, param_grid=param_grid, refit=True)
clf.fit(inputs_train, targets_train)
a = clf.best_estimator_.predict(inputs_test[0])
# a = clf.predict(inputs_test[0]) will also work 
print(a)
# [ 21.89849792]

Apart from degree, all the other admissible argument values you are using are actually the respective default values, so the only arguments you really need in your SVR definition are degree and max_iter.

You'll get a couple of warnings (not errors), i.e. after fitting:

/databricks/python/lib/python3.5/site-packages/sklearn/svm/base.py:220: ConvergenceWarning: Solver terminated early (max_iter=10000). Consider pre-processing your data with StandardScaler or MinMaxScaler.

and after predicting:

/databricks/python/lib/python3.5/site-packages/sklearn/utils/validation.py:395: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample. DeprecationWarning)

which already contain some advice for what to do next...
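Both pieces of advice from the warnings can be applied directly; here is a sketch (the random data below is a stand-in for the Boston split, purely for illustration):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the train/test split (an assumption for
# illustration; any numeric regression data behaves the same way).
rng = np.random.RandomState(42)
X_train = rng.rand(100, 13)
y_train = rng.rand(100) * 50
X_test = rng.rand(10, 13)

# ConvergenceWarning advice: scale the features before fitting the SVR.
model = make_pipeline(StandardScaler(), SVR(kernel='rbf', C=0.1, gamma=0.1))
model.fit(X_train, y_train)

# DeprecationWarning advice: a single sample must be passed as a 2-D
# array of shape (1, n_features), not as a 1-D row.
single_sample = X_test[0].reshape(1, -1)
print(model.predict(single_sample).shape)  # (1,)
```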

Last but not least: a probabilistic classifier (i.e. one that produces probabilities instead of hard labels) is a valid thing, but a "probabilistic" regression model is not...
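The distinction shows up directly in scikit-learn's API (a minimal sketch): SVC exposes predict_proba when fitted with probability=True, while SVR has no such method at all.

```python
import numpy as np
from sklearn.svm import SVC, SVR

# Tiny 1-D binary classification problem, for illustration only.
X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = (X.ravel() > 0.5).astype(int)

clf = SVC(probability=True).fit(X, y)
proba = clf.predict_proba([[0.4]])     # class probabilities, row sums to 1
print(proba.sum())

# SVR has no predict_proba -- its outputs are point estimates only.
print(hasattr(SVR(), 'predict_proba'))  # False
```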

Tested with Python 3.5 and scikit-learn 0.18.1
