Probabilistic SVM, regression
Problem description
I've currently implemented a probabilistic SVM (at least I think so) for binary classes. Now I want to extend this approach to regression, and I'm trying to use it on the Boston dataset. Unfortunately, my algorithm seems to be stuck; the code I'm currently running looks like this:
```python
from sklearn import decomposition
from sklearn import svm
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston
import warnings
warnings.filterwarnings("ignore")

boston = load_boston()
X = boston.data
y = boston.target
inputs_train, inputs_test, targets_train, targets_test = train_test_split(X, y, test_size=0.33, random_state=42)

def plotting():
    param_C = [0.01, 0.1]
    param_grid = {'C': param_C, 'kernel': ['poly', 'rbf'], 'gamma': [0.1, 0.01]}
    clf = GridSearchCV(svm.SVR(), cv=5, param_grid=param_grid)
    clf.fit(inputs_train, targets_train)
    clf = SVR(C=clf.best_params_['C'], cache_size=200, class_weight=None, coef0=0.0,
              decision_function_shape='ovr', degree=5, gamma=clf.best_params_['gamma'],
              kernel=clf.best_params_['kernel'],
              max_iter=-1, probability=True, random_state=None, shrinking=True,
              tol=0.001, verbose=False)
    clf.fit(inputs_train, targets_train)
    a = clf.predict(inputs_test[0])
    print(a)

plotting()
```
Can someone tell me what is wrong with this approach? It's not that I get some error message (I know, I've suppressed them above), but the code never stops running. Any suggestions are hugely appreciated.
Answer
There are several issues with your code.
To start with, what is taking forever is the first `clf.fit` (i.e. the grid-search one), and that's why you didn't see any change when you set `max_iter` and `tol` in your second `clf.fit`.
Second, the `clf = SVR()` part will not work, because:

- You have to import it; `SVR` is not recognizable as written.
- You have a bunch of illegal arguments in there (`decision_function_shape`, `probability`, `random_state` etc.) - check the docs for the admissible `SVR` arguments.
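One quick way to check the admissible arguments without leaving the interpreter is to inspect the estimator's parameters via `get_params()`; a minimal sketch:

```python
from sklearn.svm import SVR

# List the constructor arguments SVR actually accepts; note there is no
# probability, random_state, or decision_function_shape among them.
params = sorted(SVR().get_params())
print(params)
```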
Third, you don't need to explicitly fit again with the best parameters; you should simply ask for `refit=True` in your `GridSearchCV` definition and subsequently use `clf.best_estimator_` for your predictions (EDIT after comment: simply `clf.predict` will also work).
So, moving the stuff outside of any function definition, here is a working version of your code:
```python
from sklearn.svm import SVR
# other imports as-is
# data loading & splitting as-is

param_C = [0.01, 0.1]
param_grid = {'C': param_C, 'kernel': ['poly', 'rbf'], 'gamma': [0.1, 0.01]}

clf = GridSearchCV(SVR(degree=5, max_iter=10000), cv=5, param_grid=param_grid, refit=True)
clf.fit(inputs_train, targets_train)

a = clf.best_estimator_.predict(inputs_test[0])
# a = clf.predict(inputs_test[0]) will also work
print(a)
# [ 21.89849792]
```
Apart from `degree`, all the other admissible argument values you are using are actually the respective default values, so the only arguments you really need in your `SVR` definition are `degree` and `max_iter`.
You'll get a couple of warnings (not errors), i.e. after fitting:
/databricks/python/lib/python3.5/site-packages/sklearn/svm/base.py:220: ConvergenceWarning: Solver terminated early (max_iter=10000). Consider pre-processing your data with StandardScaler or MinMaxScaler.
and after predicting:
/databricks/python/lib/python3.5/site-packages/sklearn/utils/validation.py:395: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample. DeprecationWarning)
which already contain some advice for what to do next...
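Following that advice, scaling is best done inside a `Pipeline`, so the scaler is refit on each cross-validation training split rather than leaking test-fold statistics. A sketch of that pattern (using `make_regression` as a stand-in for the Boston data, since `load_boston` has been removed from recent scikit-learn releases; the grid values match the ones above):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the Boston data (13 features, like boston.data).
X, y = make_regression(n_samples=500, n_features=13, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# StandardScaler inside the Pipeline: fit on each CV training split only.
pipe = Pipeline([('scale', StandardScaler()), ('svr', SVR(max_iter=10000))])
param_grid = {'svr__C': [0.01, 0.1],
              'svr__kernel': ['poly', 'rbf'],
              'svr__gamma': [0.1, 0.01]}

clf = GridSearchCV(pipe, param_grid=param_grid, cv=5, refit=True)
clf.fit(X_train, y_train)

# Reshape the single sample to 2-D, as the DeprecationWarning suggests.
pred = clf.predict(X_test[0].reshape(1, -1))
print(pred)
```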
Last but not least: a probabilistic classifier (i.e. one that produces probabilities instead of hard labels) is a valid thing, but a "probabilistic" regression model is not...
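To make that concrete: `SVC(probability=True)` exposes a `predict_proba` method, while `SVR` simply has no such method. A small sketch on toy data:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC, SVR

X, y = make_classification(n_samples=100, random_state=0)

# A probabilistic classifier: probability=True enables probability estimates.
clf = SVC(probability=True, random_state=0).fit(X, y)
proba = clf.predict_proba(X[:1])
print(proba)  # one row of class probabilities, summing to 1

# A regressor exposes no predict_proba at all.
print(hasattr(SVR(), 'predict_proba'))  # False
```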
Tested with Python 3.5 and scikit-learn 0.18.1