Scaling of target causes Scikit-learn SVM regression to break down


Problem description

When training an SVM regression it is usually advisable to scale the input features before training.

But how about scaling of the targets? Usually this is not considered necessary, and I do not see a good reason why it should be necessary.

However, in the scikit-learn example for SVM regression from http://scikit-learn.org/stable/auto_examples/svm/plot_svm_regression.html:

Just introducing the line y=y/1000 before training makes the prediction break down to a constant value. Scaling the target variable before training would solve the problem, but I do not understand why it is necessary.

What causes this problem?

import numpy as np
from sklearn.svm import SVR
import matplotlib.pyplot as plt

# Generate sample data
X = np.sort(5 * np.random.rand(40, 1), axis=0)
y = np.sin(X).ravel()

# Add noise to targets
y[::5] += 3 * (0.5 - np.random.rand(8))

# Added line: this will make the prediction break down
y = y / 1000

# Fit regression model
svr_rbf = SVR(kernel='rbf', C=1e3, gamma=0.1)
svr_lin = SVR(kernel='linear', C=1e3)
svr_poly = SVR(kernel='poly', C=1e3, degree=2)
y_rbf = svr_rbf.fit(X, y).predict(X)
y_lin = svr_lin.fit(X, y).predict(X)
y_poly = svr_poly.fit(X, y).predict(X)

# look at the results
plt.scatter(X, y, c='k', label='data')
plt.plot(X, y_rbf, c='g', label='RBF model')
plt.plot(X, y_lin, c='r', label='Linear model')
plt.plot(X, y_poly, c='b', label='Polynomial model')
plt.xlabel('data')
plt.ylabel('target')
plt.title('Support Vector Regression')
plt.legend()
plt.show()

Answer

Support vector regression uses a loss function that is only positive if the difference between the predicted value and the target exceeds some threshold. Below the threshold, the prediction is considered "good enough" and the loss is zero. When you scale down the targets, the SVM learner can get away with returning a flat model, because it no longer incurs any loss.
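This can be seen numerically with a minimal sketch of the epsilon-insensitive loss (a standalone illustration, not sklearn's internal code); epsilon=0.1 matches SVR's default:

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Epsilon-insensitive loss: zero inside the tube, linear outside it."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

# Targets shrunk to the order of 0.001, as in the question:
y = np.sin(np.linspace(0, 5, 10)) / 1000

# A constant prediction of 0 lies entirely inside the default
# epsilon=0.1 tube, so it incurs zero loss at every point.
print(eps_insensitive_loss(y, np.zeros_like(y)))  # all zeros
```

Since the flat prediction already achieves zero loss, the optimizer has no incentive to fit the shape of the data.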

The threshold parameter is called epsilon in sklearn.svm.SVR; set it to a lower value for smaller targets. The math behind this is explained here.
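For instance, a sketch of the fix on the question's data (the concrete value epsilon=1e-4 is only illustrative; any value well below the spread of y works):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(40, 1), axis=0)
y = np.sin(X).ravel() / 1000  # targets scaled down as in the question

# With the default epsilon=0.1 the whole signal fits inside the tube
# and the model collapses to a constant.
y_flat = SVR(kernel='rbf', C=1e3, gamma=0.1).fit(X, y).predict(X)

# Shrinking epsilon along with the targets restores a useful fit.
y_fit = SVR(kernel='rbf', C=1e3, gamma=0.1, epsilon=1e-4).fit(X, y).predict(X)

print(np.ptp(y_flat))  # essentially zero spread: a flat prediction
print(np.ptp(y_fit))   # roughly the spread of y itself
```

Alternatively, standardizing the targets before fitting (e.g. via sklearn.compose.TransformedTargetRegressor with a StandardScaler) achieves the same effect without hand-tuning epsilon.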
