Reversing scaled values in LibSVM


Question

I am using Support Vector Regression for forecasting in LibSVM, and I have it all working. However, there is one question that sticks in my head.

For LibSVM, I first scale my training and testing sets to the same range and then select the optimal parameters. After I run svm-train and svm-predict, I get the forecasted values for the testing set in scaled form. I then use Excel to reverse the scaling and calculate the Mean Absolute Percentage Error (MAPE).

I am pretty sure that scaling in LibSVM works by subtracting the value from the minimum and then dividing by the range for particular feature. However, I wanted to check whether the values I scaled by hand and the values scaled by LibSVM are the same. Before I divide the dataset into two sets, I find the minimum and maximum of the values in the feature and then scale in the way described above. However, the scaled values that LibSVM gives for the training and testing sets are not exactly the same as the ones I calculate by hand. They are only roughly close. Does anyone know why they are not the same?

Another question: how can I calculate MAPE in LibSVM?

Recommended answer

If you inspect the file svm-scale.c, you will find that the formula that scales the data is:

value = y_lower + (y_upper-y_lower) * (value - y_min)/(y_max-y_min);

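where y_lower and y_upper are the scaling limits, and y_min and y_max are the smallest and largest values found in the data. (That line scales the target value; features are scaled with the same expression, using each feature's own minimum and maximum.)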

So, as you can see, the scaled value is not worked out quite as you were supposing ("subtracting the value from the minimum and then dividing by the range for particular feature"): the result is additionally mapped onto the interval between y_lower and y_upper. If you want to recover the real value, you only have to invert the formula.
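Inverting the formula and computing MAPE on the recovered values can be sketched in Python as below. This is only an illustration: the function names unscale and mape and all the numbers are made up for this example, and the defaults of -1 and 1 for the limits are an assumption that must match the limits actually used when scaling. As far as I know, svm-predict reports mean squared error and the squared correlation coefficient for regression rather than MAPE, so MAPE has to be computed outside LibSVM.

```python
def unscale(scaled, v_min, v_max, lower=-1.0, upper=1.0):
    """Invert scaled = lower + (upper - lower) * (value - v_min) / (v_max - v_min)."""
    return v_min + (scaled - lower) * (v_max - v_min) / (upper - lower)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    terms = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical numbers, for illustration only.
y_min, y_max = 10.0, 50.0               # target range stored when scaling the training set
scaled_predictions = [-0.5, 0.0, 0.5]   # stand-in for scaled svm-predict output
actual = [25.0, 30.0, 32.0]             # stand-in for the true, unscaled targets

# Map the scaled predictions back to the original units, then compare.
predictions = [unscale(s, y_min, y_max) for s in scaled_predictions]
print(predictions)
print(mape(actual, predictions))
```

The y_min and y_max passed to unscale should be the ones recorded during scaling, not values recomputed from another subset of the data.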

Example:

If you take one of the many datasets available on the LibSVM site as an example, such as the covtype dataset, and open it, you will see a file like this:

1 1:2596 2:51 3:3 4:258 6:510 7:221 8:232 9:148 10:6279 11:1 43:1
1 1:2590 2:56 3:2 4:212 5:-6 6:390 7:220 8:235 9:151 10:6225 11:1 43:1
2 1:2804 2:139 3:9 4:268 5:65 6:3180 7:234 8:238 9:135 10:6121 11:1 26:1
2 1:2785 2:155 3:18 4:242 5:118 6:3090 7:238 8:238 9:122 10:6211 11:1 44:1
1 1:2595 2:45 3:2 4:153 5:-1 6:391 7:220 8:234 9:150 10:6172 11:1 43:1
...

Now let us scale it with:

./svm-scale -s covtype.libsvm.binary.range  covtype.libsvm.binary > covtype.libsvm.binary.scale

This generates two files: the .range file, which contains the information about the scaling process (the maximum and minimum of each column), and the .scale file, which is the output and looks like this:

1 1:-0.262631 2:-0.716667 3:-0.909091 4:-0.630637 5:-0.552972 6:-0.856681 7:0.740157 8:0.826772 9:0.165354 10:0.750732 11:1 12:-1 13:-1 14:-1 15:-1 16:-1 17:-1 18:-1 19:-1 20:-1 21:-1 22:-1 23:-1 24:-1 25:-1 26:-1 27:-1 28:-1 29:-1 30:-1 31:-1 32:-1 33:-1 34:-1 35:-1 36:-1 37:-1 38:-1 39:-1 40:-1 41:-1 42:-1 43:1 44:-1 45:-1 46:-1 47:-1 48:-1 49:-1 50:-1 51:-1 52:-1 53:-1 54:-1 
1 1:-0.268634 2:-0.688889 3:-0.939394 4:-0.696492 5:-0.568475 6:-0.890403 7:0.732283 8:0.850394 9:0.188976 10:0.735675 11:1 12:-1 13:-1 14:-1 15:-1 16:-1 17:-1 18:-1 19:-1 20:-1 21:-1 22:-1 23:-1 24:-1 25:-1 26:-1 27:-1 28:-1 29:-1 30:-1 31:-1 32:-1 33:-1 34:-1 35:-1 36:-1 37:-1 38:-1 39:-1 40:-1 41:-1 42:-1 43:1 44:-1 45:-1 46:-1 47:-1 48:-1 49:-1 50:-1 51:-1 52:-1 53:-1 54:-1 
2 1:-0.0545273 2:-0.227778 3:-0.727273 4:-0.616321 5:-0.385013 6:-0.106365 7:0.84252 8:0.874016 9:0.0629921 10:0.706678 11:1 12:-1 13:-1 14:-1 15:-1 16:-1 17:-1 18:-1 19:-1 20:-1 21:-1 22:-1 23:-1 24:-1 25:-1 26:1 27:-1 28:-1 29:-1 30:-1 31:-1 32:-1 33:-1 34:-1 35:-1 36:-1 37:-1 38:-1 39:-1 40:-1 41:-1 42:-1 43:-1 44:-1 45:-1 46:-1 47:-1 48:-1 49:-1 50:-1 51:-1 52:-1 53:-1 54:-1 
2 1:-0.0735368 2:-0.138889 3:-0.454545 4:-0.653543 5:-0.248062 6:-0.131657 7:0.874016 8:0.874016 9:-0.0393701 10:0.731772 11:1 12:-1 13:-1 14:-1 15:-1 16:-1 17:-1 18:-1 19:-1 20:-1 21:-1 22:-1 23:-1 24:-1 25:-1 26:-1 27:-1 28:-1 29:-1 30:-1 31:-1 32:-1 33:-1 34:-1 35:-1 36:-1 37:-1 38:-1 39:-1 40:-1 41:-1 42:-1 43:-1 44:1 45:-1 46:-1 47:-1 48:-1 49:-1 50:-1 51:-1 52:-1 53:-1 54:-1 
1 1:-0.263632 2:-0.75 3:-0.939394 4:-0.780959 5:-0.555556 6:-0.890122 7:0.732283 8:0.84252 9:0.181102 10:0.720898 11:1 12:-1 13:-1 14:-1 15:-1 16:-1 17:-1 18:-1 19:-1 20:-1 21:-1 22:-1 23:-1 24:-1 25:-1 26:-1 27:-1 28:-1 29:-1 30:-1 31:-1 32:-1 33:-1 34:-1 35:-1 36:-1 37:-1 38:-1 39:-1 40:-1 41:-1 42:-1 43:1 44:-1 45:-1 46:-1 47:-1 48:-1 49:-1 50:-1 51:-1 52:-1 53:-1 54:-1 
...

The .range file looks like this:

x
-1 1
1 1859 3858
2 0 360
3 0 66
4 0 1397
...

So, taking into account that the limits are y_lower = -1 and y_upper = 1 (the second line of the .range file) and that the first feature ranges from 1859 to 3858, you can verify the conversion of the first element, 2596:

value = -1 + (1 - (-1)) * (2596 - 1859) / (3858 - 1859) = -0.26263131565782893

Which is the expected value :)
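The same check can be repeated for the first four features of the first row with a short Python sketch. The helper names read_range and scale are invented for this illustration, the parser assumes a range file that contains only the feature section shown above (an "x" line, a limits line, then one "index min max" line per feature), and only the four range lines visible in the excerpt are used.

```python
def read_range(lines):
    """Parse a range file that holds only the feature ("x") section."""
    rows = [line.split() for line in lines if line.strip()]
    if rows[0] != ["x"]:
        raise ValueError("expected a range file starting with 'x'")
    lower, upper = map(float, rows[1])
    # Remaining rows: feature index, minimum, maximum.
    bounds = {int(i): (float(lo), float(hi)) for i, lo, hi in rows[2:]}
    return lower, upper, bounds


def scale(value, v_min, v_max, lower, upper):
    """Apply lower + (upper - lower) * (value - v_min) / (v_max - v_min)."""
    return lower + (upper - lower) * (value - v_min) / (v_max - v_min)


# The range lines visible in the excerpt above.
RANGE_TEXT = """x
-1 1
1 1859 3858
2 0 360
3 0 66
4 0 1397
"""

lower, upper, bounds = read_range(RANGE_TEXT.splitlines())
first_row = {1: 2596, 2: 51, 3: 3, 4: 258}   # first data line, features 1 to 4
scaled = {i: scale(v, *bounds[i], lower, upper) for i, v in first_row.items()}
for i, s in scaled.items():
    print(f"{i}:{s:.6g}")                    # six significant digits, like the .scale file
```

Formatted to six significant digits, the results can be compared directly with the first line of the .scale output.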

Tip:

Normally you scale your training set with svm-scale, build your model (using k-fold cross-validation), and finally scale the test data with the values (y_max and y_min) obtained from the training set. You can see the process in the file tools/easy.py.
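The difference between taking the minimum and maximum from the training set only and taking them from the whole dataset (as in the question) can be seen in a small Python sketch. This may be one reason why hand-computed values come out close to, but not identical to, the ones LibSVM produces. The data and the helper names fit_range and scale are hypothetical, and the limits -1 and 1 are assumed.

```python
def fit_range(values):
    """Return the (minimum, maximum) of a list of feature values."""
    return min(values), max(values)


def scale(value, v_min, v_max, lower=-1.0, upper=1.0):
    """Apply lower + (upper - lower) * (value - v_min) / (v_max - v_min)."""
    return lower + (upper - lower) * (value - v_min) / (v_max - v_min)


# Hypothetical single-feature data, for illustration only.
train = [2.0, 4.0, 6.0]
test = [3.0, 8.0]

train_min, train_max = fit_range(train)        # range taken from the training set only
full_min, full_max = fit_range(train + test)   # range taken from the whole dataset

test_scaled_train_range = [scale(v, train_min, train_max) for v in test]
test_scaled_full_range = [scale(v, full_min, full_max) for v in test]

print(test_scaled_train_range)
print(test_scaled_full_range)
```

With the training range, a test value above the training maximum lands outside the interval from -1 to 1, which is expected; the two scalings agree only when both use the same minimum and maximum.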
