What does each item mean in svmLight Format


Question

I am very confused about what each part means in the svmLight data format. For example:

(label/target, [(feature, value), ...], queryid)

Does the label mean the rank of the data, and is queryid the ID of the object?

For example, for the following item:

2 qid:1 1:4.000000 2:2.772589 3:0.266667 4:0.258154 5:37.330565 6:11.431241 7:37.307017 8:1.213630 9:21.342267 10:10.842279 11:15.634736 12:2.749495 13:-39.467448 14:-37.791635 15:-38.002289 16:14.000000 17:5.634790 18:0.063927 19:0.063290 20:28.303065 21:9.340024 22:24.809801 23:0.231553 24:52.396216 25:1.692954 26:16.619600 27:2.810583 28:-45.733775 29:-44.612550 30:-44.823263 31:18.000000 32:6.579251 33:0.076923 34:0.076079 35:27.701632 36:9.139690 37:23.819476 38:0.277200 39:67.283604 40:1.847508 41:19.559974 42:2.973485 43:-44.687666 44:-43.467574 45:-43.302044 #docid = 346319

Does 2 mean the rank/target value of the object? And what do qid and docid mean for the file?

Thank you!

Solution

The leading number is indeed the "target" of this object. The qid:1 part is used to constrain the pairwise differences between such objects: in ranking mode, only objects that share the same qid are compared with each other. The docid, or rather everything after the # character, is an info string that

can be used to pass additional information to the kernel (e.g. non feature vector data)

(source).
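
To make the role of qid more concrete, here is a minimal sketch of how preference pairs could be formed only among objects with the same qid. It assumes the ranking-mode behaviour described in the SVM-Light documentation; the function name preference_pairs and the (target, qid, doc) tuples are hypothetical and are not part of SVM-Light or pysvmlight.

```python
from collections import defaultdict
from itertools import combinations


def preference_pairs(examples):
    """Return (preferred_doc, other_doc) pairs, comparing only within one qid.

    `examples` is a list of hypothetical (target, qid, doc) tuples.
    """
    # Group the objects by query id; targets are only comparable inside a group.
    by_query = defaultdict(list)
    for target, qid, doc in examples:
        by_query[qid].append((target, doc))

    pairs = []
    for docs in by_query.values():
        for (t_a, d_a), (t_b, d_b) in combinations(docs, 2):
            if t_a > t_b:
                pairs.append((d_a, d_b))
            elif t_b > t_a:
                pairs.append((d_b, d_a))
            # Equal targets express no preference, so no pair is produced.
    return pairs


examples = [(2, 1, "a"), (1, 1, "b"), (1, 1, "c"), (3, 2, "d"), (1, 2, "e")]
pairs = preference_pairs(examples)
print(pairs)
```

Note that "d" (target 3, qid 2) is never paired with "a" (target 2, qid 1): the target values are ranks relative to one query, not a global score.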

The general format for each object is given in the official source, under the heading "How to use":

<line> .=. <target> <feature>:<value> <feature>:<value> ... <feature>:<value> # <info>
<target> .=. +1 | -1 | 0 | <float> 
<feature> .=. <integer> | "qid"
<value> .=. <float>
<info> .=. <string> 

Note that the format you specify

(label/target, [(feature, value), ...], queryid)

is that of pysvmlight, a Python binding to SVM-Light, the support vector machine library by Thorsten Joachims whose documentation I quoted earlier. You'll need to write a parser that converts svmlight's native data files into the format that pysvmlight uses. There is at least one example on StackOverflow; it does not take the qid into account, but adding that shouldn't be too difficult once you have read the parser's code.
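
As a rough illustration of such a parser, the sketch below follows the grammar quoted above and turns one line into the (label/target, [(feature, value), ...], queryid) shape, plus the info string. The function name parse_svmlight_line is hypothetical, and the code is not taken from pysvmlight or from the StackOverflow example mentioned; the sample line is a shortened version of the one in the question.

```python
def parse_svmlight_line(line):
    """Parse one SVM-Light line into (target, [(feature, value), ...], qid, info)."""
    # Everything after the first '#' is the free-form info string.
    body, _, info = line.partition("#")
    tokens = body.split()
    if not tokens:
        raise ValueError("empty line")

    # The first token is the target (label, regression value, or rank).
    target = float(tokens[0])

    features = []
    qid = None  # stays None when the line carries no qid
    for token in tokens[1:]:
        name, _, raw = token.partition(":")
        if name == "qid":
            qid = int(raw)
        else:
            features.append((int(name), float(raw)))
    return target, features, qid, info.strip()


example = "2 qid:1 1:4.000000 2:2.772589 3:0.266667 #docid = 346319"
target, features, qid, info = parse_svmlight_line(example)
print((target, features, qid))
```

Dropping the fourth element gives the three-part tuple from the question. The sketch assumes well-formed input with no space around the colon, so a token such as `26: 16.619600` would raise a ValueError.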
