LibSVM turns all my training vectors into support vectors, why?
Problem description
I am trying to use an SVM for news article classification.
I created a table that contains the features (the unique words found in the documents) as rows.
I created weight vectors by mapping each article onto these features, i.e. if the article has a word that is part of the feature vector table, that location is marked as 1, or else 0.
Example of a generated training sample:
1 1:1 2:1 3:1 4:1 5:1 6:1 7:1 8:1 9:1 10:1 11:1 12:1 13:1 14:1 15:1 16:1 17:1 18:1 19:1 20:1 21:1 22:1 23:1 24:1 25:1 26:1 27:1 28:1 29:1 30:1
As this is the first document, all the features are present.
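The mapping described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the question: the helper names `build_vocabulary` and `to_libsvm_line` and the sample documents are made up, and words are split on whitespace only. It relies on the fact that the LIBSVM sparse format lets features with value 0 be left out, as long as the indices on each line are in ascending order.

```python
def build_vocabulary(documents):
    """Assign each unique word a 1-based feature index, in order of first appearance."""
    vocab = {}
    for doc in documents:
        for word in doc.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def to_libsvm_line(label, document, vocab):
    """Encode one document as a sparse LIBSVM line with binary (0/1) features."""
    # Keep only the indices of words that occur in the document; zeros are omitted.
    present = sorted({vocab[w] for w in document.lower().split() if w in vocab})
    return " ".join([str(label)] + [f"{index}:1" for index in present])


# Hypothetical toy corpus; the labels 1 and 0 follow the question's convention.
docs = ["stocks rally as markets open", "markets fall", "local team wins final"]
labels = [1, 1, 0]

vocab = build_vocabulary(docs)
lines = [to_libsvm_line(y, d, vocab) for y, d in zip(labels, docs)]
for line in lines:
    print(line)
```

The first document defines the first block of feature indices, so every one of its features is present, which matches the sample line shown above.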
I am using 1 and 0 as class labels.
I am using svm.Net for classification.
I gave it 300 manually classified weight vectors as training data, and the generated model takes all of the vectors as support vectors, which is surely overfitting.
My total number of features (unique words, i.e. the row count of the feature vector DB table) is 7610.
What could be the reason?
Because of this overfitting, my project is now in pretty bad shape. It classifies every available article as a positive article.
In LibSVM binary classification, is there any restriction on the class labels?
I am using 0 and 1 instead of -1 and +1. Is that a problem?
Recommended answer
As pointed out, a parameter search is probably a good idea before doing anything else.
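A parameter search of this kind can be sketched as a plain grid search over C and gamma. The sketch below is only an outline under stated assumptions: the exponentially spaced ranges follow the commonly cited suggestion from the LIBSVM practical guide, and `placeholder_score` is a made-up stand-in that would have to be replaced by the cross-validation accuracy of a model actually trained with each (C, gamma) pair.

```python
import itertools
import math


def exponential_grid(low_exp, high_exp, step):
    """Return 2**low_exp, 2**(low_exp + step), ..., up to 2**high_exp."""
    return [2.0 ** e for e in range(low_exp, high_exp + 1, step)]


def grid_search(score_fn, c_values, gamma_values):
    """Try every (C, gamma) pair and return (best_score, best_c, best_gamma)."""
    best = None
    for c, gamma in itertools.product(c_values, gamma_values):
        score = score_fn(c, gamma)
        if best is None or score > best[0]:
            best = (score, c, gamma)
    return best


# Assumed search ranges: C in 2^-5 .. 2^15, gamma in 2^-15 .. 2^3.
c_values = exponential_grid(-5, 15, 2)
gamma_values = exponential_grid(-15, 3, 2)


def placeholder_score(c, gamma):
    # Hypothetical stand-in for cross-validation accuracy. It peaks at
    # C = 2^3, gamma = 2^-7 purely so the search has something to find.
    return -abs(math.log2(c) - 3) - abs(math.log2(gamma) + 7)


best_score, best_c, best_gamma = grid_search(placeholder_score, c_values, gamma_values)
print(best_c, best_gamma)
```

In practice the score function would train on k-1 folds and evaluate on the held-out fold, so the chosen parameters are judged on data the model has not seen.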
I would also investigate the different kernels available to you. The fact that your input data is binary might be problematic for the RBF kernel (or might render its use sub-optimal compared to another kernel). I have no idea which kernel would be better suited, though. Try a linear kernel, and look around for more suggestions and ideas :)
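To see why the RBF kernel can behave poorly on sparse binary vectors, it may help to look at the kernel values themselves. The sketch below is an illustration under assumptions, not a diagnosis of the model in the question: the documents are random, `WORDS_PER_DOC` is a guessed document size, and 1/num_features is the default gamma of LIBSVM's command-line tools (whether svm.Net uses the same default is not confirmed here). For 0/1 vectors, the squared Euclidean distance is simply the number of positions where the two vectors differ.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    """RBF kernel for binary vectors given as sets of their non-zero indices."""
    # For 0/1 vectors, squared distance = size of the symmetric difference.
    sq_dist = len(x ^ y)
    return math.exp(-gamma * sq_dist)


NUM_FEATURES = 7610   # feature count reported in the question
WORDS_PER_DOC = 200   # assumption: distinct words per article

random.seed(0)
docs = [
    set(random.sample(range(1, NUM_FEATURES + 1), WORDS_PER_DOC))
    for _ in range(5)
]


def kernel_range(gamma):
    """Smallest and largest kernel value over all distinct document pairs."""
    values = [
        rbf_kernel(a, b, gamma)
        for i, a in enumerate(docs)
        for b in docs[i + 1:]
    ]
    return min(values), max(values)


small = kernel_range(1.0 / NUM_FEATURES)  # every pair looks almost identical
large = kernel_range(1.0)                 # every pair looks unrelated
print("gamma = 1/num_features:", small)
print("gamma = 1:", large)
```

With a very small gamma all kernel values sit close to 1, and with a large gamma they collapse towards 0. In both extremes the kernel carries little information about which documents are similar, and a model trained this way tends to keep most or all training vectors as support vectors. A linear kernel avoids the gamma parameter altogether.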
For more information, and perhaps better answers, have a look at stats.stackexchange.com.