Unary class text classification in Weka?


Problem description

I have a training dataset (text) for a particular category (say, cancer). I want to train an SVM classifier for this class in Weka. But when I create a folder 'cancer', put all the training files into it, and run my code, I get the following error: weka.classifiers.functions.SMO: Cannot handle unary class!

What I want is this: if the classifier finds a document related to 'cancer', it reports that class name, and when I feed it a non-cancer document, it says something like 'unknown'.

What should I do to get this behavior?

Solution

The SMO classifier in Weka separates classes from one another, so it needs training examples from at least two classes; given a single class it fails with the "unary class" error. Sequential Minimal Optimization is a specific algorithm for solving the SVM optimization problem, and SMO is Weka's basic implementation of it. If you have some examples that are cancer and some that are not, then that is a binary problem, and perhaps you haven't labeled them correctly.

However, if your training data consists only of cancer examples and you want the model to tell you whether a future example fits that pattern or not, then you are attempting one-class SVM, also known as outlier detection.
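To make the one-class idea concrete, here is a minimal plain-Java sketch. It is not an SVM and does not use Weka: it swaps in a much simpler centroid-and-threshold detector, only to show the behavior asked for (train on one class, answer "unknown" for anything that does not fit). The class name OneClassSketch, the sample documents, and the 0.5 cosine-similarity threshold are all made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy one-class text detector: centroid of training docs plus a similarity threshold. */
public class OneClassSketch {
    private final Map<String, Double> centroid; // unit-length mean direction of the training docs
    private final String label;                 // name of the single known class
    private final double threshold;             // hypothetical cut-off, chosen by hand

    public OneClassSketch(String label, List<String> trainingDocs, double threshold) {
        this.label = label;
        this.threshold = threshold;
        Map<String, Double> sum = new HashMap<>();
        for (String doc : trainingDocs) {
            // Accumulate the unit-length term vectors of all training documents.
            vectorize(doc).forEach((term, w) -> sum.merge(term, w, Double::sum));
        }
        this.centroid = normalize(sum);
    }

    /** Bag-of-words term counts, scaled to unit length. */
    public static Map<String, Double> vectorize(String text) {
        Map<String, Double> tf = new HashMap<>();
        for (String tok : text.toLowerCase().split("[^a-z]+")) {
            if (!tok.isEmpty()) tf.merge(tok, 1.0, Double::sum);
        }
        return normalize(tf);
    }

    /** Returns a copy of v scaled to Euclidean length 1 (empty if v is all zeros). */
    public static Map<String, Double> normalize(Map<String, Double> v) {
        double norm = 0.0;
        for (double w : v.values()) norm += w * w;
        norm = Math.sqrt(norm);
        Map<String, Double> out = new HashMap<>();
        if (norm == 0.0) return out;
        for (Map.Entry<String, Double> e : v.entrySet()) {
            out.put(e.getKey(), e.getValue() / norm);
        }
        return out;
    }

    /** Cosine similarity between the document and the training centroid. */
    public double similarity(String text) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : vectorize(text).entrySet()) {
            dot += e.getValue() * centroid.getOrDefault(e.getKey(), 0.0);
        }
        return dot;
    }

    /** The known label if the document is close enough to the centroid, else "unknown". */
    public String classify(String text) {
        return similarity(text) >= threshold ? label : "unknown";
    }

    public static void main(String[] args) {
        List<String> cancerDocs = Arrays.asList(
            "tumor cells spread through chemotherapy resistant cancer tissue",
            "cancer patients received chemotherapy after tumor biopsy",
            "biopsy confirmed malignant tumor and cancer cells");
        OneClassSketch model = new OneClassSketch("cancer", cancerDocs, 0.5);
        System.out.println(model.classify("the tumor biopsy showed cancer cells"));
        System.out.println(model.classify("the football match ended in a draw"));
    }
}
```

A one-class SVM plays the same role as the centroid and threshold here, but learns a more flexible boundary around the training data instead of relying on a hand-picked cut-off.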

LibSVM in Weka can handle one-class SVM. Unlike the Weka SMO implementation, LibSVM is a standalone package that has been interfaced into Weka, and it incorporates several SVM variants; in the wrapper's options, SVM type 2 (-S 2) selects the one-class SVM. This post on the Wekalist explains how to use LibSVM for this in Weka.
