SVM in R: "Predictor must be numeric or ordered."

Problem description

I'm new to R and I've run into this problem: I want to compare two prediction techniques (Support Vector Machines and Neural Networks) by applying them to some data and comparing their performance. To do this, I use ROC curves. The code is supposed to compute the area under the ROC curve, but it is not working. The neural network code works fine, but when the SVM part executes I get this error:

> aucs <- auc((dtest$recid=="SI")*1, lr.pred)

Error in roc.default(response, predictor, auc = TRUE, ...) : Predictor must be numeric or ordered.

> obj.roc <- roc((dtest$recid=="SI")*1, lr.pred )

Error in roc.default((dtest$recid == "SI") * 1, lr.pred) : Predictor must be numeric or ordered.
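
As both tracebacks show, auc() hands its arguments on to roc.default(), which only accepts a numeric vector or an ordered factor as the predictor. The base-R sketch below illustrates that type check with made-up data: pred_labels is a hypothetical stand-in for what predict() returns for an svm classifier (a plain, unordered factor of class labels; "NO" is an assumed name for the second class), and is_valid_predictor() is a hypothetical helper that mirrors the condition stated in the error message, not pROC's actual source.

```r
# Hypothetical predicted labels, shaped like predict() output for an svm
# classifier: an unordered factor ("NO" is an assumed class name).
pred_labels <- factor(c("SI", "NO", "SI", "NO"), levels = c("NO", "SI"))

# Hypothetical helper mirroring the error message's condition:
# the predictor must be numeric or an ordered factor.
is_valid_predictor <- function(x) is.numeric(x) || is.ordered(x)

is_valid_predictor(pred_labels)            # FALSE: unordered factor
is_valid_predictor(c(0.9, 0.2, 0.7, 0.1))  # TRUE: numeric scores

# Making the labels ordered would pass the check, but the predictor
# would still carry only two distinct values.
ordered_labels <- factor(pred_labels, levels = c("NO", "SI"), ordered = TRUE)
is_valid_predictor(ordered_labels)         # TRUE
nlevels(ordered_labels)                    # 2
```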

Here is my code:

library(stats)
library(pROC)
library(nnet)
library(e1071)
library(rpart)

data <- read.table("data.csv", header=TRUE)

set.seed(1234)
ind    <- sample(2, nrow(data), replace=TRUE, prob=c(0.8, 0.2))
dtrain <- data[ind==1,]
dtest  <- data[ind==2,]

# Variables for storing comparison results #
bestAuc <- 0
bestIdx <- 0

# Support Vector Machines
lr.fit  <- svm(recid~., data=dtrain, cost=1000, gamma=1, probability=TRUE)
lr.pred <- predict(lr.fit, dtest, type="response")
aucs    <- auc((dtest$recid=="SI")*1, lr.pred)
obj.roc <- roc((dtest$recid=="SI")*1, lr.pred)

print("SVM (default)")
bestAuc <- aucs # Initialize


# Neural networks
lr.fit  <- nnet(recid~., data=dtrain, size=4, maxit=500, decay=1, trace=FALSE)
lr.pred <- predict(lr.fit, dtest, type="raw")
aucs    <- auc((dtest$recid=="SI")*1, lr.pred)
obj.roc <- roc((dtest$recid=="SI")*1,  lr.pred )

if(aucs > bestAuc) {
  bestAuc <- aucs
  bestIdx <- 1
  print("Neural networks")
}

I've been looking for information, but there seems to be little about the methods I'm using. I saw a package called ROCR which I think could be useful, but I also get errors with its performance function. I'm a little lost with all these libraries, so I tried to stick with my initial solution, without improvement. What should I do?


The solution was based on Calimo's idea. The return value of predict does not give the data in the format I wanted, so I needed to use this:

lr.pred <- attr(lr.pred,"probabilities")[,c("SI")]

This line extracts the column (the probabilities for class "SI") that is going to be analyzed in the ROC curve.
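
To see what that line does without the original data, here is a base-R sketch with a hand-built mock of the object returned by predict(..., probability = TRUE): a factor of predicted classes carrying a "probabilities" matrix as an attribute, with one column per class. The numbers and the second class name "NO" are made up for illustration.

```r
# Mock of predict(svm_model, newdata, probability = TRUE) output: a factor of
# predicted classes with a "probabilities" matrix attached as an attribute.
# Class name "NO" and all values are hypothetical.
pred <- factor(c("SI", "NO", "SI"), levels = c("NO", "SI"))
attr(pred, "probabilities") <- matrix(
  c(0.80, 0.30, 0.65,   # column "SI"
    0.20, 0.70, 0.35),  # column "NO"
  ncol = 2,
  dimnames = list(NULL, c("SI", "NO"))
)

# Select the positive-class column by name rather than by position.
scores <- attr(pred, "probabilities")[, "SI"]
scores              # 0.80 0.30 0.65
is.numeric(scores)  # TRUE, so it is an acceptable predictor for roc()/auc()
```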

Recommended answer

As the error message says, lr.pred needs to be a numeric vector or an ordered factor. The problem here is that predict (for the svm) returns the predicted class, which makes the ROC exercise pretty much useless.
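
Why hard class predictions make the ROC exercise nearly useless can be shown numerically. The AUC equals the probability that a randomly chosen positive case gets a higher score than a randomly chosen negative one, with ties counting one half. The sketch below computes that directly with a hypothetical helper, auc_pairs(), on made-up data: with probabilities, every ordering of cases contributes, whereas once the scores are thresholded into 0/1 classes the ROC curve has only one point between (0,0) and (1,1).

```r
# Hypothetical helper: AUC as P(score of a positive > score of a negative),
# ties counted as 1/2, computed over all positive/negative pairs.
auc_pairs <- function(response, score) {
  pos <- score[response == 1]
  neg <- score[response == 0]
  cmp <- outer(pos, neg, "-")   # pairwise differences pos - neg
  mean((cmp > 0) + 0.5 * (cmp == 0))
}

response <- c(1, 1, 1, 0, 0, 0)              # made-up true labels
prob     <- c(0.9, 0.6, 0.4, 0.5, 0.3, 0.1)  # made-up class-1 probabilities
hard     <- as.numeric(prob > 0.5)           # predicted class coded as 0/1

auc_pairs(response, prob)  # 0.8888889 (8/9): uses the full ranking
auc_pairs(response, hard)  # 0.8333333 (5/6): a single threshold survives
```

Here the 0/1 version equals (sensitivity + specificity) / 2 = (2/3 + 1) / 2 for the one threshold it encodes; the size of the gap depends on the data.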

What you need is an internal score, such as the class probabilities:

lr.pred <- predict(lr.fit, dtest, probability = TRUE)

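(You will have to choose which probability to use: the one for the first class or the one for the second. Returning probabilities only works because the model was fitted with probability = TRUE. Also note that type = "response" is ignored, since predict() for svm objects has no type argument.)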
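
Which column to use matters because the response in the question is coded as 1 for recid == "SI". With a plain rank-based AUC, scoring by the other class's probability reverses the ranking and returns 1 minus the correct value. This base-R sketch shows that with a made-up probability matrix ("NO" is an assumed name for the second class, and auc_pairs() is a hypothetical stand-in for pROC's auc()).

```r
# Hypothetical stand-in for an AUC function: P(pos score > neg score),
# ties counted as 1/2.
auc_pairs <- function(response, score) {
  cmp <- outer(score[response == 1], score[response == 0], "-")
  mean((cmp > 0) + 0.5 * (cmp == 0))
}

# Made-up probability matrix with one column per class.
probs <- cbind(SI = c(0.9, 0.6, 0.4, 0.5, 0.3, 0.1),
               NO = c(0.1, 0.4, 0.6, 0.5, 0.7, 0.9))
response <- c(1, 1, 1, 0, 0, 0)  # 1 means recid == "SI"

auc_pairs(response, probs[, "SI"])  # 0.8888889: column of the positive class
auc_pairs(response, probs[, "NO"])  # 0.1111111: 1 minus the value above
```

pROC's roc() picks the comparison direction automatically by default (its direction argument defaults to "auto"), which can silently hide a swapped column, so selecting the column by name, as in attr(lr.pred,"probabilities")[,c("SI")], is the safer habit.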
