My rows are mismatched in my SVM scripting code for Kaggle
Problem description
I am reviewing my e1071 code for SVM for the Kaggle Titanic data. Last I knew, this part of it was working, but now I'm getting a rather strange error. When I try to build my data.frame so I can submit to Kaggle, my prediction seems to be the size of my training set instead of the test set.
Problem
Error in data.frame(PassengerId = test$passengerid, Survived = prediction) : arguments imply differing number of rows: 418, 714
Obviously, they should both be 418, and I do not understand what is going wrong.
Details
Here is my script:
setwd("Path\\To\\Data")
train <- read.csv("train.csv")
test <- read.csv("test.csv")
library("e1071")
bestModel = svm(Survived ~ Pclass + Sex + Age + Sex * Pclass, data = train, kernel = "linear", cost = 1)
prediction <- predict(bestModel, newData=test, type="response")
prediction[prediction >= 0.5] <- 1
prediction[prediction != 1] <- 0
prediction[is.na(prediction)] <- 0
This is the line that gives me the error:
predictionSubmit <- data.frame(PassengerId = test$passengerid, Survived = prediction)
Attempts
I have used names(train) and names(test) to verify that my column variable names are the same. You can find the data here. I know my prediction code can be optimized into one line, but that isn't the issue here. I would appreciate a second pair of eyes on this. I am thinking about using the kernlab library, but was wondering if there was a syntactical sugar issue I was neglecting here. Thanks for your suggestions and clues.
Recommended answer
library(e1071)
#10 items in training set
y <- sample(0:1, 10, replace = TRUE)
x <- rnorm(10)
bestModel <- svm(y ~ x, kernel = "linear", cost = 1)
#Six in test set
prediction <- predict(bestModel, newdata=rnorm(6), type="response")
#Output has 10 values (unexpected)
prediction
# 1 2 3 4 5 6 <NA> <NA>
# 0.05163974 0.58048905 0.49524846 0.13524885 0.12592718 0.06082822 0.55393256 1.08488424
# <NA> <NA>
# 0.94836026 0.47679646
#For correct output, remove names with <NA>
prediction[na.omit(names(prediction))]
# 1 2 3 4 5 6
#0.05163974 0.58048905 0.49524846 0.13524885 0.12592718 0.06082822
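One more thing worth checking in the question's script: R argument names are case-sensitive, and predict() for an svm model expects newdata, not newData. A misspelled newData is absorbed by the ... argument, so newdata stays missing and predict() returns the fitted values for the training rows. That would explain 714 if rows with a missing Age were dropped when the model was fitted. Also note that the example above passes a bare vector with no column named x, so the model appears to fall back on the training x, and the first six values are then likely fitted values rather than predictions for new inputs. Below is a minimal sketch, not a definitive fix. It assumes e1071 is installed, and the toy data frames and their column names (id, x, y) are hypothetical stand-ins for train.csv and test.csv.

```r
library(e1071)  # assumes the e1071 package is installed

# Hypothetical toy data standing in for train.csv and test.csv
train <- data.frame(x = c(-1.2, -0.7, -0.3, 0.1, 0.4, 0.8, 1.1, 1.5, 1.9, 2.3),
                    y = rep(0:1, times = 5))
test  <- data.frame(id = 1:6, x = c(-1.0, NA, 0.2, 0.9, NA, 2.0))

model <- svm(y ~ x, data = train, kernel = "linear", cost = 1)

# Misspelled argument name: `newData` falls into `...`, `newdata` stays missing,
# and predict() returns the fitted values for the training rows.
wrong <- predict(model, newData = test)

# Correct argument name, restricted to rows with no missing predictors so the
# number of predictions is known in advance.
ok    <- complete.cases(test[, "x", drop = FALSE])
preds <- predict(model, newdata = test[ok, , drop = FALSE])

# Put the predictions back in their original positions; rows that could not be
# scored get 0 here, mirroring how the question treats missing predictions.
survived     <- rep(0, nrow(test))
survived[ok] <- as.integer(preds >= 0.5)

# One row per test row, so data.frame() no longer sees mismatched lengths.
submission <- data.frame(PassengerId = test$id, Survived = survived)
```

In the real script the same idea would apply with newdata = test and the predictors from the formula (Pclass, Sex, Age) in the complete.cases() check. Whether unscored rows should default to 0 is a modeling choice, not something the data decides for you.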