C5.0 decision tree - c50 code called exit with value 1
Problem description
I am getting the following error:
c50 code called exit with value 1
I am doing this on the Titanic data available from Kaggle.
# Importing datasets
train <- read.csv("train.csv", sep=",")
# this is the structure
str(train)
Output:
'data.frame': 891 obs. of 12 variables:
$ PassengerId: int 1 2 3 4 5 6 7 8 9 10 ...
$ Survived : int 0 1 1 1 0 0 0 0 1 1 ...
$ Pclass : int 3 1 3 1 3 3 1 3 3 2 ...
$ Name : Factor w/ 891 levels "Abbing, Mr. Anthony",..: 109 191 358 277 16 559 520 629 417 581 ...
$ Sex : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
$ Age : num 22 38 26 35 35 NA 54 2 27 14 ...
$ SibSp : int 1 1 0 1 0 0 0 3 0 1 ...
$ Parch : int 0 0 0 0 0 0 0 1 2 0 ...
$ Ticket : Factor w/ 681 levels "110152","110413",..: 524 597 670 50 473 276 86 396 345 133 ...
$ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
$ Cabin : Factor w/ 148 levels "","A10","A14",..: 1 83 1 57 1 1 131 1 1 1 ...
$ Embarked : Factor w/ 4 levels "","C","Q","S": 4 2 4 4 4 3 4 4 4 2 ...
Then I tried using a C5.0 decision tree:
# Trying with C5.0 decision tree
library(C50)
# C5.0 models require a factor outcome; otherwise they raise an error
train$Survived <- factor(train$Survived)
new_model <- C5.0(train[-2],train$Survived)
Running the above lines gives me this error:
c50 code called exit with value 1
I can't figure out what's going wrong. I was using similar code on a different dataset and it worked fine. Any ideas on how I can debug my code?
Thanks
Recommended answer
For anyone interested, the data can be found here: http://www.kaggle.com/c/titanic-gettingStarted/data. I think you need to be registered in order to download it.
Regarding your problem, first off, I think you meant to write
new_model <- C5.0(train[,-2],train$Survived)
Next, notice the structure of the Cabin and Embarked columns. These two factors have an empty string as a level name (check with levels(train$Embarked)). This is the point where C50 falls over. If you modify your data such that
# Rename the empty-string level (the first level of both factors) to a placeholder
levels(train$Cabin)[1] <- "missing"
levels(train$Embarked)[1] <- "missing"
your algorithm will now run without an error.
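If you would rather not rely on the empty string being the first level, the same relabelling can be applied to every factor column at once. The following is a minimal sketch in base R, not part of the original answer: the toy data frame and the helper name relabel_empty_levels are made up for illustration, and "missing" is just the placeholder used above.

```r
# Toy data frame standing in for the Titanic data (values are made up)
toy <- data.frame(
  Cabin    = factor(c("", "C85", "", "E46")),
  Embarked = factor(c("S", "C", "", "Q")),
  Sex      = factor(c("male", "female", "female", "male"))
)

# Hypothetical helper: rename any empty-string factor level to a placeholder
relabel_empty_levels <- function(df, placeholder = "missing") {
  for (col in names(df)) {
    if (is.factor(df[[col]])) {
      lv <- levels(df[[col]])
      lv[lv == ""] <- placeholder
      levels(df[[col]]) <- lv
    }
  }
  df
}

# Which factor columns contain an empty-string level before the fix?
has_empty <- sapply(toy, function(x) is.factor(x) && "" %in% levels(x))
names(toy)[has_empty]

# Apply the fix and inspect one of the affected columns
fixed <- relabel_empty_levels(toy)
levels(fixed$Embarked)
```

On the real data you would call relabel_empty_levels(train) before passing the predictors to C5.0; columns without an empty level are left unchanged.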