Predicting Probabilities for GBM with the caret library
Problem description
A similar question was asked before; however, the link in the answer points to a random forest example, and that approach doesn't seem to work in my case.
Here is an example of what I'm trying to do:
library(caret)

gbmGrid <- expand.grid(interaction.depth = c(5, 9),
                       n.trees = (1:3)*200,
                       shrinkage = c(0.05, 0.1))
fitControl <- trainControl(
  method = "cv",
  number = 3,
  classProbs = TRUE)
gbmFit <- train(strong ~ . - Id - PlayerName, data = train[1:10000, ],
                method = "gbm",
                trControl = fitControl,
                verbose = TRUE,
                tuneGrid = gbmGrid)
gbmFit
Everything goes fine, and I get the best parameters. Now, if I make a prediction:
predictStrong = predict(gbmFit, newdata=train[11000:50000,])
I get a binary vector of predictions, which is good:
[1] 0 1 0 0 1 0 0 0 0 0 0 0 1 1 0 0 1 1 1 0 0 0 1 ...
However when I try to get probabilities, I get an error:
predictStrong = predict(gbmFit, newdata=train[11000:50000,], type="prob")
Error in `[.data.frame`(out, , obsLevels, drop = FALSE) :
undefined columns selected
What is the problem?
Additional information:
traceback()
5: stop("undefined columns selected")
4: `[.data.frame`(out, , obsLevels, drop = FALSE)
3: out[, obsLevels, drop = FALSE]
2: predict.train(gbmFit, newdata = train[11000:50000, ], type = "prob")
1: predict(gbmFit, newdata = train[11000:50000, ], type = "prob")
Versions:
R version 3.1.0 (2014-04-10) -- "Spring Dance"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)
caret version: 6.0-29
I've seen this topic as well, and I don't get an error about variable names. I do have a couple of variable names with underscores, but I assume they are valid, since running make.names on the column names returns exactly the original names:
colnames(train) == make.names(colnames(train))
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
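A minimal sketch of why this check can pass while the problem remains: it tests the column names, but the names that matter for class probabilities are the levels of the outcome factor. The data frame below is a small hypothetical stand-in for the question's train data; the column shot_rate and all values are invented for illustration.

```r
# Hypothetical stand-in for the question's `train` data (shot_rate and values are invented)
train <- data.frame(
  Id = 1:6,
  PlayerName = c("p1", "p2", "p3", "p4", "p5", "p6"),
  shot_rate = c(0.2, 1.5, 0.7, 2.1, 0.9, 1.1),
  strong = factor(c(0, 1, 0, 1, 1, 0))
)

# The check from the question: all column names are already valid R names
names_ok <- all(colnames(train) == make.names(colnames(train)))
print(names_ok)  # TRUE

# The outcome's factor levels are a different set of names
print(levels(train$strong))              # "0" "1"
print(make.names(levels(train$strong)))  # "X0" "X1"
levels_ok <- all(levels(train$strong) == make.names(levels(train$strong)))
print(levels_ok) # FALSE
```

Underscores are fine in column names; it is the purely numeric levels "0" and "1" that make.names would rewrite.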
Recommended answer
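When class probabilities are requested, train puts them into a data frame with a column for each class. If the factor levels are not valid variable names, they are automatically changed (e.g. "0" becomes "X0"). train issues a warning in this case that goes something like "At least one of the class levels are not valid R variables names. This may cause errors if class probabilities are generated."
In this question the outcome levels are "0" and "1", so the probability columns end up named "X0" and "X1", while predict.train still selects columns by the original level names (out[, obsLevels, drop = FALSE] in the traceback); hence "undefined columns selected". Note that this concerns the levels of the outcome factor, not the column names, so checking colnames(train) with make.names does not catch it. The fix is to give the outcome valid level names before calling train.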
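A minimal sketch of recoding a 0/1 outcome so its levels are valid R names, assuming the outcome is a factor like the question's strong column. The variable names strong_mk and strong_lbl and the labels "no"/"yes" are illustrative choices, not anything caret requires.

```r
# Hypothetical 0/1 outcome, coded like the question's `strong` column
strong <- factor(c(0, 1, 0, 0, 1))

# Option 1: apply make.names() yourself, so the levels already match the
# names caret would otherwise generate ("X0", "X1")
strong_mk <- strong
levels(strong_mk) <- make.names(levels(strong_mk))
print(levels(strong_mk))  # "X0" "X1"

# Option 2: choose descriptive labels explicitly (the label choice is illustrative)
strong_lbl <- factor(c(0, 1, 0, 0, 1), levels = c(0, 1), labels = c("no", "yes"))
print(levels(strong_lbl)) # "no" "yes"
```

Both options only rename the levels; the underlying observations stay the same. With train$strong recoded this way before calling train(), predict(gbmFit, newdata = ..., type = "prob") should return a data frame whose columns are named after the new levels, and the class predictions will use the new labels instead of 0/1.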