Predicting probabilities for GBM with the caret library

Question

A similar question was asked before, but the link in its answer points to a random forest example, which doesn't seem to work in my case.

Here is an example of what I'm trying to do:

gbmGrid <- expand.grid(interaction.depth = c(5, 9),
                       n.trees = (1:3) * 200,
                       shrinkage = c(0.05, 0.1))

fitControl <- trainControl(method = "cv",
                           number = 3,
                           classProbs = TRUE)

gbmFit <- train(strong ~ . - Id - PlayerName,
                data = train[1:10000, ],
                method = "gbm",
                trControl = fitControl,
                verbose = TRUE,
                tuneGrid = gbmGrid)
gbmFit

Everything goes fine, and I get the best parameters. Now if I do the prediction:

predictStrong = predict(gbmFit, newdata=train[11000:50000,])

I get a binary vector of predictions, which is good:

[1] 0 1 0 0 1 0 0 0 0 0 0 0 1 1 0 0 1 1 1 0 0 0 1 ...

However, when I try to get probabilities, I get an error:

predictStrong = predict(gbmFit, newdata=train[11000:50000,], type="prob")

Error in `[.data.frame`(out, , obsLevels, drop = FALSE) : 
undefined columns selected

Where is the problem?

Additional information:

traceback()
5: stop("undefined columns selected")
4: `[.data.frame`(out, , obsLevels, drop = FALSE)
3: out[, obsLevels, drop = FALSE]
2: predict.train(gbmFit, newdata = train[11000:50000, ], type = "prob")
1: predict(gbmFit, newdata = train[11000:50000, ], type = "prob")

Versions:

R version 3.1.0 (2014-04-10) -- "Spring Dance"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

caret version: 6.0-29

I've also seen this topic, and I don't get an error about variable names. A couple of my variable names contain underscores, which I assume are valid, since make.names returns the same names as the originals:

colnames(train) == make.names(colnames(train))
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

Recommended answer

When class probabilities are requested, train puts them into a data frame with one column per class. If the factor levels are not valid variable names, they are changed automatically (e.g. "0" becomes "X0"). train issues a warning in this case that goes something like "At least one of the class levels are not valid R variables names. This may cause errors if class probabilities are generated." That renaming is consistent with the traceback in the question, where out[, obsLevels, drop = FALSE] fails because the original level names are no longer column names.
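
One possible workaround is to relabel the factor levels with valid R names before training. This is only a sketch: it assumes that strong is a factor with levels "0" and "1", which the question does not show. The labels "no" and "yes" are arbitrary placeholders, and the toy vector stands in for the real column.

```r
# Toy stand-in for the outcome column; assumed here to be a 0/1 factor.
strong <- factor(c(0, 1, 0, 0, 1))

# make.names() shows how R rewrites labels that are not valid names.
make.names(levels(strong))   # "X0" "X1"

# Relabel so that "0" becomes "no" and "1" becomes "yes" (arbitrary names).
levels(strong) <- c("no", "yes")

# Every level now survives make.names() unchanged.
all(levels(strong) == make.names(levels(strong)))   # TRUE
```

Applying the same relabeling to train$strong before calling train() with classProbs = TRUE should let predict(gbmFit, newdata = ..., type = "prob") return one probability column per level. This has not been checked against caret 6.0-29.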
