R xgboost on caret attempts to perform classification instead of regression
Question
Hi everyone. First, a sample of the data:
> str(train)
'data.frame': 30226 obs. of 71 variables:
$ sal : int 2732 2732 2732 2328 2560 3584 5632 5632 3584 2150 ...
$ avg : num 2392 2474 2392 2561 2763 ...
$ med : num 2314 2346 2314 2535 2754 ...
$ jt_category_1 : int 1 1 1 1 1 1 1 1 1 1 ...
$ jt_category_2 : int 0 0 0 0 0 0 0 0 0 0 ...
$ job_num_1 : int 0 0 0 0 0 0 0 0 0 0 ...
$ job_num_2 : int 0 0 0 0 0 0 0 0 0 0 ...
...and 64 more variables (all of type int, with binary 0/1 values).
The column "sal" is the label, and this is the test data (70% of the raw data).
I use the "caret" package in R for regression and chose the method "xgbTree", which I know works for both classification and regression.
The problem is that I want regression, but I can't get it to happen. When I execute the full code, the error is:
Error: Metric RMSE not applicable for classification models
But I'm not trying to do classification; I want regression.
The type of my label (the y of the train function) is int, and I double-checked the data types.
Is that wrong? Is that what makes caret treat this training as classification?
> str(train$sal)
int [1:30226] 2732 2732 2732 2328 2560 3584 5632 5632 3584 2150 ...
> str(train_xg)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
..@ i : int [1:181356] 0 1 2 3 4 5 6 7 8 9 ...
..@ p : int [1:71] 0 30226 60452 90504 90678 90709 90962 93875 95087 96190 ...
..@ Dim : int [1:2] 30226 70
..@ Dimnames:List of 2
.. ..$ : NULL
.. ..$ : chr [1:70] "avg" "med" "jt_category_1" "jt_category_2" ...
..@ x : num [1:181356] 2392 2474 2392 2561 2763 ...
..@ factors : list()
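By way of background, caret normally infers the task from the class of the outcome y: a factor outcome triggers classification, while a plain numeric vector gives regression. A quick, self-contained illustration of that rule using built-in datasets (a sketch, assuming the caret and rpart packages are installed; the model choices here are arbitrary):

```r
library(caret)

# Numeric outcome -> caret sets up a regression model.
fit_reg <- train(mpg ~ wt, data = mtcars, method = "lm")
fit_reg$modelType       # "Regression"

# Factor outcome -> caret sets up a classification model.
fit_cls <- train(Species ~ ., data = iris, method = "rpart",
                 trControl = trainControl(method = "none"),
                 tuneGrid = data.frame(cp = 0.01))
fit_cls$modelType       # "Classification"
```

The `modelType` element of the fitted object shows which task caret decided on, which is a handy way to check what it inferred from your y.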
Why is it misidentified? Does anyone know how to perform regression with xgboost and caret?
Thanks in advance. The full code is here:
library(caret)
library(xgboost)

xgb_grid_1 = expand.grid(
  nrounds = 1000,
  max_depth = c(2, 4, 6, 8, 10),
  eta = c(0.5, 0.1, 0.07),
  gamma = 0.01,
  colsample_bytree = 0.5,
  min_child_weight = 1,
  subsample = 0.5
)

xgb_trcontrol_1 = trainControl(
  method = "cv",
  number = 5,
  verboseIter = TRUE,
  returnData = FALSE,
  returnResamp = "all",              # save losses across all models
  classProbs = TRUE,                 # set to TRUE for AUC to be computed
  summaryFunction = twoClassSummary,
  allowParallel = TRUE
)

xgb_train_1 = train(
  x = as.matrix(train[ , 2:71]),
  y = as.matrix(train$sal),
  trControl = xgb_trcontrol_1,
  tuneGrid = xgb_grid_1,
  method = "xgbTree"
)
UPDATE (18.08.10): When I delete the two parameters (classProbs = TRUE, summaryFunction = twoClassSummary) of the trainControl function, the result is the same:
> xgb_grid_1 = expand.grid(
+ nrounds = 1000,
+ max_depth = c(2, 4, 6, 8, 10),
+ eta=c(0.5, 0.1, 0.07),
+ gamma = 0.01,
+ colsample_bytree=0.5,
+ min_child_weight=1,
+ subsample=0.5
+ )
>
> xgb_trcontrol_1 = trainControl(
+ method = "cv",
+ number = 5,
+ allowParallel = TRUE
+ )
>
> xgb_train_1 = train(
+ x = as.matrix(train[ , 2:71]),
+ y = as.matrix(train$sal),
+ trControl = xgb_trcontrol_1,
+ tuneGrid = xgb_grid_1,
+ method = "xgbTree"
+ )
Error: Metric RMSE not applicable for classification models
Accepted answer
It's not strange that caret thinks you are asking for classification, because you are actually asking for it in these two lines of your trainControl function:
classProbs = TRUE,
summaryFunction = twoClassSummary
Remove both of these lines (so that they take their default values; see the function documentation), and you should be fine.
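With those lines gone, a regression-ready trainControl reduces to something like the following (a sketch; the remaining parameter values are the question's own, and the summary function is left at its default, which reports RMSE for regression):

```r
library(caret)

# Control object for 5-fold CV regression: no classProbs,
# no twoClassSummary, so caret uses its regression defaults.
xgb_trcontrol <- trainControl(
  method = "cv",
  number = 5,
  verboseIter = TRUE,
  returnData = FALSE,
  returnResamp = "all",   # save losses across all models
  allowParallel = TRUE
)
```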
Note also that AUC is applicable only to classification problems.
UPDATE (after comments): It seems that the target variable being an integer causes the problem; convert it to double before running the model with:
train$sal <- as.double(train$sal)
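Putting both fixes together, here is a minimal end-to-end sketch. It uses the built-in mtcars data as a stand-in for the original train data frame, a single hyper-parameter combination to keep it fast, and passes y as a plain numeric double vector rather than a one-column matrix (this assumes the caret and xgboost packages are installed):

```r
library(caret)
library(xgboost)

data(mtcars)
mtcars$mpg <- as.double(mtcars$mpg)   # ensure a double-typed target

# One hyper-parameter combination, standing in for the question's grid.
xgb_grid <- expand.grid(
  nrounds = 50,
  max_depth = 4,
  eta = 0.1,
  gamma = 0.01,
  colsample_bytree = 0.5,
  min_child_weight = 1,
  subsample = 0.5
)

# Plain CV control: no classProbs, no twoClassSummary.
xgb_trcontrol <- trainControl(method = "cv", number = 3)

set.seed(1)
fit <- train(
  x = as.matrix(mtcars[ , -1]),
  y = mtcars$mpg,            # numeric vector, not a matrix
  trControl = xgb_trcontrol,
  tuneGrid = xgb_grid,
  method = "xgbTree"
)

fit$modelType   # "Regression"
```

If `fit$modelType` prints "Regression", caret has set the problem up the way you intended, and the resampling results will be reported as RMSE rather than AUC.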