R xgboost on caret attempts to perform classification instead of regression


Question

Hi everyone.

First, here is a sample of the data:

> str(train)
'data.frame':   30226 obs. of  71 variables:
 $ sal              : int  2732 2732 2732 2328 2560 3584 5632 5632 3584 2150 ...
 $ avg              : num  2392 2474 2392 2561 2763 ...
 $ med              : num  2314 2346 2314 2535 2754 ...
 $ jt_category_1    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ jt_category_2    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ job_num_1        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ job_num_2        : int  0 0 0 0 0 0 0 0 0 0 ...

plus 64 more variables (all of type int, holding binary 0/1 values)

The column "sal" is the label, and this data set is 70% of the raw data.

I use the "caret" package in R for regression and chose the method "xgbTree". I know it works for both classification and regression.

The problem is that I want to do regression, but I don't know how to get caret to do it.

When I run the full code, the error is:

Error: Metric RMSE not applicable for classification models

But I'm not trying to do classification; I want to do regression.

The type of my label (the y argument of the train function) is int, and I have also checked the data types.

Is that the problem? Does it make caret treat this training run as classification?

> str(train$sal)
 int [1:30226] 2732 2732 2732 2328 2560 3584 5632 5632 3584 2150 ...

> str(train_xg)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:181356] 0 1 2 3 4 5 6 7 8 9 ...
  ..@ p       : int [1:71] 0 30226 60452 90504 90678 90709 90962 93875 95087 96190 ...
  ..@ Dim     : int [1:2] 30226 70
  ..@ Dimnames:List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:70] "avg" "med" "jt_category_1" "jt_category_2" ...
  ..@ x       : num [1:181356] 2392 2474 2392 2561 2763 ...
  ..@ factors : list()

Why is it misidentified?

Do you know how to perform regression with xgboost and caret?

Thanks in advance.

The full code is here:

library(caret)
library(xgboost)

xgb_grid_1 = expand.grid(
  nrounds = 1000,
  max_depth = c(2, 4, 6, 8, 10),
  eta=c(0.5, 0.1, 0.07),
  gamma = 0.01,
  colsample_bytree=0.5,
  min_child_weight=1,
  subsample=0.5
)

xgb_trcontrol_1 = trainControl(
  method = "cv",
  number = 5,
  verboseIter = TRUE,
  returnData = FALSE,
  returnResamp = "all",                                                        # save losses across all models
  classProbs = TRUE,                                                           # set to TRUE for AUC to be computed
  summaryFunction = twoClassSummary,
  allowParallel = TRUE
)

xgb_train_1 = train(
  x = as.matrix(train[ , 2:71]),
  y = as.matrix(train$sal),
  trControl = xgb_trcontrol_1,
  tuneGrid = xgb_grid_1,
  method = "xgbTree"
)

Update (18.08.10)

When I delete the two parameters (classProbs = TRUE, summaryFunction = twoClassSummary) from the trainControl call, the result is the same:

> xgb_grid_1 = expand.grid(
+   nrounds = 1000,
+   max_depth = c(2, 4, 6, 8, 10),
+   eta=c(0.5, 0.1, 0.07),
+   gamma = 0.01,
+   colsample_bytree=0.5,
+   min_child_weight=1,
+   subsample=0.5
+ )
> 
> xgb_trcontrol_1 = trainControl(
+   method = "cv",
+   number = 5,
+   allowParallel = TRUE
+ )
> 
> xgb_train_1 = train(
+   x = as.matrix(train[ , 2:71]),
+   y = as.matrix(train$sal),
+   trControl = xgb_trcontrol_1,
+   tuneGrid = xgb_grid_1,
+   method = "xgbTree"
+ )
Error: Metric RMSE not applicable for classification models

Recommended answer

It's not strange that caret thinks you are asking for classification, because you are actually doing so in these two lines of your trainControl call:

classProbs = TRUE,     
summaryFunction = twoClassSummary

Remove both of these lines (so that they take their default values; see the function documentation), and you should be fine.

Note also that AUC is only applicable to classification problems.
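
As a rough sketch of that first suggestion, the control object below keeps the question's resampling settings but leaves classProbs and summaryFunction at their defaults. The name xgb_trcontrol_reg is made up for illustration. As the question's update shows, this change alone did not clear the error in the asker's case.

```r
library(caret)

# Resampling setup for a regression run: classProbs and summaryFunction
# are simply omitted, so trainControl() falls back to its defaults
# instead of the classification-only settings from the question.
# (xgb_trcontrol_reg is a hypothetical name, not from the original post.)
xgb_trcontrol_reg <- trainControl(
  method = "cv",
  number = 5,
  verboseIter = TRUE,
  returnData = FALSE,
  returnResamp = "all",
  allowParallel = TRUE
)

# With the defaults, class probabilities are not requested.
print(xgb_trcontrol_reg$classProbs)
```

This object can then be passed as trControl in the train call in place of xgb_trcontrol_1.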

UPDATE (after comments): It seems that the target variable being an integer causes the problem; convert it to double before running the model with

train$sal <- as.double(train$sal)
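
To make the difference visible, here is a small base-R sketch on made-up toy data (the values and the names y_matrix and y_vector are illustrative, not from the question). It compares what the question passes as y, as.matrix(train$sal), with the double vector the answer suggests. Passing the label as a plain vector rather than a matrix is an extra assumption on my part; it matches how the caret documentation describes y (a numeric or factor vector), but the answer itself only mentions the conversion to double.

```r
# Toy stand-in for the question's data frame; the values are made up.
train <- data.frame(
  sal = c(2732L, 2328L, 2560L, 3584L),
  avg = c(2392, 2561, 2763, 3100),
  jt_category_1 = c(1L, 1L, 0L, 1L)
)

# What the question passes as y: a one-column integer matrix.
y_matrix <- as.matrix(train$sal)

# What the answer suggests: convert the label to double first.
train$sal <- as.double(train$sal)

# Assumption: hand caret a plain vector instead of wrapping it in as.matrix().
y_vector <- train$sal

# Inspect both candidates for the y argument.
str(y_matrix)
str(y_vector)
```

In the train call from the question, this would amount to writing y = train$sal after the conversion, with the rest of the call unchanged.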
