XGBoost predictor in R predicts the same value for all rows

Problem Description

I looked into the post on the same thing in Python, but I want a solution in R. I'm working on the Titanic dataset from Kaggle, and it looks like this:

'data.frame':   891 obs. of  13 variables:
 $ PassengerId: int  1 2 3 4 5 6 7 8 9 10 ...
 $ Survived   : num  0 1 1 1 0 0 0 0 1 1 ...
 $ Pclass     : Factor w/ 3 levels "1","2","3": 3 1 3 1 3 3 1 3 3 2 ...
 $ Age        : num  22 38 26 35 35 ...
 $ SibSp      : int  1 1 0 1 0 0 0 3 0 1 ...
 $ Parch      : int  0 0 0 0 0 0 0 1 2 0 ...
 $ Fare       : num  7.25 71.28 7.92 53.1 8.05 ...
 $ Child      : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 2 1 1 ...
 $ Embarked.C : num  0 1 0 0 0 0 0 0 0 1 ...
 $ Embarked.Q : num  0 0 0 0 0 1 0 0 0 0 ...
 $ Embarked.S : num  1 0 1 1 1 0 1 1 1 0 ...
 $ Sex.female : num  0 1 1 1 0 0 0 0 1 1 ...
 $ Sex.male   : num  1 0 0 0 1 1 1 1 0 0 ...

This is after I used dummy variables. My test set:

'data.frame':   418 obs. of  12 variables:
 $ PassengerId: int  892 893 894 895 896 897 898 899 900 901 ...
 $ Pclass     : Factor w/ 3 levels "1","2","3": 3 3 2 3 3 3 3 2 3 3 ...
 $ Age        : num  34.5 47 62 27 22 14 30 26 18 21 ...
 $ SibSp      : int  0 1 0 0 1 0 0 1 0 2 ...
 $ Parch      : int  0 0 0 0 1 0 0 1 0 0 ...
 $ Fare       : num  7.83 7 9.69 8.66 12.29 ...
 $ Child      : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ Embarked.C : num  0 0 0 0 0 0 0 0 1 0 ...
 $ Embarked.Q : num  1 0 1 0 0 0 1 0 0 0 ...
 $ Embarked.S : num  0 1 0 1 1 1 0 1 0 1 ...
 $ Sex.female : num  0 1 0 0 1 0 1 0 1 0 ...
 $ Sex.male   : num  1 0 1 1 0 1 0 1 0 1 ...
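
The post does not show how the dummy columns above were built. As a hypothetical sketch (none of this is from the original post), base R's model.matrix() is one common way to create such 0/1 indicators; caret::dummyVars() would also work. The resulting column names may differ slightly from the ones shown (e.g. Sexfemale rather than Sex.female):

# Hypothetical sketch of the dummy-variable step (not shown in the original post).
# model.matrix() expands a factor into one 0/1 column per level when the
# intercept is dropped with "- 1".
train <- read.csv("train.csv", stringsAsFactors = TRUE)   # assumed Kaggle file name

sex_dummies      <- model.matrix(~ Sex - 1,      data = train)  # Sexfemale, Sexmale
embarked_dummies <- model.matrix(~ Embarked - 1, data = train)  # EmbarkedC, EmbarkedQ, EmbarkedS
                                                                # (assumes blank Embarked values were handled first)

train <- cbind(train[, !(colnames(train) %in% c("Sex", "Embarked"))],
               sex_dummies, embarked_dummies)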

I ran xgboost using the following code:

> param <- list("objective" = "multi:softprob",
+               "max.depth" = 25)
> xgb = xgboost(param, data = trmat, label = y, nround = 7)
[0] train-rmse:0.350336
[1] train-rmse:0.245470
[2] train-rmse:0.171994
[3] train-rmse:0.120511
[4] train-rmse:0.084439
[5] train-rmse:0.059164
[6] train-rmse:0.041455

trmat is:

trmat = data.matrix(train)

and temat is:

temat = data.matrix(test)

and y is the Survived variable:

y = train$Survived

But when I run the predict function:

> x = predict(xgb, newdata = temat)
> x[1:10]
 [1] 0.9584613 0.9584613 0.9584613 0.9584613 0.9584613 0.9584613 0.9584613
 [8] 0.9584613 0.9584613 0.9584613

All of the probabilities are being predicted to be the same. In the Python question, someone said increasing max.depth would work, but it didn't. What am I doing wrong?

Recommended Answer

You must remove the Survived variable from the training matrix, since it is the variable you want to predict. With it left in, the model simply learns to read the label, and because the test set has no Survived column the feature columns in trmat and temat no longer line up at prediction time, which is why every test row ends up with the same prediction.

trmat = data.matrix(train[, colnames(train) != "Survived"])
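
For completeness, here is a sketch of the full corrected run under some assumptions that are not in the original answer: the objective is switched to "binary:logistic" (a natural choice for a 0/1 Survived label; the post used "multi:softprob"), a smaller max_depth is used, and the parameter list is passed explicitly as params = so it is actually applied:

library(xgboost)

# Sketch only. Dropping the label gives trmat and temat the same 12 columns.
trmat <- data.matrix(train[, colnames(train) != "Survived"])
temat <- data.matrix(test)
y     <- train$Survived

param <- list(objective = "binary:logistic",   # assumed objective for a 0/1 label
              max_depth = 6)                   # assumed; shallower than the post's 25
xgb <- xgboost(params = param, data = trmat, label = y, nrounds = 7)

pred <- predict(xgb, newdata = temat)   # per-row survival probabilities
head(pred)                              # the values should now differ across rows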

Removing the label from the training matrix should solve your problem.
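
As a quick sanity check (not part of the original answer), you can confirm that the training and test matrices expose the same feature columns before predicting:

# Both calls should return character(0) once Survived is excluded from trmat.
setdiff(colnames(trmat), colnames(temat))
setdiff(colnames(temat), colnames(trmat))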
