XGBoost multi-class prediction: the prediction matrix is a set of class probabilities. How to build a confusion matrix

Problem description

I have used XGBoost for multi-class label prediction.

This is a multi-class prediction, i.e. my target value contains 8 classes, and I am using about 6 features because they are very highly correlated with the target value.

I have created my prediction data set and converted it from a matrix into a data frame using as.data.frame.

I want to check the accuracy of my predictions. I am not sure how, since the column names change and there are no levels in my data set. All the data types I am using are integers and numerics.

 Response <- train$Response
 # XGBoost expects 0-based integer class labels
 label <- as.integer(train$Response)-1
 train$Response <- NULL

 # n was not defined in the original snippet; it is the number of rows
 n = nrow(train)
 train.index = sample(n,floor(0.75*n))
 train.data = as.matrix(train[train.index,])
 train.label = label[train.index]
 test.data = as.matrix(train[-train.index,])
 test.label = label[-train.index]

 View(train.label)

 # Transform the two data sets into xgb.Matrix
 xgb.train = xgb.DMatrix(data=train.data,label=train.label)
 xgb.test = xgb.DMatrix(data=test.data,label=test.label)




  params = list(
          booster="gbtree",
          eta=0.001,
          max_depth=5,
          gamma=3,
          subsample=0.75,
          colsample_bytree=1,
          objective="multi:softprob",
          eval_metric="mlogloss",
          num_class=8)

    # note: the xgboost thread parameter is spelled nthread, not nthreads
    xgb.fit <-xgb.train(
    params=params,
    data=xgb.train,
    nrounds=10000,
    nthread=1,
    early_stopping_rounds=10,
    watchlist=list(val1=xgb.train,val2=xgb.test),
    verbose=0
      )

   xgb.fit



  xgb.pred = predict(xgb.fit,test.data,reshape = TRUE)
  class(xgb.pred)
  xgb.pred = as.data.frame(xgb.pred)

Now I get my prediction probabilities in the form below. Since there are 8 classes, I have 8 probabilities per row, but I don't know which probability belongs to which class.

1   0.12233257  0.07373134  0.044682350 0.0810693502    0.06272415  0.134308174 0.066143863 0.415008187

I want to convert them to meaningful labels so that I can build a confusion matrix, but I have not been able to do so.

Recommended answer

Let's say your data is something like this:

train = data.frame(
  Medical_History_23 = sample(1:5,2000,replace=TRUE), 
  Medical_Keyword_3 = sample(1:5,2000,replace=TRUE), 
  Medical_Keyword_15 = sample(1:5,2000,replace=TRUE), 
  BMI = rnorm(2000), 
  Wt = rnorm(2000), 
  Medical_History_4 = sample(1:5,2000,replace=TRUE), 
  Ins_Age = rnorm(2000), 
  Response = sample(1:8,2000,replace=TRUE)) 

Then we split into train and test sets and fit the model:

library(xgboost)
label <- as.integer(train$Response)-1
train$Response <- NULL
n = nrow(train)
train.index = sample(n,floor(0.75*n))
train.data = as.matrix(train[train.index,])
train.label = label[train.index]
test.data = as.matrix(train[-train.index,])
test.label = label[-train.index]
xgb.train = xgb.DMatrix(data=train.data,label=train.label)
xgb.test = xgb.DMatrix(data=test.data,label=test.label)

params = list(booster="gbtree",eta=0.001,
          max_depth=5,gamma=3,subsample=0.75,
          colsample_bytree=1,objective="multi:softprob",
          eval_metric="mlogloss",num_class=8)

# note: the xgboost thread parameter is spelled nthread, not nthreads
xgb.fit <-xgb.train(params=params,data=xgb.train,
    nrounds=10000,nthread=1,early_stopping_rounds=10,
    watchlist=list(val1=xgb.train,val2=xgb.test),
    verbose=0
      )

xgb.pred = predict(xgb.fit,test.data,reshape = T)

Your predictions look like the output below. Each column holds the probability of one class, in the order 1, 2, ..., 8:

> head(xgb.pred)
         V1        V2        V3        V4        V5        V6        V7        V8
1 0.1254475 0.1252269 0.1249843 0.1247929 0.1246919 0.1248430 0.1248226 0.1251909
2 0.1255558 0.1249674 0.1250741 0.1250397 0.1249939 0.1247931 0.1248649 0.1247111
3 0.1249737 0.1250508 0.1249501 0.1250445 0.1250142 0.1249630 0.1249194 0.1250844
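
To make the column-to-class mapping concrete, here is a minimal sketch on a made-up 3-class probability matrix. The names `probs` and `class_names` and all the values are hypothetical, not taken from the question. The assumption is that the labels were built as `as.integer(Response) - 1`, so column j of the reshaped prediction corresponds to the 0-based label j - 1, i.e. to original class j.

```r
# Toy probability matrix: 3 observations, 3 classes (values are made up).
probs <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.3, 0.6,
                  0.2, 0.5, 0.3),
                nrow = 3, byrow = TRUE)

# Hypothetical class names, listed in the order of the original classes.
class_names <- c("low", "medium", "high")

# Column j holds the probability of class j, so the names can be attached directly.
colnames(probs) <- class_names

# With a softprob-style output, every row is a distribution over the classes.
row_sums <- rowSums(probs)

print(probs)
```

Naming the columns this way keeps the probabilities readable after the matrix is converted with as.data.frame, instead of the default V1, V2, ... headers.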

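To get the predicted labels, we take the column with the highest probability in each row. Because test.label is 0-based (Response - 1), it has to be shifted back to 1..8 before it is compared with the column indices:

# max.col() returns the 1-based index of the largest value in each row
predicted_labels = factor(max.col(xgb.pred),levels=1:8)
# shift the 0-based labels back to 1..8 so they match the levels
obs_labels = factor(test.label + 1,levels=1:8)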
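
The argmax step can be checked by hand on a small case. The sketch below uses a made-up 3-class matrix and hypothetical names (`probs`, `true_label0`); it is only an illustration, not the data from the question. It assumes the labels passed to XGBoost were 0-based, so they need + 1 before they line up with the 1-based column indices returned by `max.col()`.

```r
# Toy probabilities: 3 observations, 3 classes (values are made up).
probs <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.3, 0.6,
                  0.2, 0.5, 0.3),
                nrow = 3, byrow = TRUE)

# Hypothetical 0-based true labels, as they would be passed to XGBoost.
true_label0 <- c(0L, 2L, 2L)

# 1-based index of the most probable class in each row.
# ties.method = "first" makes the result deterministic (the default is "random").
pred_idx <- max.col(probs, ties.method = "first")

# Put predictions and observations on the same 1-based scale with shared levels.
predicted <- factor(pred_idx, levels = 1:3)
observed  <- factor(true_label0 + 1L, levels = 1:3)

print(data.frame(predicted, observed))
```

Without the + 1 shift, label 0 falls outside the levels and becomes NA, and every other label is compared against the wrong class.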

To get the confusion matrix, pass the predictions first and the reference (observed) labels second:

caret::confusionMatrix(predicted_labels,obs_labels)
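
If caret is not available, a plain confusion matrix and the overall accuracy can also be computed with base R. This is a minimal sketch on made-up labels; `observed`, `predicted`, `cm` and `accuracy` are hypothetical names, and both factors are assumed to share the same levels.

```r
# Made-up observed and predicted labels for 6 cases, 3 classes.
observed  <- factor(c(1, 2, 3, 3, 2, 1), levels = 1:3)
predicted <- factor(c(1, 2, 3, 2, 2, 3), levels = 1:3)

# Rows are predictions, columns are observed classes.
cm <- table(Predicted = predicted, Observed = observed)

# Overall accuracy: correctly classified cases (the diagonal) over all cases.
accuracy <- sum(diag(cm)) / sum(cm)

# Per-class recall: correct predictions divided by the number of observed cases.
# A class with no observed cases would give NaN here.
recall <- diag(cm) / colSums(cm)

print(cm)
print(accuracy)
```

Using factors with identical levels keeps the table square even when some class is never predicted.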

Of course, the accuracy in this example will be low because the variables carry no useful information, but the code should work for you.
