How to directly plot ROC of h2o model object in R

Question

My apologies if I'm missing something obvious. I've thoroughly enjoyed working with h2o through its R interface over the last few days. I would like to evaluate a model, say a random forest, by plotting its ROC curve. The documentation seems to suggest that there is a straightforward way to do that:

Interpreting a DRF Model

  • By default, the following output displays:
  • Model parameters (hidden)
  • A graph of the scoring history (number of trees vs. training MSE)
  • A graph of the ROC curve (TPR vs. FPR)
  • A graph of the variable importances ...

I've also seen that in Python you can apply the roc function, but I can't find a way to do the same in the R interface. Currently I extract predictions from the model with h2o.cross_validation_holdout_predictions and then use the pROC package in R to plot the ROC. I would like to be able to do it directly from the H2O model object or, perhaps, an H2OModelMetrics object.

Thanks a lot!
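The workaround described in the question (pull the predictions out, then build the ROC outside H2O) comes down to sweeping a threshold over the predicted probabilities. Below is a minimal base R sketch of that idea, not the pROC or h2o implementation: `roc_points` is a hypothetical helper, and the labels and scores are made-up values standing in for holdout predictions.

```r
# Hypothetical helper (not part of h2o or pROC): ROC points from
# true labels (0/1) and predicted probabilities of the positive class.
roc_points <- function(labels, scores) {
  # Sweep thresholds from the highest score down so the curve runs (0,0) -> (1,1)
  thresholds <- sort(unique(scores), decreasing = TRUE)
  pos <- sum(labels == 1)
  neg <- sum(labels == 0)
  # Share of positives / negatives scored at or above each threshold
  tpr <- sapply(thresholds, function(t) sum(scores >= t & labels == 1) / pos)
  fpr <- sapply(thresholds, function(t) sum(scores >= t & labels == 0) / neg)
  # Prepend the (0,0) starting point (threshold above every score)
  data.frame(threshold = c(Inf, thresholds),
             fpr = c(0, fpr),
             tpr = c(0, tpr))
}

# Made-up data standing in for holdout predictions
labels <- c(1, 1, 0, 1, 0, 0)
scores <- c(0.9, 0.8, 0.7, 0.6, 0.3, 0.1)
roc_df <- roc_points(labels, scores)

# Base graphics leave line type, colour, and so on fully under our control
plot(roc_df$fpr, roc_df$tpr, type = "l", lty = 2,
     xlab = "False Positive Rate", ylab = "True Positive Rate")
abline(0, 1, col = "grey")
```

In practice the labels and scores would come from the data frame of extracted predictions rather than from hand-typed vectors.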

Recommended answer

A naive solution is to call the generic plot() function on an H2OMetrics object:

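library(h2o)
# Binomial GLM: predictors are all columns but the first, response is 'y'
logit_fit <- h2o.glm(colnames(training)[-1], 'y', training_frame = training.hex,
                     validation_frame = validation.hex, family = 'binomial')
# ROC curve computed on the validation frame
plot(h2o.performance(logit_fit, valid = TRUE), type = 'roc')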

This gives us a plot of the ROC curve.

But it is hard to customize, especially the line type, since the type parameter is already taken by 'roc'. I also have not found a way to plot the ROC curves of several models together on one plot. Instead, I extract the true positive rate and false positive rate from each H2OMetrics object and draw the ROC curves on a single plot myself with ggplot2. Here is the example code (it uses a lot of tidyverse syntax):

library(tidyverse)

# for example, I have 4 H2OModels
list(logit_fit, dt_fit, rf_fit, xgb_fit) %>%
  # map a function to each element in the list
  map(function(x) x %>% h2o.performance(valid = TRUE) %>%
        # walk down the 'paths' in the H2OMetrics object
        .@metrics %>% .$thresholds_and_metric_scores %>%
        # extract the true positive rate and false positive rate
        .[c('tpr', 'fpr')] %>%
        # add (0,0) and (1,1) as the start and end points of the ROC curve
        add_row(tpr = 0, fpr = 0, .before = 1) %>%
        add_row(tpr = 1, fpr = 1)) %>%
  # add a column with the model name for grouping in ggplot2
  map2(c('Logistic Regression', 'Decision Tree', 'Random Forest', 'Gradient Boosting'),
       function(x, y) x %>% add_column(model = y)) %>%
  # reduce the four data frames to one
  reduce(rbind) %>%
  # plot fpr against tpr, mapping model to color as the grouping
  ggplot(aes(fpr, tpr, col = model)) +
  geom_line() +
  # dashed grey diagonal as the chance-level reference
  geom_segment(aes(x = 0, y = 0, xend = 1, yend = 1), linetype = 2, col = 'grey') +
  xlab('False Positive Rate') +
  ylab('True Positive Rate') +
  ggtitle('ROC Curve for Four Models')

The result is a single plot with one ROC curve per model.
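The endpoint step matters because a curve has to start at (0,0) and end at (1,1) for it, and any area computed under it, to be complete. The base R sketch below checks that step on a plain data frame without an H2O cluster. `pad_roc` and `roc_auc` are hypothetical helper names, and the `toy` points are made up to stand in for the tpr and fpr columns of one model's threshold table.

```r
# Hypothetical helper: add the (0,0) and (1,1) endpoints to a data frame
# with `fpr` and `tpr` columns, drop duplicate points, and order the
# points along the curve.
pad_roc <- function(df) {
  out <- rbind(data.frame(fpr = 0, tpr = 0),
               df[c("fpr", "tpr")],
               data.frame(fpr = 1, tpr = 1))
  out <- unique(out)
  out[order(out$fpr, out$tpr), ]
}

# Hypothetical helper: trapezoidal area under the padded curve.
roc_auc <- function(df) {
  p <- pad_roc(df)
  sum(diff(p$fpr) * (head(p$tpr, -1) + tail(p$tpr, -1)) / 2)
}

# Made-up points standing in for one model's extracted tpr/fpr columns
toy <- data.frame(fpr = c(0, 0.5, 0.5), tpr = c(0.5, 0.5, 1))
padded <- pad_roc(toy)
auc <- roc_auc(toy)
print(padded)
print(auc)
```

For the toy points the padded curve has five rows and an area of 0.75. Applied to real models, the same helpers would take the data frame pulled from each H2OMetrics object in place of `toy`.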
