Finding the contribution of each feature to a particular prediction made by an h2o ensemble model


Problem description

I am trying to explain the decisions made by an h2o GBM model. Based on the idea in https://medium.com/applied-data-science/new-r-package-the-xgboost-explainer-51dd7d1aa211, I want to calculate the contribution of each feature to a particular decision at test time. Is it possible to get each individual tree from the ensemble, along with the log-odds at every node? I also need the path the model traverses through each tree when making the prediction.

Solution

H2O doesn't have an equivalent of the xgboostExplainer package. However, there is a way to get something close.

1) If you want to know which decision path was taken for a single row/observation, you can use h2o.predict_leaf_node_assignment(model, frame) to get an H2OFrame with the leaf node assignments. The result shows the path taken in each tree that was built (the answer's example had 5 trees).

2) You can visualize individual trees using H2O's MOJO, which you can download once you've built your GBM or XGBoost model.

3) In an upcoming release, you will be able to get the prediction value for each leaf node of a GBM (there is a pull request for this).
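To make step 1 concrete, here is a minimal sketch of following one leaf node assignment through one tree. It assumes the assignment for a tree is a string of L/R steps (for example "LR") and that a value is known at every node; the Node class, the TOY_TREE data, and the follow_path helper are all hypothetical illustrations, not H2O objects or H2O API calls.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node; not an H2O structure."""
    value: float                     # log-odds at this node (assumed to be known)
    feature: Optional[str] = None    # feature this node splits on; None for a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def follow_path(root: Node, path: str) -> List[Node]:
    """Return the nodes visited when walking `path` ('L'/'R' steps) from the root."""
    visited = [root]
    node = root
    for step in path:
        if step not in ("L", "R"):
            raise ValueError(f"unexpected step {step!r} in path {path!r}")
        child = node.left if step == "L" else node.right
        if child is None:
            raise ValueError(f"path {path!r} runs past a leaf")
        visited.append(child)
        node = child
    return visited


# Made-up tree for illustration only.
TOY_TREE = Node(
    value=0.0, feature="age",
    left=Node(value=-0.4, feature="income",
              left=Node(value=-0.7),
              right=Node(value=0.1)),
    right=Node(value=0.5),
)

for n in follow_path(TOY_TREE, "LR"):
    print(n.feature, n.value)
```

The tree structure itself would have to come from the MOJO in step 2; how to parse it is outside this sketch.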

Putting all these steps together should get you pretty close to the values you want, so you can add them up to get each feature's impact. (The original answer also points to a Python Jupyter notebook with examples of how to generate the leaf node assignments and visualize a tree.)
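The "add them up" step can be sketched as follows, in the spirit of the xgboostExplainer idea from the question: each move from a node to its child changes the log-odds, and that change is credited to the feature the parent node splits on. This is only an illustration under stated assumptions: the trees are hand-written dicts with made-up values, every node (not just the leaves) is assumed to carry a log-odds value, and the paths are assumed to be L/R strings. The function names are hypothetical and nothing here calls H2O.

```python
from collections import defaultdict


def tree_contributions(tree, path):
    """Credit each log-odds change along `path` to the feature split on at the parent."""
    contrib = defaultdict(float)
    node = tree
    for step in path:
        child = node["left"] if step == "L" else node["right"]
        contrib[node["feature"]] += child["value"] - node["value"]
        node = child
    return dict(contrib)


def ensemble_contributions(trees, paths):
    """Sum per-feature contributions over all trees; the bias is the sum of root values."""
    bias = 0.0
    total = defaultdict(float)
    for tree, path in zip(trees, paths):
        bias += tree["value"]
        for feature, delta in tree_contributions(tree, path).items():
            total[feature] += delta
    return bias, dict(total)


# Two made-up trees; internal node values are assumed to be available.
TREES = [
    {"value": 0.0, "feature": "age",
     "left": {"value": -0.4, "feature": "income",
              "left": {"value": -0.7}, "right": {"value": 0.1}},
     "right": {"value": 0.5}},
    {"value": 0.2, "feature": "income",
     "left": {"value": -0.1}, "right": {"value": 0.6}},
]
PATHS = ["LR", "R"]  # one path per tree for a single row

bias, contributions = ensemble_contributions(TREES, PATHS)
print("bias:", bias)
print("contributions:", contributions)
```

By construction, the bias plus all feature contributions equals the sum of the leaf values reached, so the breakdown accounts for the whole log-odds prediction of the toy ensemble.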
