Extracting Model from GBM in R

Question

Is anyone familiar with how to figure out what's going on inside a gbm model in R?

Let's say we want to see how Petal.Length in iris is predicted. To keep it simple, I ran:

library(gbm)
tg <- gbm(Petal.Length ~ ., data = iris, distribution = "gaussian")

This works, and when you run:

summary(tg)

Hit <Return> to see next plot: 
                      var rel.inf
Petal.Width   Petal.Width   67.39
Species           Species   32.61
Sepal.Length Sepal.Length    0.00
Sepal.Width   Sepal.Width    0.00

These relative influences make sense intuitively. When you run pretty.gbm.tree(tg), you get:

  SplitVar SplitCodePred LeftNode RightNode MissingNode ErrorReduction Weight    Prediction
0        2  0.8000000000        1         2           3       184.6764     75  0.0001366667
1       -1 -0.0022989091       -1        -1          -1         0.0000     22 -0.0022989091
2       -1  0.0011476604       -1        -1          -1         0.0000     53  0.0011476604
3       -1  0.0001366667       -1        -1          -1         0.0000     75  0.0001366667

So clearly gbm splits on variable #2 and gets back three separate regressions. I assume SplitVar == 2 is Petal.Width, going by the column order shown by str(iris).

But what is the MissingNode for? iris has no missing data. And how do we see what is going on in each of the three nodes that were created?

Say I wanted to code this up in C++ for production. I don't see how one would know what to code beyond doing something different depending on whether Petal.Width > 0.8.

Thanks

Josh

Recommended answer

See the function gbm2sas in the package mlmeta, which uses metaprogramming to convert the R object into SAS code.

The generated SAS code is similar to C++, so it is both easy to read and easy to port to C++.
