Extracting Model from GBM in R
Question
Is anyone familiar with how to figure out what's going on inside a gbm model in R?
Let's say we wanted to see how to predict Petal.Length in iris. Just to keep it simple, I ran:
library(gbm)
tg <- gbm(Petal.Length ~ ., data = iris)
This works, and when you run:
summary(tg)
you get:
Hit <Return> to see next plot:
                      var rel.inf
Petal.Width   Petal.Width   67.39
Species           Species   32.61
Sepal.Length Sepal.Length    0.00
Sepal.Width   Sepal.Width    0.00
This makes sense intuitively. When you run pretty.gbm.tree(tg), you get:
  SplitVar SplitCodePred LeftNode RightNode MissingNode ErrorReduction Weight    Prediction
0        2  0.8000000000        1         2           3       184.6764     75  0.0001366667
1       -1 -0.0022989091       -1        -1          -1         0.0000     22 -0.0022989091
2       -1  0.0011476604       -1        -1          -1         0.0000     53  0.0011476604
3       -1  0.0001366667       -1        -1          -1         0.0000     75  0.0001366667
So clearly gbm thinks that you split on variable #2 and get back three separate regressions. I assume that SplitVar==2 is Petal.Width, since the order you see in str(iris) makes sense.
But what data is missing? iris has no missing data. And then how do we see what is going on in each of the three nodes that were created?
Let's say I wanted to code this up in C++ for production. I don't see how one would know what to code beyond knowing that you should do something different depending on whether Petal.Width > 0.8.
Thanks,
Josh
Recommended answer
See the function gbm2sas in the package mlmeta, which uses metaprogramming to convert the R object to SAS format.
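To see what such a conversion has to encode, it may help to read the pretty.gbm.tree table as a node list rather than as three regressions. As far as I understand gbm, SplitVar is a 0-based index into the predictors (so 2 is Petal.Width once the response is dropped), LeftNode, RightNode and MissingNode are 0-based row ids, and every split gets a MissingNode child whether or not the training data has NAs; here it simply repeats the parent's prediction. For terminal rows (SplitVar of -1), SplitCodePred holds the tree's contribution. The sketch below hard-codes the table printed in the question and routes one observation through it. walk_tree is a hypothetical helper, not part of gbm, and it assumes numeric splits send values below the threshold to the left.

```r
# First tree, copied from the pretty.gbm.tree(tg) output in the question.
# Row names are the 0-based node ids that the *Node columns refer to.
tree <- data.frame(
  SplitVar      = c(2, -1, -1, -1),
  SplitCodePred = c(0.8, -0.0022989091, 0.0011476604, 0.0001366667),
  LeftNode      = c(1, -1, -1, -1),
  RightNode     = c(2, -1, -1, -1),
  MissingNode   = c(3, -1, -1, -1),
  row.names     = as.character(0:3)
)

# Predictors in model order (response Petal.Length removed).
var_names <- c("Sepal.Length", "Sepal.Width", "Petal.Width", "Species")

# Hypothetical helper: follow one observation down one tree.
# Handles numeric splits only; factor splits are not covered here.
walk_tree <- function(tree, obs, var_names) {
  node <- 0
  repeat {
    row <- tree[as.character(node), ]
    if (row$SplitVar == -1) {
      return(row$SplitCodePred)          # terminal node: tree's contribution
    }
    x <- obs[[var_names[row$SplitVar + 1]]]  # SplitVar is 0-based
    if (is.na(x)) {
      node <- row$MissingNode            # split variable is NA
    } else if (x < row$SplitCodePred) {
      node <- row$LeftNode               # assumed: below threshold goes left
    } else {
      node <- row$RightNode
    }
  }
}

left_leaf    <- walk_tree(tree, list(Petal.Width = 0.2), var_names)
right_leaf   <- walk_tree(tree, list(Petal.Width = 1.5), var_names)
missing_leaf <- walk_tree(tree, list(Petal.Width = NA_real_), var_names)
```

This is only one tree. The fitted model's prediction should be the initial value stored in tg$initF plus the contributions of every tree, each of which can be inspected with pretty.gbm.tree(tg, i.tree = k).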
The SAS format is similar to C++, so it is both easy to read and easy to adapt to C++.
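The same idea can be sketched without the package: walk the tree table recursively and emit nested if/else text. The code below is not how gbm2sas is implemented; emit_node is a hypothetical helper, the C++-style output format is my own choice, and it assumes numeric splits where values below the threshold go left. The tree is the one printed in the question.

```r
# First tree from the question's pretty.gbm.tree(tg) output (0-based node ids).
tree <- data.frame(
  SplitVar      = c(2, -1, -1, -1),
  SplitCodePred = c(0.8, -0.0022989091, 0.0011476604, 0.0001366667),
  LeftNode      = c(1, -1, -1, -1),
  RightNode     = c(2, -1, -1, -1),
  MissingNode   = c(3, -1, -1, -1),
  row.names     = as.character(0:3)
)
var_names <- c("Sepal.Length", "Sepal.Width", "Petal.Width", "Species")

# Hypothetical helper: turn one node (and its children) into lines of
# C++-like source. Numeric splits only.
emit_node <- function(tree, node, var_names, indent = "") {
  row <- tree[as.character(node), ]
  if (row$SplitVar == -1) {
    # Terminal node: add this tree's contribution to the running prediction.
    return(sprintf("%spred += %.10f;", indent, row$SplitCodePred))
  }
  # Dots are not valid in C++ identifiers, so Petal.Width becomes Petal_Width.
  v <- gsub(".", "_", var_names[row$SplitVar + 1], fixed = TRUE)
  deeper <- paste0(indent, "  ")
  c(
    sprintf("%sif (std::isnan(%s)) {", indent, v),
    emit_node(tree, row$MissingNode, var_names, deeper),
    sprintf("%s} else if (%s < %.10f) {", indent, v, row$SplitCodePred),
    emit_node(tree, row$LeftNode, var_names, deeper),
    sprintf("%s} else {", indent),
    emit_node(tree, row$RightNode, var_names, deeper),
    sprintf("%s}", indent)
  )
}

cpp_lines <- emit_node(tree, 0, var_names)
cat(cpp_lines, sep = "\n")
```

For a whole model, one would presumably initialise pred with tg$initF and append the emitted block for each tree in turn.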