R gbm handling of missing values


Problem description

Does anyone know how gbm in R handles missing values? I can't find an explanation anywhere on Google.

Recommended answer

To explain what gbm does with missing predictor values, let's first inspect a single tree of a gbm object.

Suppose you have a gbm object mygbm. Calling pretty.gbm.tree(mygbm, i.tree=1) prints the first tree of mygbm as a table, e.g.:

  SplitVar SplitCodePred LeftNode RightNode MissingNode ErrorReduction Weight    Prediction
0       46  1.629728e+01        1         5           9      26.462908   1585 -4.396393e-06
1       45  1.850000e+01        2         3           4      11.363868    939 -4.370936e-04
2       -1  2.602236e-04       -1        -1          -1       0.000000    271  2.602236e-04
3       -1 -7.199873e-04       -1        -1          -1       0.000000    668 -7.199873e-04
4       -1 -4.370936e-04       -1        -1          -1       0.000000    939 -4.370936e-04
5       20  0.000000e+00        6         7           8       8.638042    646  6.245552e-04
6       -1  3.533436e-04       -1        -1          -1       0.000000    483  3.533436e-04
7       -1  1.428207e-03       -1        -1          -1       0.000000    163  1.428207e-03
8       -1  6.245552e-04       -1        -1          -1       0.000000    646  6.245552e-04
9       -1 -4.396393e-06       -1        -1          -1       0.000000   1585 -4.396393e-06

See the gbm documentation for details. Each row corresponds to a node, and the first (unnamed) column is the node number. Each internal node has a left node and a right node; for a leaf both are set to -1. Each internal node also has an associated MissingNode, which is likewise -1 for a leaf.
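To reproduce a table like this on your own machine, the following is a minimal sketch. It assumes the gbm package is installed; the simulated data, the column names x1, x2, y, and the tuning values are all made up for illustration and are not taken from the answer.

```r
# Sketch only: fit a small gbm on simulated data that contains missing
# predictor values, then inspect the first tree. Data and names are made up.
library(gbm)

set.seed(1)
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = runif(n))
dat$y <- 2 * dat$x1 + dat$x2 + rnorm(n, sd = 0.1)
dat$x1[sample(n, 50)] <- NA  # inject missing values into one predictor

mygbm <- gbm(y ~ x1 + x2, data = dat, distribution = "gaussian",
             n.trees = 10, interaction.depth = 2, shrinkage = 0.1,
             n.minobsinnode = 10, bag.fraction = 0.5)

# One row per node, including the MissingNode column discussed above.
tree1 <- pretty.gbm.tree(mygbm, i.tree = 1)
print(tree1)

# Predicting for a new observation whose x1 is missing.
p <- predict(mygbm, newdata = data.frame(x1 = NA_real_, x2 = 0.5),
             n.trees = 10)
print(p)
```

The exact numbers depend on the seed and on the gbm version, so no output is shown here.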

To run an observation down the tree, we start at node 0. If the observation has a missing value for the split variable (SplitVar = 46), it is sent to the node given by MissingNode, here node 9. Node 9 is a leaf, so the tree's prediction for such an observation is that node's SplitCodePred = -4.396393e-06, which is the same prediction the tree had before any split was made at node 0 (Prediction = -4.396393e-06 for node 0).

The procedure is the same for the other nodes and split variables.
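The routing described above can be sketched in base R using the table from the answer. This is an illustration, not gbm's own code: the function name predict_one_tree is hypothetical, observations are passed as a numeric vector named by SplitVar code, and every split is assumed to be continuous with values below the split point going left. That last assumption may not hold for node 5, whose split value of 0 could be a categorical split index.

```r
# First tree from the answer, reduced to the columns needed for routing.
# Row i holds node i - 1, because R vectors are 1-based.
tree <- data.frame(
  SplitVar      = c(46, 45, -1, -1, -1, 20, -1, -1, -1, -1),
  SplitCodePred = c(1.629728e+01, 1.850000e+01, 2.602236e-04, -7.199873e-04,
                    -4.370936e-04, 0, 3.533436e-04, 1.428207e-03,
                    6.245552e-04, -4.396393e-06),
  LeftNode      = c(1, 2, -1, -1, -1, 6, -1, -1, -1, -1),
  RightNode     = c(5, 3, -1, -1, -1, 7, -1, -1, -1, -1),
  MissingNode   = c(9, 4, -1, -1, -1, 8, -1, -1, -1, -1),
  Prediction    = c(-4.396393e-06, -4.370936e-04, 2.602236e-04, -7.199873e-04,
                    -4.370936e-04, 6.245552e-04, 3.533436e-04, 1.428207e-03,
                    6.245552e-04, -4.396393e-06)
)

# Hypothetical helper: x is a numeric vector named by SplitVar code,
# e.g. c("46" = 10, "45" = NA). Assumes continuous splits only.
predict_one_tree <- function(tree, x) {
  node <- 0
  repeat {
    row <- tree[node + 1, ]
    if (row$SplitVar == -1) return(row$SplitCodePred)  # leaf: holds the prediction
    value <- unname(x[as.character(row$SplitVar)])
    node <- if (is.na(value)) {
      row$MissingNode                 # missing value: take the third branch
    } else if (value < row$SplitCodePred) {
      row$LeftNode
    } else {
      row$RightNode
    }
  }
}

missing_at_root <- predict_one_tree(tree, c("46" = NA, "45" = 20))  # node 0 -> 9
missing_at_one  <- predict_one_tree(tree, c("46" = 10, "45" = NA))  # node 0 -> 1 -> 4
no_missing      <- predict_one_tree(tree, c("46" = 10, "45" = 20))  # node 0 -> 1 -> 3

# In this table, every MissingNode leaf repeats its parent's Prediction.
internal <- tree$SplitVar != -1
same_as_parent <- tree$SplitCodePred[tree$MissingNode[internal] + 1] ==
  tree$Prediction[internal]
print(all(same_as_parent))  # TRUE
```

With these inputs, missing_at_root is -4.396393e-06 and missing_at_one is -4.370936e-04, matching the Prediction values of nodes 0 and 1 respectively.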
