Getting the distribution of values at the leaf node for a DecisionTreeRegressor in scikit-learn


Problem description

By default, a scikit-learn DecisionTreeRegressor returns the mean of all target values from the training set in a given leaf node.

However, I am interested in getting back the list of target values from my training set that fell into the predicted leaf node. This will allow me to quantify the distribution, and also calculate other metrics like the standard deviation.

Is this possible using scikit-learn?

Recommended answer

I think what you're looking for is the apply method of the tree object. See here for the source. Here's an example:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rs = np.random.RandomState(1234)
x  = rs.randn(10,2)
y  = rs.randn(10)

md  = rs.randint(1, 5)
dtr = DecisionTreeRegressor(max_depth=md)
dtr.fit(x, y)

# The `tree_` object's methods seem to complain if you don't use `float32`.
leaf_ids = dtr.tree_.apply(x.astype(np.float32))

print(leaf_ids)
# => [5 6 6 5 2 6 3 6 6 3]

# The number of leaves is at most 2**max_depth, and should be
# close to it for small depths.
print(2**md, np.unique(leaf_ids).shape[0])
# => 4, 4
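To actually answer the original question, the leaf ids returned by apply can be used to group the training targets by leaf and compute per-leaf statistics such as the standard deviation. A minimal sketch, assuming a recent scikit-learn where apply is exposed directly on the estimator (avoiding the float32 cast) and the default squared-error criterion, so each leaf's prediction is the mean of the targets that fell into it:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rs = np.random.RandomState(1234)
x = rs.randn(10, 2)
y = rs.randn(10)

dtr = DecisionTreeRegressor(max_depth=3)
dtr.fit(x, y)

# Newer scikit-learn versions expose `apply` on the estimator itself.
leaf_ids = dtr.apply(x)

# Group the training targets by the leaf they landed in.
leaf_values = {leaf: y[leaf_ids == leaf] for leaf in np.unique(leaf_ids)}

# Per-leaf distribution metrics, e.g. mean and standard deviation.
for leaf, vals in leaf_values.items():
    print(leaf, vals.mean(), vals.std())

# Sanity check: with the default criterion, the tree's prediction for a
# point equals the mean of the training targets in that point's leaf.
preds = dtr.predict(x)
assert np.allclose(preds, [leaf_values[l].mean() for l in leaf_ids])
```

For a new test point, the same lookup works: call dtr.apply on the test sample, then index leaf_values with the returned leaf id to recover the full list of training targets behind that prediction.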
