scikit-learn Decision Trees Regression: retrieve all samples for a leaf (not the mean)

Question

I have started using scikit-learn decision trees, and so far they are working out quite well, but one thing I need to do is retrieve the set of sample Y values for the leaf node, especially when running a prediction. That is, given an input feature vector X, I want to know the set of corresponding Y values at the leaf node instead of just the regression value, which is the mean (or median) of those values. Of course one would want the sample mean to have a small variance, but I do want to extract the actual set of Y values and do some statistics/create a PDF. I have used code like that in "How to extract the decision rules from scikit-learn decision-tree?" to print the decision tree, but the output of 'value' is a single float representing the mean. I have a large dataset, so I limit the leaf size to e.g. 100, and I want to access those 100 values...
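The retrieval described above can be sketched with DecisionTreeRegressor.apply(), which returns the index of the leaf each sample ends up in. This is an illustration, not part of the original thread: apply the fitted tree to the training set and to the query rows, then select the training Y values that share each query's leaf. The synthetic data and the helper name leaf_targets are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic training data; any numeric X, y work the same way
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(1000, 1))
y_train = np.sin(X_train[:, 0]) + rng.normal(0, 0.3, size=1000)

# min_samples_leaf=100 guarantees at least 100 training samples per leaf,
# mirroring the "leaf size of e.g. 100" from the question
reg = DecisionTreeRegressor(min_samples_leaf=100, random_state=0)
reg.fit(X_train, y_train)

# apply() returns, for every training row, the node id of the leaf it lands in
train_leaves = reg.apply(X_train)


def leaf_targets(reg, train_leaves, y_train, X_query):
    """Hypothetical helper: for each query row, return the training y values in its leaf."""
    query_leaves = reg.apply(X_query)
    return [y_train[train_leaves == leaf] for leaf in query_leaves]


X_new = np.array([[2.5], [7.0]])
for x, ys in zip(X_new, leaf_targets(reg, train_leaves, y_train, X_new)):
    # With the default squared_error criterion the prediction is the mean of ys
    print(x, len(ys), ys.mean(), reg.predict(x.reshape(1, -1))[0])
```

Under the default squared_error criterion, the leaf prediction is the mean of exactly these training values, so ys.mean() should match reg.predict up to floating-point rounding; quantiles, a histogram or a KDE for the PDF can then be computed from ys directly. For many queries, grouping y_train by leaf once (for example in a dict) avoids rescanning the whole training set per query.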

Answer

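Another solution is to use a lightly documented feature of a fitted sklearn DecisionTreeRegressor object: tree_.impurity (note the trailing underscore in tree_). It holds the impurity of every node, leaves included. With the default squared-error criterion, that impurity is the variance of the training Y values in the node, not the standard deviation, so take its square root to get the standard deviation per leaf. Note that this gives one summary statistic per leaf, not the individual Y values.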
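A minimal sketch of reading per-leaf statistics from tree_, assuming the default criterion="squared_error" and no sample weights. Leaves are identified by children_left == -1; tree_.impurity is compared against the variance of the training targets routed to each leaf via apply(). The synthetic data is hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic data
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(500, 2))
y = X[:, 0] ** 2 + rng.normal(0, 1.0, size=500)

# Default criterion="squared_error": node impurity is the variance of y in the node
reg = DecisionTreeRegressor(min_samples_leaf=50, random_state=0).fit(X, y)
tree_obj = reg.tree_

# Leaves are the nodes without children (children_left == -1)
leaf_ids = np.flatnonzero(tree_obj.children_left == -1)

# Map each training sample to its leaf so the impurity can be checked against y
leaf_of_sample = reg.apply(X)
for leaf in leaf_ids:
    ys = y[leaf_of_sample == leaf]
    variance = tree_obj.impurity[leaf]  # variance (ddof=0), not standard deviation
    std = np.sqrt(variance)             # square root gives the standard deviation
    print(leaf, tree_obj.n_node_samples[leaf], ys.var(), variance, std)
```

The comparison with ys.var() holds because numpy's var defaults to ddof=0, the same population variance the squared-error criterion uses; other criteria (for example absolute_error) define impurity differently.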