Decision path for a Random Forest Classifier


Problem description

Here is my code, which you can run in your own environment. I am using RandomForestClassifier and trying to work out the decision_path for a selected sample in the random forest classifier.

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000,
                           n_features=6,
                           n_informative=3,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)

# Creating a DataFrame
df = pd.DataFrame({'Feature 1': X[:, 0],
                   'Feature 2': X[:, 1],
                   'Feature 3': X[:, 2],
                   'Feature 4': X[:, 3],
                   'Feature 5': X[:, 4],
                   'Feature 6': X[:, 5],
                   'Class': y})


y_train = df['Class']
X_train = df.drop('Class',axis = 1)

rf = RandomForestClassifier(n_estimators=50,
                               random_state=0)

rf.fit(X_train, y_train)

The furthest I have got is this:

#Extracting the decision path for instance i = 12
i_data = X_train.iloc[12].values.reshape(1,-1)
d_path = rf.decision_path(i_data)

print(d_path)

The output does not make much sense:

(<1x7046 sparse matrix of type '' with 486 stored elements in Compressed Sparse Row format>, array([ 0, 133, 282, 415, 588, 761, 910, 1041, 1182, 1309, 1432, 1569, 1728, 1869, 2000, 2143, 2284, 2419, 2572, 2711, 2856, 2987, 3128, 3261, 3430, 3549, 3704, 3839, 3980, 4127, 4258, 4389, 4534, 4671, 4808, 4947, 5088, 5247, 5378, 5517, 5640, 5769, 5956, 6079, 6226, 6385, 6524, 6655, 6780, 6925, 7046], dtype=int32))

I am trying to work out the decision path for a particular sample in the DataFrame. Can anyone tell me how to do that?

The idea is to have something like this:

http://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html

Recommended answer

The RandomForestClassifier.decision_path method returns a tuple of (indicator, n_nodes_ptr). See the scikit-learn documentation for RandomForestClassifier.decision_path.
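To make the two parts of that tuple concrete, here is a minimal sketch that inspects them on a small forest. The dataset size and the choice of 5 estimators are assumptions made only to keep the output short; they are not taken from the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small illustrative forest; the sizes are arbitrary choices for this sketch
X, y = make_classification(n_samples=200, n_features=6, n_informative=3,
                           n_classes=2, random_state=0, shuffle=False)
rf = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)

# Decision path of the first sample only
indicator, n_nodes_ptr = rf.decision_path(X[:1])

# One row per sample, one column per node across all trees in the forest
print(indicator.shape)

# n_nodes_ptr holds cumulative node counts, so consecutive differences
# recover the number of nodes in each individual tree
nodes_per_tree = np.diff(n_nodes_ptr)
print(nodes_per_tree)
print([est.tree_.node_count for est in rf.estimators_])
```

The last two printed lines should agree, which is why the array in the question starts at 0 and ends at 7046, the total number of columns of the sparse matrix.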

So your variable node_indicator is a tuple, not the sparse matrix you expect. A tuple object has no attribute 'indices', which is why you get an error when you do:

node_index = node_indicator.indices[node_indicator.indptr[sample_id]:
                                    node_indicator.indptr[sample_id + 1]]

Try:

(node_indicator, _) = rf.decision_path(X_train)
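Once the tuple is unpacked, the forest-wide column indices can be mapped back to individual trees with n_nodes_ptr. The following is a sketch under assumed settings (a 5-tree forest and sample 12, neither taken from the answer), and the np.searchsorted mapping is one possible way to do this rather than the only one.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small illustrative forest; the sizes are arbitrary choices for this sketch
X, y = make_classification(n_samples=200, n_features=6, n_informative=3,
                           n_classes=2, random_state=0, shuffle=False)
rf = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)

sample_id = 12
node_indicator, n_nodes_ptr = rf.decision_path(X)

# Column indices of every node visited by the sample, across all trees
cols = node_indicator.indices[node_indicator.indptr[sample_id]:
                              node_indicator.indptr[sample_id + 1]]

# Map each forest-wide column to (tree index, node id within that tree)
tree_idx = np.searchsorted(n_nodes_ptr, cols, side="right") - 1
local_node = cols - n_nodes_ptr[tree_idx]

for j in range(len(rf.estimators_)):
    print("tree", j, "nodes:", local_node[tree_idx == j].tolist())
```

Each printed list holds the node ids the sample passes through in that tree, in the same numbering that tree.tree_.feature and tree.tree_.threshold use.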


You can also print the decision path through each tree of the forest for a single sample id:

# Work on a NumPy array so values can be indexed as X_train[row, column]
X_train = X_train.values

sample_id = 0

for j, tree in enumerate(rf.estimators_):

    # Arrays describing the structure of this tree, indexed by node id
    n_nodes = tree.tree_.node_count
    children_left = tree.tree_.children_left
    children_right = tree.tree_.children_right
    feature = tree.tree_.feature
    threshold = tree.tree_.threshold

    print("Decision path for DecisionTree {0}".format(j))
    node_indicator = tree.decision_path(X_train)
    leave_id = tree.apply(X_train)
    # Ids of the nodes that sample_id passes through in this tree
    node_index = node_indicator.indices[node_indicator.indptr[sample_id]:
                                        node_indicator.indptr[sample_id + 1]]

    print('Rules used to predict sample %s: ' % sample_id)
    for node_id in node_index:
        # Skip the leaf: it holds the prediction and has no split to report
        if leave_id[sample_id] == node_id:
            continue

        if (X_train[sample_id, feature[node_id]] <= threshold[node_id]):
            threshold_sign = "<="
        else:
            threshold_sign = ">"

        print("decision id node %s : (X_train[%s, %s] (= %s) %s %s)"
              % (node_id,
                 sample_id,
                 feature[node_id],
                 X_train[sample_id, feature[node_id]],
                 threshold_sign,
                 threshold[node_id]))

Note that in your case you have 50 estimators, so the output may be tedious to read.
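If reading 50 separate paths is too much, one option is to summarise them instead. The sketch below is not part of the original answer: it counts how often each feature is used as a split on the sample's path across the whole forest. The Counter-based summary and the reduced dataset size are assumptions made for illustration.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative data; 50 estimators mirrors the question, the rest is arbitrary
X, y = make_classification(n_samples=200, n_features=6, n_informative=3,
                           n_classes=2, random_state=0, shuffle=False)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

sample_id = 0
sample = X[sample_id:sample_id + 1]  # keep a 2-D shape of (1, n_features)

feature_counts = Counter()
for est in rf.estimators_:
    path_nodes = est.decision_path(sample).indices
    leaf_id = est.apply(sample)[0]
    for node_id in path_nodes:
        # The leaf has no split, so it contributes no feature
        if node_id == leaf_id:
            continue
        feature_counts[int(est.tree_.feature[node_id])] += 1

for feat, count in feature_counts.most_common():
    print("feature %d used in %d splits on the path" % (feat, count))
```

This trades the per-node detail for a single short table, which can be a reasonable first look before printing the full rules for a handful of trees.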
