How to make a Python decision tree more understandable?

Question

I have a data file. The last column holds the class label, which is either +1 or -1. I also have the ID name of each column in a separate file.

For example:

1 2 3 4 1
5 6 7 8 1
9 1 2 3 -1
4 5 6 7 -1
8 9 1 2 -1

For the columns I have the names Q1, Q2, Q3, Q4, Q5, respectively.

I want to implement a decision tree classifier, so I wrote the following code:

import numpy
from sklearn import tree

# fileName, idFile (e.g. 'cleanedID.csv') and depth are set elsewhere in the script.
print('Reading data from ' + fileName)
data = numpy.loadtxt(fileName)
print('Getting ids from', idFile)
idArray = numpy.genfromtxt(idFile, dtype='str')

print('data dimensions:', data.shape)
print('idArray dimensions:', idArray.shape)
# Keep the names in idArray separate from the numeric data: appending
# strings to a float array fails or turns every value into a string.

y = data[:, -1]    # last column: class label (+1 / -1)
x = data[:, 1:-1]  # feature columns (as written, this skips column 0)

classifier = tree.DecisionTreeClassifier(max_depth=depth)
classifier = classifier.fit(x, y)

# The with-block closes the file, so no explicit close() is needed.
with open('graph.dot', 'w') as file:
    tree.export_graphviz(classifier, out_file=file)

I used Graphviz to convert the .dot file to a .png file.

The problem is that the resulting decision tree looks something like this:

[Image: rendered decision tree]

I don't understand what X[number] means. I searched and found that value = [5 0] means class 5 has 0 objects and class 0 has 5 objects, but my only class labels are +1 and -1. Is there any way I can tweak this decision tree so that the column names (Q1, Q2, Q3, ...) appear in the picture and I can understand what it means?

Thanks.

Recommended answer

value = [5 0] means that the first class has 5 members and the second class has 0 members at that node. For you, the class order is probably [-1 1], since scikit-learn keeps the class labels in sorted order (see classifier.classes_).
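To check the class order yourself, something like the following should work. It is only a sketch: it uses the five example rows from the question as toy data and assumes the first four columns are the features.

```python
import numpy as np
from sklearn import tree

# Toy data from the question: four feature columns, then the +1 / -1 label.
data = np.array([
    [1, 2, 3, 4,  1],
    [5, 6, 7, 8,  1],
    [9, 1, 2, 3, -1],
    [4, 5, 6, 7, -1],
    [8, 9, 1, 2, -1],
])
x, y = data[:, :-1], data[:, -1]

classifier = tree.DecisionTreeClassifier(random_state=0).fit(x, y)

# classes_ holds the labels in sorted order; each position in a node's
# "value" list refers to the class at the same position here.
class_order = classifier.classes_.tolist()

# Count the training samples per class, in that same order.
root_counts = [int(np.sum(y == label)) for label in class_order]

print("class order:", class_order)
print("samples per class at the root:", root_counts)
```

So a node showing value = [5 0] would hold five samples of the first entry in classes_ and none of the second.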

As for column names: as yangjie pointed out, X[158] refers to the 159th column (indices are zero-based). The rule is already spelled out: X[168] <= 1.5 means that, for a given row, the tree decides whether to go left or right by comparing the value in the column at index 168 (the 169th column) with 1.5.
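If it helps, you can read the same rule straight off the fitted tree. This is a rough sketch on the toy rows from the question; the list feature_names is my own stand-in for the names in your ID file, and the indices refer to the columns of the array passed to fit (so if you slice columns off first, the names must be sliced the same way).

```python
import numpy as np
from sklearn import tree

# Toy data from the question: four feature columns, then the +1 / -1 label.
data = np.array([
    [1, 2, 3, 4,  1],
    [5, 6, 7, 8,  1],
    [9, 1, 2, 3, -1],
    [4, 5, 6, 7, -1],
    [8, 9, 1, 2, -1],
])
x, y = data[:, :-1], data[:, -1]
feature_names = ["Q1", "Q2", "Q3", "Q4"]  # assumed names, one per feature column

classifier = tree.DecisionTreeClassifier(random_state=0).fit(x, y)

# Node 0 is the root. tree_.feature holds the column index tested at each
# node and tree_.threshold the value that column is compared against.
root_feature = int(classifier.tree_.feature[0])
root_threshold = float(classifier.tree_.threshold[0])

# Rows for which the test is true follow the left branch; the rest go right.
goes_left = x[:, root_feature] <= root_threshold

print(f"X[{root_feature}] <= {root_threshold} tests column {feature_names[root_feature]}")
print("rows sent left:", goes_left.tolist())
```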

You can use the optional feature_names parameter of export_graphviz (http://scikit-learn.org/stable/modules/generated/sklearn.tree.export_graphviz.html#sklearn.tree.export_graphviz) to show your column names in the tree.
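A minimal sketch of that call, again on the toy rows from the question. The names in feature_names and the strings in class_names are assumptions for illustration; in your script they would come from your ID file, restricted to the columns you actually pass to fit.

```python
import numpy as np
from sklearn import tree

# Toy data from the question: four feature columns, then the +1 / -1 label.
data = np.array([
    [1, 2, 3, 4,  1],
    [5, 6, 7, 8,  1],
    [9, 1, 2, 3, -1],
    [4, 5, 6, 7, -1],
    [8, 9, 1, 2, -1],
])
x, y = data[:, :-1], data[:, -1]
feature_names = ["Q1", "Q2", "Q3", "Q4"]  # assumed names, one per feature column

classifier = tree.DecisionTreeClassifier(random_state=0).fit(x, y)

dot_source = tree.export_graphviz(
    classifier,
    out_file=None,                # return the DOT text instead of writing a file
    feature_names=feature_names,  # replaces the X[n] labels with column names
    class_names=["-1", "+1"],     # listed in the same order as classifier.classes_
)

# Write the DOT text so Graphviz can convert it to an image as before.
with open("graph.dot", "w") as dot_file:
    dot_file.write(dot_source)

print(dot_source)
```

The split nodes should then read something like "Q2 <= ..." rather than "X[1] <= ...", and class_names adds the predicted class to each node.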
