How to make a Python decision tree more understandable?
Question
I have a data file. The last column holds the class label, which is either +1 or -1. I also have the ID names of each column in a separate file.
For example:
1 2 3 4 1
5 6 7 8 1
9 1 2 3 -1
4 5 6 7 -1
8 9 1 2 -1
For the columns I have the names Q1, Q2, Q3, Q4, Q5 respectively.
I want to implement a decision tree classifier, so I wrote the following code:
import numpy
from sklearn import tree
print('Reading data from ' + fileName)
data = numpy.loadtxt(fileName)
print('Getting ids from ', idFile)
idArray = numpy.genfromtxt('cleanedID.csv', dtype='str')
print('Adding ids')
print('data dimensions: ', data.shape)
print('idArray dimensions: ', idArray.shape)
data = numpy.append(idArray, data, axis = 0)
y = data[:,-1]
x = data[:, 1:-1]
classifier = tree.DecisionTreeClassifier(max_depth = depth)
classifier = classifier.fit(x, y)
with open('graph.dot', 'w') as file:
    # The with block closes the file on exit, so no explicit close() is needed.
    tree.export_graphviz(classifier, out_file = file)
I used Graphviz to convert the .dot file to a .png file.
The problem is that the resulting decision tree looks something like this:
I don't get what X[number] means. I searched and found that value = [5 0] means class 5 has 0 objects and class 0 has 5 objects, but my only labels are +1 and -1. Is there any way I can tweak this decision tree so that the column names (Q1, Q2, Q3, ...) appear in the decision tree picture and I can understand what it means?
Thanks.
Recommended answer
value = [5 0] means that the first class has 5 members and the second class has 0 members. scikit-learn keeps class labels in sorted order, so for you the class order is probably [-1 1].
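One way to confirm the class order rather than guess it is to look at the fitted classifier's classes_ attribute. The sketch below is only an illustration: it reuses the five example rows from the question as toy data, and the variable names (X, y, clf) are assumptions, not the asker's actual script.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data taken from the example rows in the question:
# four feature columns, last column (+1 / -1) split off as the label.
X = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 1, 2]])
y = np.array([1, 1, -1, -1, -1])

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# classes_ lists the labels in sorted order; the entries of
# "value = [a b]" in the plotted tree follow this same order.
print(clf.classes_)

# Count the training labels per class, in the same sorted order.
labels, counts = np.unique(y, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))
```

Here the first entry of value refers to -1 and the second to +1, because -1 sorts before 1.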
As for column names: as yangjie pointed out, X[158] means the 159th column (indices start at zero). The rule is pretty much spelled out already: X[168] <= 1.5 means that, for a given row, the tree decides whether to go left or right based on the value in the column at index 168 (the 169th column) and how it compares to 1.5.
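To see which named column each X[i] refers to without redrawing the picture, you can walk the fitted tree's node arrays. This is a minimal sketch on toy data from the question; the feature_names list and the splits variable are assumptions introduced for illustration, and only the four feature columns are named because the last column is the label.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

feature_names = ["Q1", "Q2", "Q3", "Q4"]  # assumed names for the feature columns

X = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 1, 2]])
y = np.array([1, 1, -1, -1, -1])

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
fitted = clf.tree_

splits = []
for node in range(fitted.node_count):
    # Leaf nodes have no children; children_left is -1 for them.
    if fitted.children_left[node] == -1:
        continue
    idx = int(fitted.feature[node])  # zero-based column index, the i in X[i]
    splits.append((node, idx, feature_names[idx], float(fitted.threshold[node])))

for node, idx, name, threshold in splits:
    print(f"node {node}: X[{idx}] is column {name}, rule {name} <= {threshold}")
```

Rows that satisfy the rule go to the left child, the others go to the right child.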
You can make the tree show your column names by passing the optional feature_names parameter to export_graphviz (http://scikit-learn.org/stable/modules/generated/sklearn.tree.export_graphviz.html#sklearn.tree.export_graphviz).
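A minimal sketch of that call, again on the toy rows from the question. The names list and the class_names strings are assumptions for illustration; class_names must be given in the same sorted order as the classifier's classes, so "-1" comes first.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_graphviz

feature_names = ["Q1", "Q2", "Q3", "Q4"]  # assumed names for the feature columns

X = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 1, 2]])
y = np.array([1, 1, -1, -1, -1])

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# With out_file=None, export_graphviz returns the DOT source as a string.
dot_text = export_graphviz(
    clf,
    out_file=None,
    feature_names=feature_names,  # replaces X[i] with Q1..Q4 in node labels
    class_names=["-1", "+1"],     # adds a "class = ..." line to each node
)

# Save the DOT source so Graphviz can render it.
with open("graph.dot", "w") as dot_file:
    dot_file.write(dot_text)
```

The file can then be rendered as before, for example with dot -Tpng graph.dot -o graph.png, and the nodes read Q1 <= ... instead of X[0] <= ....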