Decision Tree generating terminal leaves with same classes


Question

I'm relatively new to decision trees and I'm stuck with my decision tree algorithm. I'm using cross-validation and parameter tuning to optimize the classification, following this example: https://medium.com/@haydar_ai/learning-data-science-day-22-cross-validation-and-parameter-tuning-b14bcbc6b012. But however I tune my parameters, I always get results that look like this (here just an example for a small tree):

[image: example of a small decision tree]

I don't understand the reason for this behaviour. Why does the tree generate leaves with the same class (here class2)? Why does it not simply stop after a <= 0.375 = TRUE and cut off the leaves with the same class (see the red rectangle in the picture)? Is there a way to prevent this and make the algorithm stop at this point? Or is there a reasonable explanation for this behaviour? Any help or ideas would be highly appreciated! Thanks!

Here is my code:

    # Imports needed for this snippet
    from csv import reader
    from sklearn import tree
    from sklearn.tree import DecisionTreeClassifier
    import graphviz

    # Load a CSV file into a list of rows
    def load_csv(filename):
        dataset = list()
        with open(filename, 'r') as file:
            csv_reader = reader(file)
            for row in csv_reader:
                if not row:
                    continue
                dataset.append(row)
        return dataset

    # Convert string column to float
    def str_column_to_float(dataset, column):
        for row in dataset:
            row[column] = float(row[column].strip())


    # Load dataset
    filename = 'C:/Test.csv'
    dataset = load_csv(filename)


    # convert string columns to float
    for i in range(len(dataset[0])):
        str_column_to_float(dataset, i)

    # Transform to x and y: features are all but the last column, the label is the last
    x = []
    xpart = []
    y = []
    for row in dataset:
        for i in range(len(row)):
            if i != (len(row) - 1):
                xpart.append(row[i])
            else:
                y.append(row[i])
        x.append(xpart)
        xpart = []

    features_names = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
    labels = ['class1', 'class2']

    #here I tried to tune the parameters
    #(I changed them several times; this is just an example to show what the code looks like).
    #However, I always ended up with terminal leaves with the same classes
    """dtree=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=5,
        max_features=8, max_leaf_nodes=None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf=1,
        min_samples_split=2, min_weight_fraction_leaf=0.0,
        presort=False, random_state=None, splitter='random')"""

    #here, I created the small example
    dtree = DecisionTreeClassifier(max_depth=2)
    dtree.fit(x,y)

    dot_data = tree.export_graphviz(dtree, out_file=None) 
    graph = graphviz.Source(dot_data) 
    graph.render("Result") 

    dot_data = tree.export_graphviz(dtree, out_file=None, 
                     feature_names= features_names,  
                     class_names=labels,  
                     filled=True, rounded=True,  
                     special_characters=True)  
    graph = graphviz.Source(dot_data)  
    graph.format = 'png'
    graph.render('Result', view = True)

... and a snapshot of my data:

[image: snapshot of the data]

Answer

The class attribute you are referring to is the majority class at that particular node, and the colours come from the filled=True parameter you pass to export_graphviz().

Now, looking at your dataset, you have 147 samples of class1 and 525 samples of class2, which is a fairly imbalanced ratio. It just so happens that the optimal splits for your particular dataset at this depth produce splits where the majority class is class2. This is normal behaviour and a product of your data, and it is not altogether surprising given that class2 outnumbers class1 by about 3:1.
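If you want to see that imbalance for yourself before fitting, a minimal sketch along these lines works. It assumes the x and y built in your code; class_weight='balanced' is a standard DecisionTreeClassifier option, not something from your original snippet:

    from collections import Counter
    from sklearn.tree import DecisionTreeClassifier

    # Show the per-class sample counts before training (147 vs. 525 in your data)
    print(Counter(y))

    # Optionally re-weight samples inversely to class frequency, so the
    # minority class is not drowned out by the roughly 3:1 imbalance
    dtree_balanced = DecisionTreeClassifier(max_depth=2, class_weight='balanced')
    dtree_balanced.fit(x, y)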

As to why the tree doesn't stop when the majority class is the same for the two children of a split, that's just how the algorithm works. If left unbounded with no max depth, it will continue until it produces only pure leaf nodes that contain a single class exclusively (and where the Gini impurity is 0). You've set max_depth=2 in your example, so the tree simply stops before it can yield all pure nodes.
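To convince yourself of this, here is a minimal sketch (again assuming the x and y from your code) that grows the tree with no depth limit and reads the leaf impurities from scikit-learn's low-level tree_ attribute:

    from sklearn.tree import DecisionTreeClassifier

    # No max_depth: the tree grows until every leaf is pure
    full_tree = DecisionTreeClassifier().fit(x, y)

    t = full_tree.tree_
    leaf_mask = t.children_left == -1   # leaves have no children (-1 = TREE_LEAF)
    # All zeros, unless identical feature rows carry different labels
    print("leaf Gini impurities:", t.impurity[leaf_mask])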

You'll notice that in the split you've boxed in red in your example, the node on the right is almost 100% class2, with 54 instances of class2 and only 2 of class1. If the algorithm had stopped before that, it would have produced the node above it, with a 291-45 class2-class1 split, which is far less useful.
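Those per-node counts are exactly what export_graphviz prints as value = [...], and you can read them programmatically as well. A sketch, assuming the fitted dtree from your code; note that older scikit-learn versions store raw counts in tree_.value while recent ones store class fractions, but the majority class comes out the same either way:

    import numpy as np

    t = dtree.tree_
    for node in range(t.node_count):
        counts = t.value[node][0]                    # per-class tally at this node
        majority = dtree.classes_[np.argmax(counts)] # the class shown in the plot
        print(f"node {node}: value={counts}, majority={majority}")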

Perhaps you could increase the max depth of your tree and see if you can separate the classes out further.
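One way to do that is to score a few candidate depths with cross-validation rather than eyeballing a single tree; the depth grid below is illustrative, not taken from your post:

    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    # Try deeper trees and let 5-fold cross-validation compare them
    for depth in [2, 3, 4, 5, 6, None]:   # None = grow until pure leaves
        clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
        scores = cross_val_score(clf, x, y, cv=5)
        print(f"max_depth={depth}: mean accuracy {scores.mean():.3f}")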

