Building a Decision Tree


Problem Description

When building a decision tree, at each node we select the best feature, and then the best splitting position for that feature. However, when all values of the best feature are 0 for the samples in the current node/set, what do I do? All samples keep being grouped to one side (the <= 0 branch), and an infinite loop occurs. For example:

#left: 1500, #right: 0

then

#left: 1500, #right: 0

and so on...

Just for reference, I'm following the pseudo-code below.

GrowTree(S)
  if (y_i = C for all i in S and some class C) then {
    return new leaf(C)
  } else {
    choose best splitting feature j and splitting point beta (*)
    (*) I choose the one that gives the max entropy drop
    S_l = {i : X_ij < beta}
    S_r = {i : X_ij >= beta}
    return new node(j, beta, GrowTree(S_l), GrowTree(S_r))
  }
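
For concreteness, here is a minimal runnable Python sketch of this pseudo-code. The helper names (entropy, best_split, grow_tree) are mine, not from the original question, and the guard against degenerate splits anticipates the answer below:

import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(X, y):
    """Return (feature j, threshold beta, entropy drop) of the best split."""
    n, d = len(X), len(X[0])
    base = entropy(y)
    best_j, best_beta, best_gain = None, None, 0.0
    for j in range(d):
        for beta in sorted({row[j] for row in X}):
            left = [y[i] for i in range(n) if X[i][j] < beta]
            right = [y[i] for i in range(n) if X[i][j] >= beta]
            if not left or not right:
                continue  # degenerate split: every sample on one side
            gain = base - (len(left) / n) * entropy(left) \
                        - (len(right) / n) * entropy(right)
            if gain > best_gain:
                best_j, best_beta, best_gain = j, beta, gain
    return best_j, best_beta, best_gain

def grow_tree(X, y):
    """GrowTree(S) from the pseudo-code, with a guard against zero-gain splits."""
    if len(set(y)) == 1:            # y_i = C for all i in S -> leaf(C)
        return ("leaf", y[0])
    j, beta, gain = best_split(X, y)
    if gain <= 0.0:                 # no split increases certainty -> majority leaf
        return ("leaf", Counter(y).most_common(1)[0][0])
    S_l = [i for i in range(len(X)) if X[i][j] < beta]
    S_r = [i for i in range(len(X)) if X[i][j] >= beta]
    return ("node", j, beta,
            grow_tree([X[i] for i in S_l], [y[i] for i in S_l]),
            grow_tree([X[i] for i in S_r], [y[i] for i in S_r]))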

Recommended Answer

This is simply impossible. You are supposed to select the threshold that leads to the biggest increase in model certainty. A threshold that puts every single instance in the same branch gives you a 0 increase in model certainty, so it is not the best split. This should happen if and only if the impurity/entropy is already 0 in this feature, but then that is a stopping criterion for creating leaves in a decision tree.
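
To see the answer's point numerically, here is a small worked example (my own, not from the answer): a threshold that sends all samples to one branch leaves the weighted child entropy equal to the parent entropy, so the information gain is exactly 0.

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

y = [0] * 5 + [1] * 5    # 10 samples, two classes -> parent entropy = 1.0 bit
parent = entropy(y)

# Degenerate "split": every sample goes left, the right branch is empty.
left, right = y, []
gain = parent - (len(left) / len(y)) * entropy(left)  # empty branch contributes 0
print(gain)              # 0.0 -> zero increase in certainty, never the best split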

