How to deal with missing attribute values in a C4.5 (J48) decision tree?


Question

What is the best way to handle missing attribute values with Weka's C4.5 (J48) decision tree? Missing values occur during both training and classification.

  1. If values are missing from training instances, am I correct in assuming that I should put a '?' in place of the feature's value?

Suppose that I successfully build the decision tree and then write my own tree code in C++ or Java from Weka's tree structure. At classification time, when I classify a new instance, what value should I use for features whose values are missing? How would I descend the tree past a decision node that tests an attribute whose value is unknown?

Would Naive Bayes be better at handling missing values? I would just assign them a very small non-zero probability, right?

Recommended answer

From Pedro Domingos' ML course at the University of Washington:

Here are the three approaches Pedro suggests for an example that is missing a value for attribute A:

  • Assign the most common value of A among the other examples sorted to node n.
  • Assign the most common value of A among the other examples with the same target value.
  • Assign a probability p_i to each possible value v_i of A, then assign a fraction p_i of the example to each descendant in the tree.
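
The three approaches above can be sketched in a few lines of Python. This is only an illustration, not Weka's or J48's actual code: the helper names, the `MISSING` marker (standing in for the '?' used in Weka data files), and the toy `node_examples` data are all assumptions made for the example, and each helper assumes at least one example at the node has a known value for the attribute.

```python
from collections import Counter

MISSING = None  # assumed stand-in for the '?' marker in the data


def most_common_value(examples, attr):
    """Approach 1: most common known value of attr among the examples at this node."""
    counts = Counter(ex[attr] for ex in examples if ex[attr] is not MISSING)
    return counts.most_common(1)[0][0]


def most_common_value_same_class(examples, attr, target, label):
    """Approach 2: as above, restricted to examples whose target value equals label."""
    same_class = [ex for ex in examples if ex[target] == label]
    return most_common_value(same_class, attr)


def value_fractions(examples, attr):
    """Approach 3: probability p_i for each known value v_i of attr at this node."""
    counts = Counter(ex[attr] for ex in examples if ex[attr] is not MISSING)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


# Hypothetical examples sorted to node n; "outlook" plays the role of attribute A.
node_examples = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": MISSING, "play": "yes"},
]

print(most_common_value(node_examples, "outlook"))
print(most_common_value_same_class(node_examples, "outlook", "play", "yes"))
print(value_fractions(node_examples, "outlook"))
```

With the first two approaches the missing value is simply filled in and the example is treated as complete. With the third, the example is not filled in at all: it is passed down every branch with a weight equal to that branch's fraction.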

The slides and video are now viewable here.
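
The question also asks how to descend past a decision node at classification time when the tested value is unknown. One way to do it is to apply the third approach: split the instance across all branches in proportion to p_i and add up the weight that reaches each class. The sketch below assumes a hand-built tree stored as nested dictionaries; the `fractions` field (training proportions per branch), the single-label leaves, and the `classify` function are hypothetical simplifications, not the structure Weka exports.

```python
from collections import defaultdict

MISSING = None  # assumed stand-in for an unknown attribute value


def classify(node, instance, weight=1.0, totals=None):
    """Return a mapping class -> accumulated weight for the given instance."""
    if totals is None:
        totals = defaultdict(float)
    if "label" in node:  # leaf: credit the weight that arrived here to its class
        totals[node["label"]] += weight
        return totals
    value = instance.get(node["attr"], MISSING)
    if value is MISSING:
        # Unknown value: send a fraction p_i of the instance down every branch.
        for branch_value, child in node["children"].items():
            classify(child, instance, weight * node["fractions"][branch_value], totals)
    else:
        # Known value: follow the single matching branch with the full weight.
        classify(node["children"][value], instance, weight, totals)
    return totals


# Hypothetical tree; "fractions" holds the share of training examples per branch.
tree = {
    "attr": "outlook",
    "fractions": {"sunny": 0.6, "rainy": 0.4},
    "children": {
        "sunny": {
            "attr": "windy",
            "fractions": {"yes": 0.5, "no": 0.5},
            "children": {"yes": {"label": "no"}, "no": {"label": "yes"}},
        },
        "rainy": {"label": "yes"},
    },
}

scores = classify(tree, {"outlook": MISSING, "windy": "yes"})
prediction = max(scores, key=scores.get)
print(dict(scores), prediction)
```

The predicted class is the one with the largest accumulated weight. Porting this to C++ or Java mainly requires storing the per-branch training fractions alongside each decision node when the tree is exported.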
