Dummy Coding of Nominal Attributes - Effect of Using K Dummies, Effect of Attribute Selection


Question


Summing up my understanding of the topic: 'Dummy Coding' is usually understood as coding a nominal attribute with K possible values as K-1 binary dummies. Using K values would cause redundancy and would have a negative impact, e.g. on logistic regression, as far as I have learned. So far, everything is clear to me.
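To make the K vs. K-1 distinction concrete, here is a minimal sketch in Python using pandas (the column name checking_status and the values A11..A14 are just borrowed from the German Credit example mentioned below; the data itself is made up):

import pandas as pd

df = pd.DataFrame({"checking_status": ["A11", "A12", "A13", "A14", "A11"]})

# K dummies: one binary column per possible value
k_dummies = pd.get_dummies(df["checking_status"], prefix="checking")
print(k_dummies)

# K-1 dummies: drop one column; its value becomes the implicit baseline (all zeros)
k_minus_one = pd.get_dummies(df["checking_status"], prefix="checking", drop_first=True)
print(k_minus_one)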

然而,有两个问题是我不清楚:

Yet, two issues are unclear to me:


1) Bearing in mind the issue stated above, I am confused that the 'Logistic' classifier in WEKA actually uses K dummies (see picture). Why would that be the case?


2) An issue arises as soon as I consider attribute selection. While the left-out attribute value is implicitly included as the case where all dummies are zero when all dummies are actually used in the model, it is no longer clearly represented if one dummy is missing (because it was not selected during attribute selection). The issue is much easier to understand with the sketch I uploaded. How can that issue be treated?
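A rough illustration of that ambiguity, again as a pandas sketch with made-up data: with K-1 dummies the all-zeros pattern stands for exactly one value (the baseline), but once attribute selection removes another dummy, the all-zeros pattern covers two different original values.

import pandas as pd

df = pd.DataFrame({"attr": ["A11", "A12", "A13", "A14"]})

# K-1 coding: A11 is the left-out baseline, encoded as all zeros
dummies = pd.get_dummies(df["attr"], drop_first=True)
print(dummies)      # columns A12, A13, A14; the A11 row is (0, 0, 0)

# Pretend attribute selection removed the A13 dummy as well
selected = dummies.drop(columns=["A13"])
print(selected)     # now both A11 and A13 show up as (0, 0) - no longer distinguishable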



Images


WEKA Output: The Logistic algorithm was run on the UCI dataset German Credit, where the possible values of the first attribute are A11,A12,A13,A14. All of them are included in the logistic regression model. http://abload.de/img/bildschirmfoto2013-089out9.png


Decision Tree Example: Sketch showing the issue when it comes to running decision trees on datasets with dummy-coded instances after attribute selection. http://abload.de/img/sketchziu5s.jpg

Answer


The output is generally easier to read, interpret and use when you use k dummies instead of k-1 dummies. I figure that is why everybody seems to actually use k dummies. But yes, as the k values sum up to 1, there exists a correlation that may cause problems. But correlations in data sets are common, and you will never completely get rid of them!
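A quick check of that point with hypothetical values: with all k dummies, the columns sum to 1 in every row, so a design matrix that also contains an intercept is rank-deficient (perfect multicollinearity).

import numpy as np
import pandas as pd

values = pd.Series(["A11", "A12", "A13", "A14", "A12", "A11"])

# all K dummies
X = pd.get_dummies(values).to_numpy(dtype=float)
print(X.sum(axis=1))    # every row sums to 1.0

# adding an intercept column makes the matrix rank-deficient:
# the intercept equals the sum of the K dummy columns
X_with_intercept = np.hstack([np.ones((X.shape[0], 1)), X])
print(X_with_intercept.shape[1])                 # 5 columns
print(np.linalg.matrix_rank(X_with_intercept))   # rank 4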


I believe feature selection and dummy coding just don't fit together. It amounts to dropping some values from the attribute. Why do you insist on doing feature selection?


You really should be using weighting, or consider more advanced algorithms that can handle such data. In fact the dummy variables can cause just as much trouble, because they are binary, and oh so many algorithms (e.g. k-means) don't make much sense on binary variables.


As for the decision tree: don't perform feature selection on your output attribute... Plus, as a decision tree already selects features, it does not make much sense to do all this anyway... leave it to the decision tree to decide which attribute to use for splitting. This way, it can learn dependencies, too.
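As a sketch of that suggestion (synthetic data, not the German Credit set): keep all K dummies and let the tree decide which ones to split on, instead of running attribute selection first.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({
    "checking_status": ["A11", "A12", "A13", "A14", "A11", "A13", "A14", "A12"],
    "label":           ["bad", "good", "good", "good", "bad", "good", "good", "bad"],
})

X = pd.get_dummies(df["checking_status"])   # all K dummies, no prior selection
y = df["label"]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# the tree itself decides which dummies are worth splitting on
print(dict(zip(X.columns, tree.feature_importances_)))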
