Entropy and Information Gain


Question

Simple question, I hope.

If I have a set of data like this:

Classification  attribute-1  attribute-2

Correct         dog          dog 
Correct         dog          dog
Wrong           dog          cat 
Correct         cat          cat
Wrong           cat          dog
Wrong           cat          dog

Then what is the information gain of attribute-2 relative to attribute-1?

I've computed the entropy of the whole data set: -(3/6)log2(3/6) - (3/6)log2(3/6) = 1
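That whole-set entropy can be double-checked with a few lines of Python (the `entropy` helper is my own naming, not from the question):

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

# 3 Correct and 3 Wrong out of 6 rows:
print(entropy([3/6, 3/6]))  # 1.0
```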

Then I'm stuck! I think I need to calculate the entropies of attribute-1 and attribute-2 too, then use these three calculations in an information gain calculation?

Any help would be great, thanks :)

Answer

Well, first you have to calculate the entropy for each of the attributes. After that you calculate the information gain. Here is how it should be done.

For attribute-1:

attr-1=dog:
info([2c,1w])=entropy(2/3,1/3)

attr-1=cat:
info([1c,2w])=entropy(1/3,2/3)

Value for attribute-1:

info([2c,1w],[1c,2w])=(3/6)*info([2c,1w])+(3/6)*info([1c,2w])

Gain for attribute-1:

gain("attr-1")=info([3c,3w])-info([2c,1w],[1c,2w])

And you have to do the same for the next attribute.
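The whole procedure can be sketched in Python, standard library only (the row encoding and the helper names `entropy_of_labels` / `info_gain` are my own, not from the answer):

```python
from math import log2
from collections import Counter

def entropy_of_labels(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# The six rows from the question: (classification, attr-1, attr-2)
rows = [
    ("Correct", "dog", "dog"),
    ("Correct", "dog", "dog"),
    ("Wrong",   "dog", "cat"),
    ("Correct", "cat", "cat"),
    ("Wrong",   "cat", "dog"),
    ("Wrong",   "cat", "dog"),
]

def info_gain(attr_index):
    """gain(attr) = info(whole set) - weighted info of the subsets split on attr."""
    labels = [r[0] for r in rows]
    total = entropy_of_labels(labels)  # info([3c,3w]) = 1 bit
    remainder = 0.0
    for value in {r[attr_index] for r in rows}:
        subset = [r[0] for r in rows if r[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy_of_labels(subset)
    return total - remainder

print(round(info_gain(1), 4))  # gain for attribute-1 ≈ 0.0817
print(round(info_gain(2), 4))  # gain for attribute-2 = 0.0
```

With this encoding of the table, splitting on attribute-2 leaves each subset exactly as mixed as the whole set, so its gain comes out to 0.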

