Implementation of Theil inequality index in Python
Problem description
I am trying to implement Theil's index (http://en.wikipedia.org/wiki/Theil_index) in Python to measure the inequality of the revenues in a list.
The formula is basically Shannon's entropy, so it involves logarithms. My problem is that a few revenues in my list are 0, and log(0) makes my formula unhappy. I believe adding a tiny float to 0 wouldn't work, since log(tinyFloat) tends towards -inf, and that would mess my index up.
Here's a snippet (taken from another, much cleaner and freely available, implementation):
from math import exp, log

def error_if_not_in_range01(value):
    # Rejects 0 as well as anything above 1.
    if (value <= 0) or (value > 1):
        raise Exception(str(value) + ' is not in (0,1]!')

def H(x):
    n = len(x)
    entropy = 0.0
    total = 0.0
    for x_i in x:  # work on all x[i]
        print(x_i)
        error_if_not_in_range01(x_i)
        total += x_i
        group_negentropy = x_i*log(x_i)  # fails when x_i == 0
        entropy += group_negentropy
    # error_if_not_1 comes from the original implementation and is not shown here.
    error_if_not_1(total)
    return -entropy

def T(x):
    print(x)
    n = len(x)
    maximum_entropy = log(n)
    actual_entropy = H(x)
    redundancy = maximum_entropy - actual_entropy
    inequality = 1 - exp(-redundancy)
    return redundancy, inequality
Is there any way out of this problem?
Recommended answer
If I understand you correctly, the formula you are trying to implement is the following:
T = (1/N) * sum over i of (Xi / mean(X)) * ln(Xi / mean(X))
In this case, your problem is calculating the natural logarithm of Xi / mean(X) when Xi = 0.
However, that logarithm is always multiplied by Xi / mean(X), so when Xi == 0 the value of ln(Xi / mean(X)) doesn't matter: the product is taken to be zero, which is also the limit of x * ln(x) as x approaches 0 from above. You can treat the value of the formula for that entry as zero, and skip calculating the logarithm entirely.
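As a minimal sketch of that idea, the function below computes the index directly from raw revenues and skips the logarithm for zero entries. The name theil_t and the input checks are assumptions made for illustration; they are not part of the code quoted in the question.

```python
from math import log

def theil_t(revenues):
    """Theil T index of non-negative values; zero entries contribute 0."""
    n = len(revenues)
    if n == 0:
        raise ValueError("need at least one value")
    if any(x < 0 for x in revenues):
        raise ValueError("values must be non-negative")
    mean = sum(revenues) / n
    if mean == 0:
        raise ValueError("all values are zero; the index is undefined")
    total = 0.0
    for x in revenues:
        if x > 0:  # for x == 0 the term (x/mean) * ln(x/mean) is taken as 0
            ratio = x / mean
            total += ratio * log(ratio)
    return total / n
```

With equal revenues every ratio is 1, so each term is 1 * ln(1) = 0 and the index is 0. When a single entry holds everything, the index reaches its maximum of ln(N).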
In the case that you are implementing Shannon's formula directly, the same holds:
H = -sum over i of Pi * log(Pi) = sum over i of Pi * log(1 / Pi)
In both the first and second form, calculating the log is not necessary if Pi == 0, because whatever value it is, it will have been multiplied by zero.
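For Shannon's formula, the same skip can be written as a filter inside the sum. This is only a sketch: the name shannon_entropy is hypothetical, and the input is assumed to be a list of probabilities in [0, 1] that sum to 1 (the function does not check this).

```python
from math import log

def shannon_entropy(probabilities):
    """Entropy in nats; terms with p == 0 are skipped, i.e. counted as 0."""
    # Assumes valid probabilities: anything <= 0 is silently ignored.
    return -sum(p * log(p) for p in probabilities if p > 0)
```

Filtering with p > 0 means log is never called on zero, so no special return value or exception handling is needed.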
Update:
Given the code you quoted, you can replace x_i*log(x_i) with a function as follows:
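def Group_negentropy(x_i):
    # 0*log(0) is taken as 0, so skip the logarithm for zero entries.
    if x_i == 0:
        return 0
    else:
        return x_i*log(x_i)

def H(x):
    n = len(x)
    entropy = 0.0
    total = 0.0
    for x_i in x:  # work on all x[i]
        print(x_i)
        # The range check must accept 0 (test value < 0 instead of value <= 0),
        # otherwise zero entries raise before reaching Group_negentropy.
        error_if_not_in_range01(x_i)
        total += x_i
        group_negentropy = Group_negentropy(x_i)
        entropy += group_negentropy
    error_if_not_1(total)
    return -entropy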
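To tie this back to a list of raw revenues, here is a hedged end-to-end sketch. It assumes the x passed to H is meant to be each revenue's share of the total, which is what the "sums to 1" check in the quoted code suggests. The names entropy_of_shares and theil_from_revenues are made up for this example, and at least one revenue is assumed to be positive.

```python
from math import exp, log

def entropy_of_shares(shares):
    # Zero shares are skipped, matching the 0*log(0) = 0 convention.
    return -sum(s * log(s) for s in shares if s > 0)

def theil_from_revenues(revenues):
    """Return (redundancy, inequality) as T does in the quoted code."""
    total = sum(revenues)  # assumed > 0, otherwise the shares are undefined
    shares = [r / total for r in revenues]
    redundancy = log(len(shares)) - entropy_of_shares(shares)
    inequality = 1 - exp(-redundancy)
    return redundancy, inequality

# One revenue holds everything and three are zero.
redundancy, inequality = theil_from_revenues([0, 0, 0, 4])
print(redundancy, inequality)
```

Because each share equals Xi / (N * mean(X)), the redundancy log(N) - H is algebraically the same quantity as the formula with Xi / mean(X), so zero revenues are handled consistently in both forms.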