Implementation of Theil inequality index in Python


Problem description

I am trying to implement Theil's index (http://en.wikipedia.org/wiki/Theil_index) in Python to measure inequality of revenue in a list.

The formula is basically Shannon's entropy, so it deals with log. My problem is that I have a few revenues at 0 in my list, and log(0) makes my formula unhappy. I believe adding a tiny float to 0 wouldn't work as log(tinyFloat) = -inf, and that would mess my index up.

Here's a snippet (taken from another, much cleaner and freely available, implementation):

    from math import exp, log

    def error_if_not_in_range01(value):
        # note: this condition accepts (0, 1], so a value of exactly 0 is rejected
        if (value <= 0) or (value > 1):
            raise Exception(str(value) + ' is not in [0,1)!')

    def H(x):
        n = len(x)
        entropy = 0.0
        sum = 0.0
        for x_i in x: # work on all x[i]
            print(x_i)
            error_if_not_in_range01(x_i)
            sum += x_i
            group_negentropy = x_i*log(x_i)  # fails when x_i == 0
            entropy += group_negentropy
        error_if_not_1(sum)  # defined elsewhere in the original implementation (not shown)
        return -entropy

    def T(x):
        print(x)
        n = len(x)
        maximum_entropy = log(n)
        actual_entropy = H(x)
        redundancy = maximum_entropy - actual_entropy
        inequality = 1 - exp(-redundancy)
        return redundancy,inequality

Is there any way out of this problem?

Recommended answer

If I understand you correctly, the formula you are trying to implement is the following:
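
For reference, the Theil T index is usually written as follows. This is the standard form, given here on the assumption that it is the formula meant; N is the number of entries and \bar{X} stands for mean(X):

```latex
T = \frac{1}{N} \sum_{i=1}^{N} \frac{X_i}{\bar{X}} \ln\!\left(\frac{X_i}{\bar{X}}\right),
\qquad
\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i
```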

In this case, your problem is calculating the natural logarithm of Xi / mean(X), when Xi = 0.

However, since that logarithm is multiplied by Xi / mean(X), the value of ln(Xi / mean(X)) doesn't matter when Xi == 0: the product is taken to be zero, which is also the limit of x*ln(x) as x approaches 0 from above. You can treat the value of the formula for that entry as zero, and skip calculating the logarithm entirely.
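
A minimal sketch of that per-entry rule applied directly to raw revenues, assuming the usual mean-based form of the Theil T index. The name `theil_t` and the input checks are illustrative choices, not part of the quoted implementation:

```python
from math import log

def theil_t(revenues):
    """Theil T index of a list of non-negative revenues (illustrative helper).

    Entries equal to zero contribute nothing, following the
    convention 0 * ln(0) = 0.
    """
    n = len(revenues)
    if n == 0:
        raise ValueError("revenues must not be empty")
    if any(x < 0 for x in revenues):
        raise ValueError("revenues must be non-negative")
    mean = sum(revenues) / n
    if mean == 0:
        raise ValueError("index is undefined when every revenue is zero")
    total = 0.0
    for x in revenues:
        if x > 0:  # skip zeros: their term is taken to be 0
            ratio = x / mean
            total += ratio * log(ratio)
    return total / n
```

With equal revenues the index is 0, and when one entry holds everything it reaches its maximum, ln(N).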

In the case that you are implementing Shannon's formula directly, the same holds:

In both the first and second form, calculating the log is not necessary if Pi == 0, because whatever value it is, it will have been multiplied by zero.
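
Written out, Shannon's entropy over shares P_i and the limit that justifies dropping zero terms look like this. These are standard results, included as a reminder rather than as the exact formulas from the original post:

```latex
H = -\sum_{i=1}^{N} P_i \ln P_i,
\qquad
\lim_{p \to 0^{+}} p \ln p
  = \lim_{p \to 0^{+}} \frac{\ln p}{1/p}
  = \lim_{p \to 0^{+}} \frac{1/p}{-1/p^{2}}
  = \lim_{p \to 0^{+}} (-p)
  = 0
```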

Update:

Given the code you quoted, you can replace x_i*log(x_i) with a function as follows:

    from math import log

    def Group_negentropy(x_i):
        # 0*log(0) is treated as 0, so log is never called on 0
        if x_i == 0:
            return 0
        else:
            return x_i*log(x_i)

    def H(x):
        n = len(x)
        entropy = 0.0
        sum = 0.0
        for x_i in x: # work on all x[i]
            print(x_i)
            error_if_not_in_range01(x_i)  # must accept 0 for the zero case to be reached
            sum += x_i
            group_negentropy = Group_negentropy(x_i)
            entropy += group_negentropy
        error_if_not_1(sum)
        return -entropy
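
For a self-contained check, here is a hedged Python 3 sketch that puts the pieces together. It assumes the input is a list of raw revenues that first has to be turned into shares summing to 1. The names `shares`, `group_negentropy`, `entropy` and `theil` are illustrative, and the range check is relaxed to accept 0, since the check in the quoted code rejects zero before the logarithm is ever reached:

```python
from math import exp, log

def shares(revenues):
    """Turn raw revenues into shares that sum to 1 (hypothetical helper)."""
    total = sum(revenues)
    if total <= 0:
        raise ValueError("need at least one positive revenue")
    return [r / total for r in revenues]

def group_negentropy(p):
    # 0 * log(0) is taken to be 0, so zero shares are skipped
    return 0.0 if p == 0 else p * log(p)

def entropy(ps):
    for p in ps:
        if not 0 <= p <= 1:  # note: 0 is allowed here
            raise ValueError(f"{p} is not in [0, 1]")
    return -sum(group_negentropy(p) for p in ps)

def theil(revenues):
    ps = shares(revenues)
    redundancy = log(len(ps)) - entropy(ps)  # equals the Theil T index
    inequality = 1 - exp(-redundancy)
    return redundancy, inequality
```

Calling `theil([0, 0, 0, 4])` no longer trips over log(0); the redundancy comes out as ln(4), the maximum for four entries.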
