Python - Gini coefficient calculation using NumPy

Question

I'm a newbie who has just started learning Python, and I'm trying to write some code to calculate the Gini index for a fake country. I've come up with the following:

GDP = (653200000000)
A = (0.49 * GDP) / 100 # Poorest 10%
B = (0.59 * GDP) / 100
C = (0.69 * GDP) / 100
D = (0.79 * GDP) / 100
E = (1.89 * GDP) / 100
F = (2.55 * GDP) / 100
G = (5.0 * GDP) / 100
H = (10.0 * GDP) / 100
I = (18.0 * GDP) / 100
J = (60.0 * GDP) / 100 # Richest 10%

# Divide into quintiles and total income within each quintile
Q1 = float(A + B) # lowest quintile
Q2 = float(C + D) # second quintile
Q3 = float(E + F) # third quintile
Q4 = float(G + H) # fourth quintile
Q5 = float(I + J) # fifth quintile

# Calculate the percent of total income in each quintile
T1 = float((100 * Q1) / GDP) / 100
T2 = float((100 * Q2) / GDP) / 100
T3 = float((100 * Q3) / GDP) / 100
T4 = float((100 * Q4) / GDP) / 100
T5 = float((100 * Q5) / GDP) / 100

TR = float(T1 + T2 + T3 + T4 + T5)

# Calculate the cumulative percentage of household income
H1 = float(T1)
H2 = float(T1+T2)
H3 = float(T1+T2+T3)
H4 = float(T1+T2+T3+T4)
H5 = float(T1+T2+T3+T4+T5)

# Magic! Using numpy to calculate area under Lorenz curve.
# Problem might be here?
import numpy as np 
from numpy import trapz

# The y values. Cumulative percentage of incomes
y = np.array([Q1,Q2,Q3,Q4,Q5])

# Compute the area using the composite trapezoidal rule.
area_lorenz = trapz(y, dx=5)

# Calculate the area below the perfect equality line.
area_perfect = (Q5 * H5) / 2

# Seems to work fine until here. 
# Manually calculated Gini using the values given for the areas above 
# turns out at .58 which seems reasonable?

Gini = area_perfect - area_lorenz

# Prints utter nonsense.
print Gini

The result of Gini = area_perfect - area_lorenz makes no sense. I took the values given by the area variables and did the math by hand, and it came out fairly OK, but when I try to get the program to do it, it gives me a completely ??? value (-1.7198...). What am I missing? Can someone point me in the right direction?

Thanks!

Recommended answer

The first issue is that the equation for the Gini coefficient is not applied correctly:

gini = (area between Lorenz curve and perfect equality) / (area under perfect equality)
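
As a short worked form of this ratio (a sketch, assuming both axes are normalized to run from 0 to 1 and the population is split into n equal groups; the symbols A_pe, A_L and L_k are introduced here for illustration and are not in the original answer), the area under the line of perfect equality is 1/2, so the ratio reduces to one minus twice the area under the Lorenz curve:

```latex
G = \frac{A_{pe} - A_L}{A_{pe}}
  = \frac{\tfrac{1}{2} - A_L}{\tfrac{1}{2}}
  = 1 - 2A_L,
\qquad
A_L \approx \frac{1}{n}\sum_{k=1}^{n} \frac{L_{k-1} + L_k}{2},
\quad L_0 = 0,\; L_n = 1 .
```

Here L_k is the cumulative income share of the poorest k groups, and the sum is the composite trapezoidal rule applied to the Lorenz curve.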

You didn't include the denominator in your calculation, and you are also using an incorrect equation for the area under the line of equality (see the code for a method using np.linspace and np.trapz).

There is also the issue that the first point of the Lorenz curve is missing (you need to start at 0, not at the first quintile's share). The area under the Lorenz curve between 0 and the first quintile is small, but extending the line of equality back to 0 adds a comparatively large area under it, so omitting this point changes the ratio noticeably.
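
To see where the -1.7198... in the question appears to come from, here is a rough reconstruction of the question's arithmetic. It is a sketch rather than part of the original answer: `trapezoid_area` is a hypothetical helper written out by hand so the example does not depend on which NumPy version provides `trapz`, and the variable names are illustrative.

```python
import numpy as np

def trapezoid_area(y, dx):
    """Composite trapezoidal rule for equally spaced samples (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) / 2.0) * dx)

GDP = 653_200_000_000
decile_percents = [0.49, 0.59, 0.69, 0.79, 1.89, 2.55, 5.0, 10.0, 18.0, 60.0]
decile_income = [p * GDP / 100 for p in decile_percents]

# Per-quintile totals in currency units (Q1..Q5 in the question).
quintile_income = [decile_income[i] + decile_income[i + 1] for i in range(0, 10, 2)]

# The question integrates the per-quintile totals, not the cumulative shares,
# and uses dx=5, so this "area" is in currency units.
area_lorenz = trapezoid_area(quintile_income, dx=5)

# The question's "perfect equality" area is Q5 * H5 / 2, where H5 is the
# final cumulative share and therefore equals 1.
area_perfect = quintile_income[-1] * 1.0 / 2

# The subtraction mixes inconsistent areas and is never divided by anything.
question_result = area_perfect - area_lorenz
print(question_result)  # roughly -1.72e12
```

The result is a large negative number in currency units rather than a ratio between 0 and 1, which is consistent with the two problems described above.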

The following gives a result equivalent to the methods in the answers to this question:

import numpy as np

GDP = 653200000000 # you don't actually need this value

# Decile percents of global GDP
gdp_decile_percents = [0.49, 0.59, 0.69, 0.79, 1.89, 2.55, 5.0, 10.0, 18.0, 60.0]
# Compare with a tolerance: an exact == check on a float sum can fail due to rounding
print('Percents sum to 100:', np.isclose(sum(gdp_decile_percents), 100))

gdp_decile_shares = [i/100 for i in gdp_decile_percents]

# Convert to quintile shares of total GDP
gdp_quintile_shares = [
    gdp_decile_shares[i] + gdp_decile_shares[i + 1]
    for i in range(0, len(gdp_decile_shares), 2)
]

# Insert 0 for the first value in the Lorenz curve
gdp_quintile_shares.insert(0, 0)

# Cumulative sum of shares (Lorenz curve values)
shares_cumsum = np.cumsum(gdp_quintile_shares)

# Perfect equality line
pe_line = np.linspace(start=0.0, stop=1.0, num=len(shares_cumsum))

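# Integrate against the actual x positions (spacing 1/5 for 6 points).
# The original dx=1/len(shares_cumsum) was 1/6; it cancels in the ratio below,
# so the Gini value is unchanged, but the individual areas were mis-scaled.
# Note: np.trapz is named np.trapezoid in NumPy 2.0 and later.
area_under_lorenz = np.trapz(y=shares_cumsum, x=pe_line)
area_under_pe = np.trapz(y=pe_line, x=pe_line)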

gini = (area_under_pe - area_under_lorenz) / area_under_pe

print('Gini coefficient:', gini)

The areas calculated with np.trapz give a coefficient of 0.67. The value calculated without the first point of the Lorenz curve, also using trapz, was 0.59. Our calculation of global inequality is now roughly equal to that given by the methods in the question linked above (you do not need to add 0 to the lists/arrays in those methods). Note that using scipy.integrate.simps gives 0.69, meaning the methods in the other question agree more closely with trapezoidal integration than with Simpson integration.
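
The two figures above (0.67 with the leading zero, 0.59 without it) can be checked with a small sketch. The function names `trapezoid_area` and `gini_from_lorenz` are hypothetical and not from the original answer; the trapezoidal rule is written out by hand so the example does not depend on a particular NumPy version, and the quintile shares are the decile percents from the question summed in pairs.

```python
import numpy as np

def trapezoid_area(y, x):
    """Composite trapezoidal rule for samples y taken at positions x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x)))

def gini_from_lorenz(lorenz_values):
    """Gini coefficient from equally spaced cumulative income shares."""
    lorenz = np.asarray(lorenz_values, dtype=float)
    x = np.linspace(0.0, 1.0, num=len(lorenz))  # also the perfect equality line
    area_lorenz = trapezoid_area(lorenz, x)
    area_pe = trapezoid_area(x, x)
    return (area_pe - area_lorenz) / area_pe

# Quintile shares of total GDP (decile percents summed in pairs, divided by 100).
quintile_shares = [0.0108, 0.0148, 0.0444, 0.15, 0.78]

with_zero = gini_from_lorenz(np.cumsum([0.0] + quintile_shares))
without_zero = gini_from_lorenz(np.cumsum(quintile_shares))

print(round(with_zero, 2), round(without_zero, 2))  # 0.67 0.59
```

Dropping the leading zero shifts the x positions of every remaining point (five points spread over 0 to 1 instead of six), which is why the coefficient changes by so much even though the first Lorenz segment itself is tiny.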

Here's the plot, which uses plt.fill_between to shade the area under the Lorenz curve:

from matplotlib import pyplot as plt

# shares_cumsum holds the Lorenz curve values computed above
plt.plot(pe_line, shares_cumsum, label='lorenz_curve')
plt.plot(pe_line, pe_line, label='perfect_equality')
plt.fill_between(pe_line, shares_cumsum)
plt.title('Gini: {:.2f}'.format(gini), fontsize=20)
plt.ylabel('Cumulative Share of Global GDP', fontsize=15)
plt.xlabel('Income Quintiles (Lowest to Highest)', fontsize=15)
plt.legend()
plt.tight_layout()
plt.show()
