MultiIndex Group By in Pandas Data Frame


Question

I have a data set that contains countries and statistics on economic indicators by year, organized like so:

Country  Metric           2011   2012   2013  2014
  USA     GDP               7      4     0      2
  USA     Pop.              2      3     0      3
  GB      GDP               8      7     0      7
  GB      Pop.              2      6     0      0
  FR      GDP               5      0     0      1
  FR      Pop.              1      1     0      5

How can I use a MultiIndex in pandas to create a data frame that shows only GDP by year for each country?

I tried:

df = data.groupby(['Country', 'Metric'])

But it did not work as expected.

Recommended answer

In this case, you don't actually need a groupby. You also don't have a MultiIndex yet. You can make one like this:

import pandas
from io import StringIO

datastring = StringIO("""\
Country  Metric           2011   2012   2013  2014
USA     GDP               7      4     0      2
USA     Pop.              2      3     0      3
GB      GDP               8      7     0      7
GB      Pop.              2      6     0      0
FR      GDP               5      0     0      1
FR      Pop.              1      1     0      5
""")
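# Columns are separated by two or more spaces; a regex separator needs the Python engine.
data = pandas.read_table(datastring, sep=r'\s\s+', engine='python')
# Move Country and Metric into a two-level row index (a MultiIndex).
data.set_index(['Country', 'Metric'], inplace=True)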

The data then looks like this:

                2011  2012  2013  2014
Country Metric                        
USA     GDP        7     4     0     2
        Pop.       2     3     0     3
GB      GDP        8     7     0     7
        Pop.       2     6     0     0
FR      GDP        5     0     0     1
        Pop.       1     1     0     5

Now, to get the GDPs, you can take a cross-section of the data frame via the xs method:

data.xs('GDP', level='Metric')

         2011  2012  2013  2014
Country                        
USA         7     4     0     2
GB          8     7     0     7
FR          5     0     0     1
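For comparison, the same GDP table can also be reached without a MultiIndex by filtering rows on the Metric column. The sketch below rebuilds the example data inline and checks that both routes agree. The variable names (flat, gdp_mask, gdp_xs) are illustrative and not part of the original answer, and the year columns are assumed to be strings, as read_table would produce them.

```python
import pandas as pd

# Example data from the question, rebuilt as a flat frame (year columns are strings).
flat = pd.DataFrame({
    "Country": ["USA", "USA", "GB", "GB", "FR", "FR"],
    "Metric": ["GDP", "Pop.", "GDP", "Pop.", "GDP", "Pop."],
    "2011": [7, 2, 8, 2, 5, 1],
    "2012": [4, 3, 7, 6, 0, 1],
    "2013": [0, 0, 0, 0, 0, 0],
    "2014": [2, 3, 7, 0, 1, 5],
})

# Route 1: boolean mask on the flat frame, then index the result by Country.
gdp_mask = (
    flat[flat["Metric"] == "GDP"]
    .drop(columns="Metric")
    .set_index("Country")
)

# Route 2: build the MultiIndex first, then take a cross-section with xs.
gdp_xs = flat.set_index(["Country", "Metric"]).xs("GDP", level="Metric")

print(gdp_xs)
print(gdp_mask.equals(gdp_xs))
```

The mask is arguably the simpler choice for a one-off filter; xs tends to pay off once the MultiIndex is already in place and several levels need slicing.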

This is easy because your data are already pivoted/unstacked. If they weren't, and looked like this:

data.columns.names = ['Year']
data = data.stack()
data

Country  Metric  Year
USA      GDP     2011    7
                 2012    4
                 2013    0
                 2014    2
         Pop.    2011    2
                 2012    3
                 2013    0
                 2014    3
GB       GDP     2011    8
                 2012    7
                 2013    0
                 2014    7
         Pop.    2011    2
                 2012    6
                 2013    0
                 2014    0
FR       GDP     2011    5
                 2012    0
                 2013    0
                 2014    1
         Pop.    2011    1
                 2012    1
                 2013    0
                 2014    5

You could then use groupby to tell you something about the world as a whole:

data.groupby(level=['Metric', 'Year']).sum()
Metric  Year
GDP     2011    20
        2012    11
        2013     0
        2014    10
Pop.    2011     5
        2012    10
        2013     0
        2014     8
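This also hints at why the groupby call in the question seemed to do nothing: groupby on its own returns a lazy GroupBy object, and a table only appears once an aggregation such as sum() is applied. The sketch below illustrates this on a flat copy of the example data; the names flat, grouped, and summed are illustrative assumptions, not from the original post.

```python
import pandas as pd

# Flat copy of the example data (year columns are strings).
flat = pd.DataFrame({
    "Country": ["USA", "USA", "GB", "GB", "FR", "FR"],
    "Metric": ["GDP", "Pop.", "GDP", "Pop.", "GDP", "Pop."],
    "2011": [7, 2, 8, 2, 5, 1],
    "2012": [4, 3, 7, 6, 0, 1],
    "2013": [0, 0, 0, 0, 0, 0],
    "2014": [2, 3, 7, 0, 1, 5],
})

# groupby alone only describes the grouping; it is not a DataFrame yet.
grouped = flat.groupby(["Country", "Metric"])
print(type(grouped).__name__)

# Aggregating materialises a frame whose row index is a MultiIndex
# built from the group keys (sorted by key by default).
summed = grouped.sum()
print(summed.index.names)

# From here the cross-section works the same way as with set_index.
gdp = summed.xs("GDP", level="Metric")
print(gdp)
```

Because every (Country, Metric) pair occurs exactly once here, the sum leaves the values unchanged, which is why set_index is the more direct route for this data.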

Or get really fancy:

data.groupby(level=['Metric', 'Year']).sum().unstack(level='Metric')
Metric  GDP  Pop.
Year             
2011     20     5
2012     11    10
2013      0     0
2014     10     8
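If the data start out as a flat table rather than a MultiIndexed one, a similar world-total summary can probably be reached with melt followed by pivot_table. This is a minimal sketch under that assumption; the names flat, long_form, totals, and the "Value" column are illustrative and not part of the original answer.

```python
import pandas as pd

# Flat copy of the example data (year columns are strings).
flat = pd.DataFrame({
    "Country": ["USA", "USA", "GB", "GB", "FR", "FR"],
    "Metric": ["GDP", "Pop.", "GDP", "Pop.", "GDP", "Pop."],
    "2011": [7, 2, 8, 2, 5, 1],
    "2012": [4, 3, 7, 6, 0, 1],
    "2013": [0, 0, 0, 0, 0, 0],
    "2014": [2, 3, 7, 0, 1, 5],
})

# Reshape wide -> long: one row per (Country, Metric, Year) observation.
long_form = flat.melt(
    id_vars=["Country", "Metric"], var_name="Year", value_name="Value"
)

# Sum over countries, with years as rows and metrics as columns.
totals = long_form.pivot_table(
    index="Year", columns="Metric", values="Value", aggfunc="sum"
)
print(totals)
```

This trades the stack/groupby/unstack chain for a single reshaping call, which some readers may find easier to follow when no MultiIndex is needed afterwards.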
