Pandas groupby and value_counts

Question

I want to count the distinct values in each column (with pd.value_counts, I guess), grouping the data by one level of a MultiIndex. The level= parameter of groupby takes care of the MultiIndex, but apply raises a ValueError.

Original dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.choice(list('ABC'), size=(10, 5)),
                  columns=['c1', 'c2', 'c3', 'c4', 'c5'],
                  index=pd.MultiIndex.from_product([['foo', 'bar'],
                                                    ['w', 'y', 'x', 'y', 'z']]))



      c1 c2 c3 c4 c5
foo w  C  C  B  A  A
    y  A  A  C  B  A
    x  A  B  C  C  C
    y  A  B  C  C  C
    z  A  C  B  C  B
bar w  B  C  C  A  C
    y  A  A  C  A  A
    x  A  B  B  B  A
    y  A  A  C  A  B
    z  A  B  B  C  B

What I want:

       c1  c2  c3  c4  c5
foo A   4   2   0   3   2
    B   1   2   2   1   2
    C   0   1   3   1   1
bar A   4   1   0   1   2
    B   0   2   2   1   1
    C   1   2   3   3   2

What I tried:

>>> df.groupby(level=0).apply(pd.value_counts)

ValueError: could not broadcast input array from shape (5,5) into shape (5)

I can do this manually, but I think there must be a more obvious way.

groups = [g.apply(pd.value_counts).fillna(0) for n, g in df.groupby(level=0)]
index = df.index.get_level_values(0).unique()
correct_result = pd.concat(groups, keys=index)   # THIS WORKS AS EXPECTED
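
The manual split-apply-combine above can be checked on a small frame whose counts are easy to verify by eye. This is only a sketch: the frame `small` and the names `pieces` and `manual` are made up for illustration, and `Series.value_counts` is called through a lambda instead of the top-level `pd.value_counts` used in the question.

```python
import pandas as pd

# Hypothetical hand-written data (not the random frame from the question).
idx = pd.MultiIndex.from_product([["foo", "bar"], ["w", "x", "y"]])
small = pd.DataFrame(
    {
        "c1": ["A", "A", "B", "C", "C", "C"],
        "c2": ["B", "B", "B", "A", "C", "A"],
    },
    index=idx,
)

pieces = {}
for name, group in small.groupby(level=0):
    # Count values column by column; a value that is missing from one
    # column of the group comes back as NaN, so fill it with 0.
    per_column = group.apply(lambda col: col.value_counts())
    pieces[name] = per_column.fillna(0).astype(int)

# Glue the per-group tables back together, keyed by the group name.
manual = pd.concat(pieces)
print(manual)
```

Note that a value which never occurs anywhere in a group (for example `B` in `bar` here) gets no row at all, because the row labels come from the values actually observed in that group.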

I mean, this isn't long to write, but I feel like I'm reinventing the wheel. Isn't this the kind of operation that groupby is supposed to handle?

Is there a more straightforward way of doing this, other than doing the split-apply-combine myself?

Recommended answer

Use stack to turn the frame into a Series with a MultiIndex, count with groupby and value_counts, then use unstack to get back a DataFrame:

import numpy as np
import pandas as pd

np.random.seed(123)

df = pd.DataFrame(np.random.choice(list('ABC'), size=(10, 5)),
                  columns=['c1', 'c2', 'c3', 'c4', 'c5'],
                  index=pd.MultiIndex.from_product([['foo', 'bar'],
                                                    ['w', 'y', 'x', 'y', 'z']]))
print(df)
      c1 c2 c3 c4 c5
foo w  C  B  C  C  A
    y  C  C  B  C  B
    x  C  B  A  B  C
    y  B  A  C  A  B
    z  C  B  A  A  A
bar w  A  B  C  A  C
    y  A  A  B  A  B
    x  A  A  A  C  B
    y  B  C  C  C  B
    z  A  A  C  B  A

df1 = df.stack().groupby(level=[0,2]).value_counts().unstack(1, fill_value=0)
print(df1)
       c1  c2  c3  c4  c5
bar A   4   3   1   2   1
    B   1   1   1   1   3
    C   0   1   3   2   1
foo A   0   1   2   2   2
    B   1   3   1   1   2
    C   4   1   2   2   1
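
To see what each step of that one-liner contributes, the chain can be split into intermediate variables. The following is a minimal sketch on a small hand-written frame; the data and the names `small`, `stacked`, `counts` and `result` are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical hand-written data, so the counts can be checked by eye.
idx = pd.MultiIndex.from_product([["foo", "bar"], ["w", "x", "y"]])
small = pd.DataFrame(
    {
        "c1": ["A", "A", "B", "C", "C", "C"],
        "c2": ["B", "B", "B", "A", "C", "A"],
    },
    index=idx,
)

# 1. stack(): move the column labels into a third index level, giving a
#    Series indexed by (outer key, inner key, column label).
stacked = small.stack()

# 2. Group by the outer key (level 0) and the column label (level 2),
#    then count the values inside each group. The result is a Series
#    indexed by (outer key, column label, value).
counts = stacked.groupby(level=[0, 2]).value_counts()

# 3. unstack(1): pivot the column labels back into columns; combinations
#    that never occurred are filled with 0 instead of NaN.
result = counts.unstack(1, fill_value=0)
print(result)
```

Level numbers are used because the index levels are unnamed. As with the manual approach, a value that does not occur at all within a group produces no row for that group; reindexing against the full product of groups and values would be one way to add such rows if they are needed.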
