Aggregation of pandas groupby objects


Question

I am trying to aggregate some statistics from a groupby object over chunks of data. I have to chunk the data because there are many (18 million) rows. I want to find the number of rows in each group in each chunk, then sum them together. I can add the per-chunk count results, but when a group is missing from one of the operands, the result for that group is NaN. See this case:

>>> import pandas as pd
>>> df = pd.DataFrame({'X': ['A','B','C','A','B','C','B','C','D','B','C','D'],
...                    'Y': range(12)})
>>> df
    X   Y
0   A   0
1   B   1
2   C   2
3   A   3
4   B   4
5   C   5
6   B   6
7   C   7
8   D   8
9   B   9
10  C  10
11  D  11
>>> df[0:6].groupby(['X']).count() + df[6:].groupby(['X']).count()
    Y
X    
A NaN
B   4
C   4
D NaN

But I would like to see:

>>> df[0:6].groupby(['X']).count() + df[6:].groupby(['X']).count()
    Y
X    
A   2
B   4
C   4
D   2

Is there a good way to do this? Note that in the real code I am looping over a chunked iterator and calling groupby on a million rows at a time.

Recommended answer

Call add and pass fill_value=0, so that a group missing from one side is treated as 0 instead of producing NaN. You could apply it iteratively while chunking, I guess:

In [98]:

import numpy as np
import pandas as pd

df = pd.DataFrame({'X': ['A','B','C','A','B','C','B','C','D','B','C','D'],
                   'Y': np.arange(12)})
df[0:6].groupby(['X']).count().add(df[6:].groupby(['X']).count(), fill_value=0)
Out[98]:
   Y
X   
A  2
B  4
C  4
D  2
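
The iterative version mentioned above could look roughly like the sketch below. The chunks helper and the chunk size of 6 are hypothetical stand-ins for whatever chunked iterator the real code uses (for example, a reader created with a chunksize argument). Recent pandas versions return floats from add when the indexes do not line up, so the sketch casts back to integers at the end.

```python
import pandas as pd

df = pd.DataFrame({'X': ['A','B','C','A','B','C','B','C','D','B','C','D'],
                   'Y': range(12)})

def chunks(frame, size):
    """Hypothetical helper: yield consecutive row slices of `frame`.

    Stands in for the real chunked iterator from the question.
    """
    for start in range(0, len(frame), size):
        yield frame.iloc[start:start + size]

total = None
for chunk in chunks(df, 6):
    # Per-chunk row counts for each group.
    counts = chunk.groupby('X')['Y'].count()
    # fill_value=0 treats a group absent from one side as 0 rather than NaN.
    total = counts if total is None else total.add(counts, fill_value=0)

# Index alignment promotes the counts to float; cast back to integers.
total = total.astype(int)
print(total)
```

Only the running total is kept between iterations, so memory use depends on the number of distinct groups rather than on the total number of rows.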
