Aggregation of pandas groupby objects
Problem description
I am trying to aggregate some statistics from a groupby object on chunks of data. I have to chunk the data because there are many (18 million) rows. I want to find the number of rows in each group in each chunk, then sum them together. I can add groupby objects but when a group is not present in one term, a NaN is the result. See this case:
>>> df = pd.DataFrame({'X': ['A','B','C','A','B','C','B','C','D','B','C','D'],
...                    'Y': range(12)})
>>> df
    X   Y
0   A   0
1   B   1
2   C   2
3   A   3
4   B   4
5   C   5
6   B   6
7   C   7
8   D   8
9   B   9
10  C  10
11  D  11
>>> df[0:6].groupby(['X']).count() + df[6:].groupby(['X']).count()
     Y
X
A  NaN
B    4
C    4
D  NaN
But I would like to see:
>>> df[0:6].groupby(['X']).count() + df[6:].groupby(['X']).count()
   Y
X
A  2
B  4
C  4
D  2
Is there a good way to do this? Note in the real code I am looping through a chunked iterator of a million rows per groupby.
Recommended answer
Call add and pass fill_value=0, so that a group missing from one operand is treated as 0 instead of producing NaN. You could add iteratively while chunking, I guess:
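In [98]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': ['A','B','C','A','B','C','B','C','D','B','C','D'],
                   'Y': np.arange(12)})
# Align on the group index; a group missing from one side counts as 0
df[0:6].groupby(['X']).count().add(df[6:].groupby(['X']).count(), fill_value=0)
Out[98]:
   Y
X
A  2
B  4
C  4
D  2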
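The iterative version mentioned above could be sketched roughly as follows. This is only a minimal illustration, not a tested solution for the 18-million-row case: the helper name chunked_group_counts is hypothetical, the chunks are simulated by slicing the small example frame (in real code they might come from something like pd.read_csv with chunksize), and it uses size() to count rows per group rather than count(), which counts non-null values per column.

```python
import pandas as pd

df = pd.DataFrame({'X': ['A','B','C','A','B','C','B','C','D','B','C','D'],
                   'Y': range(12)})

def chunked_group_counts(chunks, key):
    """Sum per-group row counts over an iterable of DataFrame chunks.

    Hypothetical helper for illustration only.
    """
    total = None
    for chunk in chunks:
        # Rows per group in this chunk, indexed by the group key
        counts = chunk.groupby(key).size()
        # fill_value=0 treats a group absent from one side as 0, not NaN
        total = counts if total is None else total.add(counts, fill_value=0)
    if total is None:
        # No chunks at all: return an empty result
        return pd.Series(dtype=int)
    # Index alignment promotes the sums to float, so cast back to int
    return total.astype(int)

# Simulate a chunked iterator with slices of 5 rows each
chunks = (df[i:i + 5] for i in range(0, len(df), 5))
totals = chunked_group_counts(chunks, 'X')
print(totals)
```

Keeping only the running total in memory means each chunk can be discarded once its counts are added, which seems to be the point of chunking in the first place. The final astype(int) is a design choice: without it the totals stay as floats because of the alignment step.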