Count the number of rows with groupby in pandas
Problem description
I had the following code in pandas 0.17:
df['numberrows'] = df.groupby(['column1','column2','column3'], as_index=False)[['column1']].transform('count').astype('int')
But I upgraded pandas today, and now I get this error:
File "/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py", line 3810, in insert
    raise ValueError('cannot insert {}, already exists'.format(item))
ValueError: cannot insert column1, already exists
What changed in the update that causes this code to stop working?
I want to group by the columns and add a column containing the number of rows in each group.
If what I did before was not a good approach, another way of grouping while getting the number of rows in each group is also welcome.
Small dataset:
  column1 column2  column3
0    test    car1        1
1   test2    car5        2
2    test    car1        1
3   test4    car2        1
4   test2    car1        1
The result would be:
  column1 column2  column3  numberrows
0    test    car1        1           2
1   test2    car5        2           1
3   test4    car2        1           1
4   test2    car1        1           1
Recommended answer
Consider the following approach:
In [18]: df['new'] = df.groupby(['column1','column2','column3'])['column1'] \
.transform('count')
In [19]: df
Out[19]:
  column1 column2  column3  new
0    test    car1        1    2
1   test2    car5        2    1
2    test    car1        1    2
3   test4    car2        1    1
4   test2    car1        1    1
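For anyone who wants to reproduce this outside an interactive session, here is a minimal self-contained sketch. It rebuilds the small dataset from the question by hand (the construction of `df` and the helper name `keys` are assumptions for illustration) and writes the per-row group size into the `numberrows` column the question asked for.

```python
import pandas as pd

# Rebuild the small dataset from the question (assumed construction).
df = pd.DataFrame({
    "column1": ["test", "test2", "test", "test4", "test2"],
    "column2": ["car1", "car5", "car1", "car2", "car1"],
    "column3": [1, 2, 1, 1, 1],
})

keys = ["column1", "column2", "column3"]

# transform('count') returns one value per original row, so the result
# lines up with df's index and can be assigned directly as a new column.
# Note: 'count' skips missing values in the selected column.
df["numberrows"] = df.groupby(keys)["column1"].transform("count")

print(df)
```

Because `transform` keeps the original row order and index, every row of a group carries that group's count, including the duplicate at index 2.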
Update (one row per group):
In [26]: df.groupby(['column1','column2','column3'])['column1'] \
.count().reset_index(name='numberrows')
Out[26]:
  column1 column2  column3  numberrows
0    test    car1        1           2
1   test2    car1        1           1
2   test2    car5        2           1
3   test4    car2        1           1
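As a possible variation on the one-row-per-group result, the sketch below uses `size()` instead of selecting a column and calling `count()`. It again assumes a hand-built copy of the question's dataset. The `sort=False` variant is included because the desired output in the question lists groups in order of first appearance rather than in sorted order.

```python
import pandas as pd

# Rebuild the small dataset from the question (assumed construction).
df = pd.DataFrame({
    "column1": ["test", "test2", "test", "test4", "test2"],
    "column2": ["car1", "car5", "car1", "car2", "car1"],
    "column3": [1, 2, 1, 1, 1],
})

keys = ["column1", "column2", "column3"]

# size() counts every row in each group; reset_index(name=...) turns the
# resulting Series into a DataFrame with the keys as regular columns.
counts = df.groupby(keys).size().reset_index(name="numberrows")

# sort=False keeps the groups in order of first appearance instead of
# sorting them by the key values.
ordered = df.groupby(keys, sort=False).size().reset_index(name="numberrows")

print(counts)
print(ordered)
```

Unlike `count()`, `size()` does not depend on which column is selected and includes rows with missing values in the non-key columns.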