Groupby count only when a certain value is present in one of the columns in pandas
Question
I have a dataframe similar to the one below:
+------------+-----+--------+
| time       | id  | status |
+------------+-----+--------+
| 1451606400 | id1 | Yes    |
| 1451606400 | id1 | Yes    |
| 1456790400 | id2 | No     |
| 1456790400 | id2 | Yes    |
| 1456790400 | id2 | No     |
+------------+-----+--------+
I'm grouping by all of the columns above, and I can successfully get the count in a separate column named 'count' using the command below:
df.groupby(['time','id', 'status']).size().reset_index(name='count')
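To make the starting point reproducible, here is a minimal sketch that builds the sample table and runs the command above. The values are copied from the table in the question; the variable name `counts` is only illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "time": [1451606400, 1451606400, 1456790400, 1456790400, 1456790400],
    "id": ["id1", "id1", "id2", "id2", "id2"],
    "status": ["Yes", "Yes", "No", "Yes", "No"],
})

# Plain group sizes: every (time, id, status) combination gets its row count,
# regardless of what the status value is.
counts = df.groupby(["time", "id", "status"]).size().reset_index(name="count")
print(counts)
```

With this data the 'No' group for id2 gets a count of 2, which is the value the question wants replaced by 0.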
But I want the count only for the rows with status = 'Yes'; the rest should be 0.
Expected output:
+------------+-----+--------+---------+
| time       | id  | status | count   |
+------------+-----+--------+---------+
| 1451606400 | id1 | Yes    | 2       |
| 1456790400 | id2 | Yes    | 1       |
| 1456790400 | id2 | No     | 0       |
+------------+-----+--------+---------+
I tried to count for status = 'Yes' with the code below:
df[df['status']=='Yes'].groupby(['time','id','status']).size().reset_index(name='count')
which obviously gives me only the rows with status = 'Yes' and discards the rest. I want the discarded ones as well, with count = 0.
Is there any way to get this result?
Thanks in advance!
Recommended answer
Use a lambda function with apply, and use sum for the count, because boolean True values are processed like 1:
df1 = (df.groupby(['time','id','status'])
         .apply(lambda x: (x['status'] == 'Yes').sum())
         .reset_index(name='count'))
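The reason sum works as a counter here is that comparing a column to 'Yes' yields a boolean Series, and summing it treats True as 1 and False as 0. A minimal sketch on a standalone Series (the names `status`, `mask` and `n_yes` are illustrative, not from the answer):

```python
import pandas as pd

# The status values of the five sample rows from the question.
status = pd.Series(["Yes", "Yes", "No", "Yes", "No"])

# Element-wise comparison gives a boolean Series.
mask = status == "Yes"

# Summing booleans counts the True entries.
n_yes = mask.sum()
print(n_yes)
```

Inside each group the same comparison is either all True (a 'Yes' group, so the sum equals the group size) or all False (any other group, so the sum is 0).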
Or create a new column and aggregate it with sum:
df1 = (df.assign(A=df['status']=='Yes')
         .groupby(['time','id','status'])['A']
         .sum()
         .astype(int)
         .reset_index(name='count'))
A very similar solution with no new column, but slightly less readable:
df1 = ((df['status']=='Yes')
         .groupby([df['time'],df['id'],df['status']])
         .sum()
         .astype(int)
         .reset_index(name='count'))
print(df1)
         time   id status  count
0  1451606400  id1    Yes      2
1  1456790400  id2     No      0
2  1456790400  id2    Yes      1
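For anyone who wants to verify this end to end, here is a self-contained sketch of the helper-column variant applied to the sample data from the question. The helper column name `is_yes` is an arbitrary choice for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "time": [1451606400, 1451606400, 1456790400, 1456790400, 1456790400],
    "id": ["id1", "id1", "id2", "id2", "id2"],
    "status": ["Yes", "Yes", "No", "Yes", "No"],
})

df1 = (
    df.assign(is_yes=df["status"] == "Yes")           # True only on 'Yes' rows
      .groupby(["time", "id", "status"])["is_yes"]    # keep every group
      .sum()                                          # True counts as 1
      .astype(int)
      .reset_index(name="count")
)
print(df1)
```

The rows come out in sorted key order, so 'No' appears before 'Yes' for id2, unlike the row order in the question's expected output; the counts themselves match.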