Pandas - count distinct values per column

Problem description

I have a dataframe that looks like this:

Id ActivityId ActivityCode

1   2           3
1   2           4
1   3           2

I need to count the distinct ActivityId values associated with each Id.

In the example above, Id 1 would return 2, since there are two distinct ActivityId values (2 and 3) for that Id.

The SQL would look like this:

SELECT COUNT(DISTINCT ActivityId) FROM table GROUP BY Id

How do I do this in pandas?

(And if possible, I'd like to know if there's a way to get the result in a dictionary, without iterating manually)

Solution

I think you need groupby with nunique:

print(df)
   Id  ActivityId  ActivityCode
0   1           2             3
1   1           2             4
2   1           3             2
3   2           8             7

counts = df.groupby('Id')['ActivityId'].nunique()
print(counts)
Id
1    2
2    1
Name: ActivityId, dtype: int64
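
For reference, here is a self-contained sketch that rebuilds the sample frame and runs the same call. The DataFrame construction and the variable name `counts` are assumptions added to make the snippet runnable; they are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the printed frame above
df = pd.DataFrame({
    'Id': [1, 1, 1, 2],
    'ActivityId': [2, 2, 3, 8],
    'ActivityCode': [3, 4, 2, 7],
})

# Rough equivalent of: SELECT COUNT(DISTINCT ActivityId) ... GROUP BY Id
counts = df.groupby('Id')['ActivityId'].nunique()
print(counts)
```

Like COUNT(DISTINCT ...) in SQL, which skips NULLs, nunique ignores NaN by default (dropna=True).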

To get the result as a dict, chain Series.to_dict:

d = df.groupby('Id')['ActivityId'].nunique().to_dict()
print(d)
{1: 2, 2: 1}
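
Since the title asks about distinct counts per column, the same idea can probably be extended to several columns at once. This is a minimal sketch assuming the sample data from the answer; the names `per_column` and `as_dict` are illustrative only and do not come from the original post.

```python
import pandas as pd

# Sample data copied from the answer's printed frame
df = pd.DataFrame({
    'Id': [1, 1, 1, 2],
    'ActivityId': [2, 2, 3, 8],
    'ActivityCode': [3, 4, 2, 7],
})

# Distinct count of each selected column within every Id group
per_column = df.groupby('Id')[['ActivityId', 'ActivityCode']].nunique()

# orient='index' keys the outer dict by Id, the inner dict by column name
as_dict = per_column.to_dict(orient='index')
print(as_dict)
```

Selecting the columns explicitly keeps the grouping key out of the result regardless of pandas version. No manual iteration is needed in either form.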
