Group by two columns and count the occurrences of each combination in pandas
Problem description
I have the following data frame:
data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})
product_id  user_id
p1          a1
p1          a1
p2          a1
p1          a2
p1          a2
p1          a2
p2          a3
p2          a3
p3          a3
In the real case there may be other columns as well, but what I need to do is group the data frame by the product_id and user_id columns, count the occurrences of each combination, and add that count as a new column in a new data frame.
The output should look like this:
user_id  product_id  count
a1       p1          2
a1       p2          1
a2       p1          3
a3       p2          2
a3       p3          1
I tried the following code:
grouped=data.groupby(['user_id','product_id']).count()
But the result is:
user_id  product_id
a1       p1
         p2
a2       p1
a3       p2
         p3
Actually, the most important thing for me is to have a column named count that holds the number of occurrences; I need to use that column later.
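A likely reason the attempt above shows no count column: `count()` on a grouped data frame reports non-null counts for the columns that are not used as group keys, and here both columns are keys, so nothing is left to count. The sketch below only assumes the sample data from the question; the names `empty_counts` and `sizes` are made up for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    'user_id': ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3'],
    'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3'],
})

# count() tallies non-null values in the columns that are NOT group keys.
# Both columns are keys here, so the result has an index but no columns.
empty_counts = data.groupby(['user_id', 'product_id']).count()
print(empty_counts.shape)

# size() counts rows per group, so it does not need a value column.
sizes = data.groupby(['user_id', 'product_id']).size()
print(sizes.tolist())
```

If the real data frame has additional columns, `count()` would return one count per remaining column, which is usually more than is wanted for a single occurrence count.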
Recommended answer
Maybe this is what you want?
>>> import pandas as pd
>>> data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})
>>> count_series = data.groupby(['user_id', 'product_id']).size()
>>> count_series
user_id  product_id
a1       p1            2
         p2            1
a2       p1            3
a3       p2            2
         p3            1
dtype: int64
>>> new_df = count_series.to_frame(name = 'size').reset_index()
>>> new_df
  user_id product_id  size
0      a1         p1     2
1      a1         p2     1
2      a2         p1     3
3      a3         p2     2
4      a3         p3     1
>>> new_df['size']
0    2
1    1
2    3
3    2
4    1
Name: size, dtype: int64
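Since the question asks for a column named count rather than size, the same approach can be written with a different column name. This is a minimal sketch, assuming the sample data from the question; `counts` and `repeated` are illustrative names, and `Series.reset_index(name=...)` is used here as a shorthand for `to_frame(...)` followed by `reset_index()`.

```python
import pandas as pd

data = pd.DataFrame({
    'user_id': ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3'],
    'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3'],
})

# Count rows per (user_id, product_id) pair and name the result column 'count'.
counts = (
    data.groupby(['user_id', 'product_id'])
        .size()
        .reset_index(name='count')
)
print(counts)

# The new column can then be used like any other, e.g. to keep
# only the combinations that occur more than once.
repeated = counts[counts['count'] > 1]
print(repeated)
```

Using bracket access (`counts['count']`) rather than attribute access avoids a clash with the `count` method that every data frame already has.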