Finding max occurrence of a column's value, after group-by on another column
Question
I have a pandas DataFrame:
id city
000.tushar@gmail.com Bangalore
00078r@gmail.com Mumbai
0007ayan@gmail.com Jamshedpur
0007ayan@gmail.com Jamshedpur
000.tushar@gmail.com Bangalore
00078r@gmail.com Mumbai
00078r@gmail.com Vijayawada
00078r@gmail.com Vijayawada
00078r@gmail.com Vijayawada
For each id, I want to find the city name that occurs most often, so that for a given id I can say which city is that person's favorite:
id city
000.tushar@gmail.com Bangalore
00078r@gmail.com Vijayawada
0007ayan@gmail.com Jamshedpur
Grouping by id and city and counting the rows gives:
id city count
0 000.tushar@gmail.com Bangalore 2
1 00078r@gmail.com Mumbai 2
2 00078r@gmail.com Vijayawada 3
3 0007ayan@gmail.com Jamshedpur 2
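The question does not show the call that produced this table. A minimal sketch of one way to get it, assuming the sample rows are loaded into a DataFrame named df (the construction below and the column name count are only for illustration):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [
        "000.tushar@gmail.com", "00078r@gmail.com", "0007ayan@gmail.com",
        "0007ayan@gmail.com", "000.tushar@gmail.com", "00078r@gmail.com",
        "00078r@gmail.com", "00078r@gmail.com", "00078r@gmail.com",
    ],
    "city": [
        "Bangalore", "Mumbai", "Jamshedpur",
        "Jamshedpur", "Bangalore", "Mumbai",
        "Vijayawada", "Vijayawada", "Vijayawada",
    ],
})

# Count rows per (id, city) pair and turn the result back into a flat table.
counts = df.groupby(["id", "city"]).size().reset_index(name="count")
print(counts)
```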
How should I proceed from here? I believe some group-by apply will do it, but I don't know exactly which one. Please suggest.
If an id has the same count for two or three cities, I am fine with returning any one of them.
Recommended answer
You can try a double groupby with size and idxmax. The output of idxmax is a Series of tuples (because of the MultiIndex), so use apply to pick the city out of each tuple:
df = (df.groupby(['id','city']).size().groupby(level=0).idxmax()
        .apply(lambda x: x[1]).reset_index(name='city'))
Another solution:
s = df.groupby(['id','city']).size()
df = s.loc[s.groupby(level=0).idxmax()].reset_index().drop(0,axis=1)
Or:
df = df.groupby(['id'])['city'].apply(lambda x: x.value_counts().index[0]).reset_index()
print(df)
id city
0 000.tushar@gmail.com Bangalore
1 00078r@gmail.com Vijayawada
2 0007ayan@gmail.com Jamshedpur
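For anyone who wants to check the result end to end, here is a self-contained sketch of the size plus idxmax approach on the sample data from the question. The variable names sizes, top and favorite are made up for illustration. As far as I know, idxmax returns the first maximal entry when counts tie, which fits the requirement that any of the tied cities is acceptable.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [
        "000.tushar@gmail.com", "00078r@gmail.com", "0007ayan@gmail.com",
        "0007ayan@gmail.com", "000.tushar@gmail.com", "00078r@gmail.com",
        "00078r@gmail.com", "00078r@gmail.com", "00078r@gmail.com",
    ],
    "city": [
        "Bangalore", "Mumbai", "Jamshedpur",
        "Jamshedpur", "Bangalore", "Mumbai",
        "Vijayawada", "Vijayawada", "Vijayawada",
    ],
})

# Number of rows per (id, city); the result has a two-level MultiIndex.
sizes = df.groupby(["id", "city"]).size()

# Within each id, the index label of the largest count: an (id, city) tuple.
top = sizes.groupby(level=0).idxmax()

# Keep only the city part of each tuple and flatten back to two columns.
favorite = top.apply(lambda pair: pair[1]).reset_index(name="city")
print(favorite)
```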