Finding max occurrence of a column's value, after group-by on another column


Problem description

I have a pandas DataFrame:

         id             city
000.tushar@gmail.com   Bangalore
00078r@gmail.com       Mumbai
0007ayan@gmail.com     Jamshedpur
0007ayan@gmail.com     Jamshedpur
000.tushar@gmail.com   Bangalore
00078r@gmail.com       Mumbai
00078r@gmail.com       Vijayawada
00078r@gmail.com       Vijayawada
00078r@gmail.com       Vijayawada

For each id, I want to find the city name that occurs most often, so that for a given id I can say "this is their favorite city":

         id             city
000.tushar@gmail.com   Bangalore
00078r@gmail.com       Vijayawada
0007ayan@gmail.com     Jamshedpur

Grouping by id and city and counting the rows gives:

         id                   city       count
0  000.tushar@gmail.com       Bangalore    2
1      00078r@gmail.com        Mumbai      2
2      00078r@gmail.com      Vijayawada    3
3    0007ayan@gmail.com      Jamshedpur    2
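
For reference, the intermediate table above can be reproduced roughly as follows. This is a minimal sketch: the question does not show the exact call that was used, so the `size().reset_index(name="count")` chain and the variable name `counts` are assumptions.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [
        "000.tushar@gmail.com", "00078r@gmail.com", "0007ayan@gmail.com",
        "0007ayan@gmail.com", "000.tushar@gmail.com", "00078r@gmail.com",
        "00078r@gmail.com", "00078r@gmail.com", "00078r@gmail.com",
    ],
    "city": [
        "Bangalore", "Mumbai", "Jamshedpur", "Jamshedpur", "Bangalore",
        "Mumbai", "Vijayawada", "Vijayawada", "Vijayawada",
    ],
})

# Count the rows for every (id, city) pair and expose the count as a column.
counts = df.groupby(["id", "city"]).size().reset_index(name="count")
print(counts)
```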

How should I proceed from here? I believe some group-by followed by apply will do it, but I don't know exactly which combination does the trick. Any suggestions are welcome.

If an id has the same count for two or three cities, I am fine with returning any one of those cities.

Recommended answer

You can try a double groupby with size and idxmax. Because the counts are indexed by a MultiIndex of (id, city), idxmax returns tuples, so use apply to pick the city out of each tuple:

# Parentheses let the method chain continue onto the next line.
df = (df.groupby(['id','city']).size().groupby(level=0).idxmax()
        .apply(lambda x: x[1]).reset_index(name='city'))

Another solution:

s = df.groupby(['id','city']).size()
df = s.loc[s.groupby(level=0).idxmax()].reset_index().drop(0,axis=1)

Or:

df = df.groupby(['id'])['city'].apply(lambda x: x.value_counts().index[0]).reset_index()


print (df)
                     id        city
0  000.tushar@gmail.com   Bangalore
1      00078r@gmail.com  Vijayawada
2    0007ayan@gmail.com  Jamshedpur
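
To try the double-groupby idea end to end, here is a self-contained sketch that rebuilds the sample data from the question and splits the chain into named steps. The variable names `pair_counts`, `best_pairs` and `favorite` are illustrative and do not come from the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [
        "000.tushar@gmail.com", "00078r@gmail.com", "0007ayan@gmail.com",
        "0007ayan@gmail.com", "000.tushar@gmail.com", "00078r@gmail.com",
        "00078r@gmail.com", "00078r@gmail.com", "00078r@gmail.com",
    ],
    "city": [
        "Bangalore", "Mumbai", "Jamshedpur", "Jamshedpur", "Bangalore",
        "Mumbai", "Vijayawada", "Vijayawada", "Vijayawada",
    ],
})

# Step 1: count each (id, city) pair; the result is a Series with a MultiIndex.
pair_counts = df.groupby(["id", "city"]).size()

# Step 2: within each id, find the (id, city) index tuple with the largest count.
best_pairs = pair_counts.groupby(level="id").idxmax()

# Step 3: keep only the city element of each tuple and return a flat DataFrame.
favorite = best_pairs.map(lambda pair: pair[1]).rename("city").reset_index()
print(favorite)
```

On ties, idxmax keeps the first maximum it encounters, and the counts are sorted by city within each id, so this sketch should pick the alphabetically first of the tied cities. That fits the question, which accepts any of them.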
