Drop duplicates keeping the row with the highest value in another column

Question

import pandas as pd

a = [['John', 10], ['Mary', 22], ['John', 50]]  # one [Name, Count] pair per row
df1 = pd.DataFrame(a, columns=['Name', 'Count'])

Given a DataFrame like this, I want to compare the "Count" values of all rows that share the same "Name" and keep only the highest one. I'm not sure how to do this with a DataFrame in Python.

For example, for the data above the answer would be:

   Name  Count
   Mary     22
   John     50

The lower-valued row (John, 10) has been dropped: for each "Name", I only want to see the row with the highest "Count".

In SQL it would be something like a SELECT CASE query (where I select the case in which Name == Name and Count > Count, recursively, to determine the highest number). Or a for loop over each name, but as I understand it, looping over a DataFrame is a bad idea due to the nature of the object.

Is there a way to do this with a DataFrame in Python? I could create a new DataFrame for each name (e.g. one with only John) and then get the highest value (df.value()[:1] or similar), but since I have many hundreds of unique entries, that seems like a terrible solution. :D

Recommended answer

sort_values and drop_duplicates:

df1.sort_values('Count').drop_duplicates('Name', keep='last')

   Name  Count
1  Mary     22
2  John     50
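Unlike an aggregation, this one-liner keeps whole rows, which matters once the frame has more columns than "Name" and "Count". A minimal sketch, assuming a hypothetical extra "Note" column added purely for illustration; the trailing sort_index() is also an addition (not in the answer) that restores the original row order:

```python
import pandas as pd

# The question's data plus a hypothetical 'Note' column, used only to show
# that drop_duplicates keeps the entire row, not just Name and Count.
df1 = pd.DataFrame(
    [['John', 10, 'first'], ['Mary', 22, 'second'], ['John', 50, 'third']],
    columns=['Name', 'Count', 'Note'],
)

# Sort ascending by Count so each Name's highest Count comes last,
# then keep only that last occurrence for each Name.
result = (
    df1.sort_values('Count')
       .drop_duplicates('Name', keep='last')
       .sort_index()  # optional: put surviving rows back in original order
)
print(result)
```

Sorting with ascending=False and keeping the first occurrence (the default keep='first') is an equivalent formulation.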

Or, like miradulo said, groupby and max.

df1.groupby('Name')['Count'].max().reset_index()

   Name  Count
0  John     50
1  Mary     22
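groupby(...).max() returns only the grouped column, so any other columns are lost. If whole rows are needed, two common alternatives (not part of the original answer) are idxmax, which returns the index label of each group's maximum, and transform('max'), which keeps every row that ties for the maximum. A sketch assuming the question's df1:

```python
import pandas as pd

df1 = pd.DataFrame(
    [['John', 10], ['Mary', 22], ['John', 50]],
    columns=['Name', 'Count'],
)

# idxmax gives, for each Name, the index label of the row with the largest
# Count (the first such row on a tie); .loc then selects those full rows.
best_rows = df1.loc[df1.groupby('Name')['Count'].idxmax()]

# transform('max') broadcasts each group's maximum back onto every row,
# so this boolean mask keeps all rows that tie for their group's maximum.
is_max = df1['Count'] == df1.groupby('Name')['Count'].transform('max')
all_max_rows = df1[is_max]

print(best_rows)
print(all_max_rows)
```

The idxmax result follows the sorted group order (John, then Mary), while the transform mask preserves the original row order.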
