Drop duplicates keeping the row with the highest value in another column
Question
import pandas as pd

a = {'Name': ['John', 'Mary', 'John'], 'Count': [10, 22, 50]}  # one list per column
df1 = pd.DataFrame(a)
Given a data frame like this, I want to compare the "Count" values of all rows that share the same "Name" string and determine the highest. I'm not sure how to do this with a DataFrame in Python.
Ex: In the case above the answer would be:
Name Count
Mary 22
John 50
The lower value, John 10, has been dropped (for each distinct value of "Name", I only want to see the row with the highest "Count").
In SQL it would be something like a Select Case query (wherein I select the Case where Name == Name & Count > Count recursively to determine the highest number). Or a for loop over each name, but as I understand it, looping over a DataFrame is a bad idea due to the nature of the object.
Is there a way to do this with a DataFrame in Python? I could create a new data frame for each name (one with only John, and then get the highest value with df.value()[:1] or similar), but as I have many hundreds of unique entries that seems like a terrible solution. :D
Recommended answer
Use sort_values and drop_duplicates:
df1.sort_values('Count').drop_duplicates('Name', keep='last')
Name Count
1 Mary 22
2 John 50
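The one-liner above works because sorting by Count in ascending order puts each name's largest Count last, and keep='last' then retains only that final occurrence. A minimal self-contained sketch of the same idea follows; the extra 'City' column and its values are hypothetical, added only to show that the whole row survives and not just Count.

```python
import pandas as pd

# Same data as the question, plus a hypothetical 'City' column (an assumption,
# not part of the original post) to show that complete rows are kept.
df1 = pd.DataFrame({
    'Name': ['John', 'Mary', 'John'],
    'Count': [10, 22, 50],
    'City': ['Oslo', 'Lima', 'Rome'],
})

# Sort ascending by Count so the largest Count for each Name comes last,
# then keep only the last occurrence of each Name.
result = df1.sort_values('Count').drop_duplicates('Name', keep='last')
print(result)
```

The original index labels are preserved, so the result can still be traced back to the source rows; chain .reset_index(drop=True) if a fresh 0..n-1 index is preferred.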
Or, like miradulo said, groupby and max:
df1.groupby('Name')['Count'].max().reset_index()
Name Count
0 John 50
1 Mary 22
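One thing to keep in mind with groupby and max is that only the grouping column and the aggregated column come back, so any other columns in the frame are lost. If the full original row is needed, a possible variant (not from the answer above, offered only as a sketch) is to use idxmax to find the index label of the maximum per group and select those rows with .loc:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['John', 'Mary', 'John'], 'Count': [10, 22, 50]})

# groupby + max: one row per Name, containing only Name and the max Count.
by_max = df1.groupby('Name')['Count'].max().reset_index()

# groupby + idxmax (alternative technique, an addition to the answer):
# idxmax returns the index label of the max-Count row for each Name,
# and .loc uses those labels to pull the complete original rows.
by_idxmax = df1.loc[df1.groupby('Name')['Count'].idxmax()]

print(by_max)
print(by_idxmax)
```

When several rows of the same name tie for the highest Count, idxmax picks the first of them, whereas the sort_values approach with keep='last' picks whichever tied row ends up last after sorting.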