python - how to delete duplicate lists in each row (pandas)?
Problem description
Each row of my data frame contains a list, and I would like to delete rows with a duplicated list, keeping the one with the highest score.
Here is my data, from data frame df1:
pair score
0 [A , A ] 1.0000
1 [A , F ] 0.9990
2 [A , G ] 0.9985
3 [A , G ] 0.9975
4 [A , H ] 0.9985
5 [A , H ] 0.9990
The result I would like to see is:
pair score
0 [A , A ] 1.0000
1 [A , F ] 0.9990
2 [A , G ] 0.9985
4 [A , H ] 0.9990
I have tried to use group by and set score = max, but it's not working.
Recommended answer
First, I think working with lists in pandas is not a good idea.
The solution works if you convert the lists to a helper column of tuples, then use sort_values with drop_duplicates:
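# lists are unhashable, so build a helper column of tuples
df['new'] = df.pair.apply(tuple)
# sort so the highest score comes first, then keep the first row per pair
df = df.sort_values('score', ascending=False).drop_duplicates('new')
print(df)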
pair score new
0 [A, A] 1.0000 (A, A)
1 [A, F] 0.9990 (A, F)
5 [A, H] 0.9990 (A, H)
2 [A, G] 0.9985 (A, G)
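For readers who want to run the tuple approach end to end, here is a self-contained sketch. The frame is rebuilt from the values posted in the question; dropping the helper column and calling `sort_index()` at the end are additions of this sketch (not part of the answer above) to get back the original row order. Note that with the data as posted, the highest-scoring [A, H] row is index 5, not index 4.

```python
import pandas as pd

# Data copied from the question
df1 = pd.DataFrame({
    "pair": [["A", "A"], ["A", "F"], ["A", "G"], ["A", "G"], ["A", "H"], ["A", "H"]],
    "score": [1.0000, 0.9990, 0.9985, 0.9975, 0.9985, 0.9990],
})

result = (
    df1.assign(new=df1["pair"].apply(tuple))   # hashable helper column
       .sort_values("score", ascending=False)  # highest score first
       .drop_duplicates("new")                 # keep the first (highest) row per pair
       .drop(columns="new")                    # helper column no longer needed
       .sort_index()                           # restore the original row order
)
print(result)
```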
Or add 2 new columns:
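import pandas as pd
# split each list into two columns; index=df.index keeps the rows aligned
df[['a', 'b']] = pd.DataFrame(df.pair.tolist(), index=df.index)
df = df.sort_values('score', ascending=False).drop_duplicates(['a', 'b'])
print(df)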
pair score a b
0 [A, A] 1.0000 A A
1 [A, F] 0.9990 A F
5 [A, H] 0.9990 A H
2 [A, G] 0.9985 A G
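Since the question mentions trying group by with a maximum, here is a sketch of how that idea can be made to work. It is an alternative to the answer's approach, not part of it: the lists are converted to tuples so they can serve as a grouping key, and `idxmax` picks the index label of the best-scoring row in each group. The frame is rebuilt from the values in the question, and the names `key`, `best` and `result` are chosen here for illustration.

```python
import pandas as pd

# Data copied from the question
df1 = pd.DataFrame({
    "pair": [["A", "A"], ["A", "F"], ["A", "G"], ["A", "G"], ["A", "H"], ["A", "H"]],
    "score": [1.0000, 0.9990, 0.9985, 0.9975, 0.9985, 0.9990],
})

key = df1["pair"].apply(tuple)              # hashable grouping key
best = df1.groupby(key)["score"].idxmax()   # index label of the max score per pair
result = df1.loc[sorted(best)]              # select those rows in original order
print(result)
```

This variant leaves `df1` unchanged and needs no helper column in the output.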