python - how to delete duplicate lists in each row (pandas)?


Question

Each row of my DataFrame contains a list, and I would like to delete rows whose list is a duplicate, keeping the row with the highest score.

Here is my data, from DataFrame df1:

        pair    score
0   [A , A ]    1.0000
1   [A , F ]    0.9990
2   [A , G ]    0.9985
3   [A , G ]    0.9975
4   [A , H ]    0.9985
5   [A , H ]    0.9990

The result I would like to see is:

            pair    score
    0   [A , A ]    1.0000
    1   [A , F ]    0.9990
    2   [A , G ]    0.9985
    4   [A , H ]    0.9990

I have tried to use groupby and take the max of score, but it's not working.

Recommended answer

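First, I think working with lists in pandas is not a good idea: lists are unhashable, so methods that need to hash values, such as drop_duplicates and groupby, fail on a column of lists.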

The solution works if you convert the lists to a helper column of tuples, then use sort_values with drop_duplicates:

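# tuples are hashable, so they can be compared by drop_duplicates
df['new'] = df.pair.apply(tuple)
# sort by score descending so the first row kept for each pair has the highest score
df = df.sort_values('score', ascending=False).drop_duplicates('new')
print(df)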
     pair   score     new
0  [A, A]  1.0000  (A, A)
1  [A, F]  0.9990  (A, F)
5  [A, H]  0.9990  (A, H)
2  [A, G]  0.9985  (A, G)
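The tuple approach above can also be written as one self-contained chain that removes the helper column afterwards and restores the original row order. This is only a sketch: df1 is rebuilt by hand from the question's table, and the helper column name `key` is an arbitrary choice, not something from the original answer.

```python
import pandas as pd

# Hypothetical reconstruction of the question's df1
df1 = pd.DataFrame({
    'pair': [['A', 'A'], ['A', 'F'], ['A', 'G'],
             ['A', 'G'], ['A', 'H'], ['A', 'H']],
    'score': [1.0000, 0.9990, 0.9985, 0.9975, 0.9985, 0.9990],
})

result = (
    df1.assign(key=df1['pair'].apply(tuple))   # hashable helper column
       .sort_values('score', ascending=False)  # highest score first
       .drop_duplicates('key')                 # keep the first (highest) row per pair
       .drop(columns='key')                    # remove the helper column
       .sort_index()                           # back to the original row order
)
print(result)
```

With the question's data, the rows kept are those with index 0, 1, 2 and 5, that is, the highest-scoring row for each distinct pair.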

Or add 2 new columns:

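import pandas as pd

# split each pair into two columns; index=df.index keeps rows aligned with df
df[['a', 'b']] = pd.DataFrame(df.pair.tolist(), index=df.index)
df = df.sort_values('score', ascending=False).drop_duplicates(['a', 'b'])
print(df)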
     pair   score  a  b
0  [A, A]  1.0000  A  A
1  [A, F]  0.9990  A  F
5  [A, H]  0.9990  A  H
2  [A, G]  0.9985  A  G
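Since the question mentions trying groupby, here is one possible groupby-based variant. It is not part of the original answer. It is a sketch that assumes the lists are first split into hashable columns (named `a` and `b` here, as above) and then uses idxmax to find the index label of the highest-scoring row in each group. df1 is again rebuilt by hand from the question's table.

```python
import pandas as pd

# Hypothetical reconstruction of the question's df1
df1 = pd.DataFrame({
    'pair': [['A', 'A'], ['A', 'F'], ['A', 'G'],
             ['A', 'G'], ['A', 'H'], ['A', 'H']],
    'score': [1.0000, 0.9990, 0.9985, 0.9975, 0.9985, 0.9990],
})

# Split each list into two plain columns that groupby can hash
parts = pd.DataFrame(df1['pair'].tolist(), index=df1.index, columns=['a', 'b'])

# For every (a, b) group, find the index label of the row with the highest score
best_idx = df1.join(parts).groupby(['a', 'b'])['score'].idxmax()

# Select those rows from the original frame, in the original order
result = df1.loc[best_idx.to_numpy()].sort_index()
print(result)
```

Unlike a plain groupby max, which returns only the aggregated scores, selecting by idxmax keeps the original rows and their index labels.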
