Python - Delete duplicates in a dataframe based on two columns combinations?

Views: 345 Published: 2020/5/23 21:17:57 Tags: python pandas sorting dataframe


Question

I have a dataframe with 3 columns in Python:

Name1 Name2 Value
Juan  Ale   1
Ale   Juan  1

and would like to eliminate the duplicates based on the combination of columns Name1 and Name2.

In my example both rows contain the same pair of names (just in a different order). I would like to delete the second row and keep only the first one, so the end result should be:

Name1 Name2 Value
Juan  Ale   1

Any ideas would be greatly appreciated!

Recommended answer

You can convert each (Name1, Name2) pair to a frozenset and use pd.Series.duplicated (apply with axis=1 returns a Series here):

res = df[~df[['Name1', 'Name2']].apply(frozenset, axis=1).duplicated()]

print(res)

  Name1 Name2  Value
0  Juan   Ale      1
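For readers who want to run the one-liner end to end, here is a minimal self-contained sketch. The dataframe is rebuilt from the two rows in the question, and the intermediate name `keys` is introduced only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name1": ["Juan", "Ale"],
    "Name2": ["Ale", "Juan"],
    "Value": [1, 1],
})

# Build one order-insensitive key per row from the two name columns.
keys = df[["Name1", "Name2"]].apply(frozenset, axis=1)

# duplicated() marks every repeat of a key after its first occurrence,
# so negating it keeps the first row of each pair.
res = df[~keys.duplicated()]
print(res)
```

Splitting the expression into `keys` and `res` makes it easy to inspect the per-row keys before filtering.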

frozenset is necessary instead of set since duplicated uses hashing to check for duplicates, and a plain set is not hashable.
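The hashing point can be checked in plain Python, without pandas. The short sketch below uses hypothetical variable names and shows that two frozensets built in different orders compare and hash as equal, while hashing a plain set raises TypeError.

```python
pair_a = frozenset(["Juan", "Ale"])
pair_b = frozenset(["Ale", "Juan"])

# Equal frozensets hash equally regardless of element order.
same_key = pair_a == pair_b and hash(pair_a) == hash(pair_b)

# A plain set is mutable and therefore unhashable.
try:
    hash({"Juan", "Ale"})
    set_is_hashable = True
except TypeError:
    set_is_hashable = False

print(same_key, set_is_hashable)
```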

This approach scales better with the number of columns than with the number of rows, because apply with axis=1 calls frozenset once per row in Python. For a large number of rows, use @Wen's sort-based algorithm.
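@Wen's answer is not reproduced on this page, so the following is only a sketch of one common sort-based approach and may differ from that answer. It assumes the two name columns hold mutually comparable values (strings here). The third row of sample data and the names `sorted_pairs`, `is_dup`, and `res_sorted` are added purely for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name1": ["Juan", "Ale", "Ana"],
    "Name2": ["Ale", "Juan", "Luis"],
    "Value": [1, 1, 2],
})

cols = ["Name1", "Name2"]

# Sort the two names within each row so (Juan, Ale) and (Ale, Juan) match.
sorted_pairs = np.sort(df[cols].to_numpy(), axis=1)

# Flag rows whose sorted pair has already appeared earlier in the frame.
is_dup = pd.DataFrame(sorted_pairs, index=df.index).duplicated()

# Keep the first occurrence of each unordered pair.
res_sorted = df[~is_dup]
print(res_sorted)
```

This version avoids the per-row Python call made by apply, which is why it tends to suit frames with many rows.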
