Pandas drop duplicates; values in reverse order


Question

I'm trying to find a way to use pandas drop_duplicates() to recognize rows as duplicates when their values appear in reverse order.

For example, suppose I am trying to find transactions in which customers purchased both apples and bananas, but data collection may have recorded the items in either order. In other words, taken as a complete order, the transaction is a duplicate because it is made up of the same items.

I want the following to be recognized as duplicates:

Item1   Item2
Apple   Banana
Banana  Apple
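
To reproduce the situation, here is a minimal sketch. The question shows only the table, so the DataFrame construction and the variable names are assumptions. It illustrates that a plain drop_duplicates() call compares values column by column and therefore keeps both rows:

```python
import pandas as pd

# Sample data from the table above; the construction itself is assumed.
df = pd.DataFrame({"Item1": ["Apple", "Banana"],
                   "Item2": ["Banana", "Apple"]})

# Rows are compared column by column, so (Apple, Banana) and
# (Banana, Apple) are treated as different and both are kept.
plain = df.drop_duplicates()
print(len(plain))
```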


Recommended answer

First sort the values within each row using apply with sorted, then call drop_duplicates:

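# Since pandas 0.23, apply(sorted, axis=1) returns a Series of lists;
# result_type='broadcast' keeps the original DataFrame shape and columns.
df = df.apply(sorted, axis=1, result_type='broadcast').drop_duplicates()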
print (df)
   Item1   Item2
0  Apple  Banana

# If you only need to compare specific columns
cols = ['Item1', 'Item2']
df[cols] = df[cols].apply(sorted, axis=1, result_type='broadcast')
df = df.drop_duplicates(subset=cols)
print (df)
   Item1   Item2
0  Apple  Banana

Another solution uses numpy.sort together with the DataFrame constructor:

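import numpy as np
import pandas as pd

# Sort the values within each row, rebuild the frame, then drop duplicates.
df = (pd.DataFrame(np.sort(df.values, axis=1), index=df.index, columns=df.columns)
        .drop_duplicates())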
print (df)
   Item1   Item2
0  Apple  Banana
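
Both approaches above overwrite each row with its sorted values. If the original column order of the kept rows matters, one possible variation is to sort only a temporary copy and use it as a mask. This is a sketch, not part of the original answer; the third row (Cherry, Apple) and the names keys, mask and deduped are hypothetical additions for illustration:

```python
import numpy as np
import pandas as pd

# Sample data; the third row is a made-up extra for illustration.
df = pd.DataFrame({"Item1": ["Apple", "Banana", "Cherry"],
                   "Item2": ["Banana", "Apple", "Apple"]})

# Sort a copy of the values within each row to build order-independent keys.
keys = pd.DataFrame(np.sort(df.values, axis=1), index=df.index)

# duplicated() marks every repeat after the first occurrence of a key.
mask = keys.duplicated()

# Filter the original frame, so kept rows retain their original values.
deduped = df[~mask]
print(deduped)
```

Here the row (Banana, Apple) is dropped as a repeat of (Apple, Banana), while (Cherry, Apple) is kept as it was entered rather than being rewritten as (Apple, Cherry).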
