Pandas drop duplicates; values in reverse order
Question
I'm trying to find a way to use pandas drop_duplicates() so that rows are recognized as duplicates when their values appear in reverse order.
For example, I am trying to find transactions where customers purchased both apples and bananas, but the data collection may have recorded the two items in either order. In other words, when each row is treated as a complete order, the transaction is a duplicate because it is made up of the same items.
I want the following to be recognized as duplicates:
Item1 Item2
Apple Banana
Banana Apple
Recommended answer
First sort the values within each row using apply with sorted, and then call drop_duplicates:
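# result_type='broadcast' keeps the DataFrame shape; without it, recent pandas
# versions return a Series of lists, which drop_duplicates cannot hash
df = df.apply(sorted, axis=1, result_type='broadcast').drop_duplicates()
print(df)
   Item1   Item2
0  Apple  Banana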
# if you need to restrict the sort to specific columns
cols = ['Item1', 'Item2']
df[cols] = df[cols].apply(sorted, axis=1, result_type='broadcast')
df = df.drop_duplicates(subset=cols)
print(df)
   Item1   Item2
0  Apple  Banana
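For a version that can be run from scratch, here is a minimal sketch of the column-restricted approach. The sample rows and the extra Store column are assumptions made up for illustration; they are not part of the original question.

```python
import pandas as pd

# Sample data assumed for illustration; "Store" is a hypothetical extra column
df = pd.DataFrame({
    "Item1": ["Apple", "Banana", "Cherry"],
    "Item2": ["Banana", "Apple", "Apple"],
    "Store": ["A", "B", "C"],
})

cols = ["Item1", "Item2"]
out = df.copy()

# Sort the values inside each row, but only across the item columns.
# result_type="broadcast" keeps the original shape, index and column labels.
out[cols] = out[cols].apply(sorted, axis=1, result_type="broadcast")

# Rows 0 and 1 now hold the same (Apple, Banana) pair, so row 1 is dropped
out = out.drop_duplicates(subset=cols)
print(out)
```

Note that this rewrites the item columns in sorted order, so the surviving rows no longer show the order in which the items were originally recorded.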
Another solution uses numpy.sort and the DataFrame constructor:
import numpy as np
import pandas as pd

# np.sort with axis=1 sorts the values within each row
df = pd.DataFrame(np.sort(df.values, axis=1), index=df.index, columns=df.columns).drop_duplicates()
print(df)
   Item1   Item2
0  Apple  Banana
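Both solutions above overwrite the rows with their sorted values. If the original item order should survive, one possible variation (not part of the original answer) is to use the sorted copy only as a comparison key and filter the untouched frame with a boolean mask from duplicated(). The sample data and the names key and result are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Sample data assumed for illustration
df = pd.DataFrame({
    "Item1": ["Apple", "Banana", "Cherry"],
    "Item2": ["Banana", "Apple", "Apple"],
})

# Build a row-wise sorted copy that serves only as a comparison key
key = pd.DataFrame(np.sort(df.to_numpy(dtype=object), axis=1), index=df.index)

# duplicated() marks every repeat after the first occurrence;
# negating the mask keeps the first row of each unordered pair
result = df[~key.duplicated()]
print(result)
```

Here the row (Cherry, Apple) is kept exactly as recorded, while (Banana, Apple) is dropped as a reversed duplicate of (Apple, Banana).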