Pandas: remove duplicates that exist in any order
Question
My question is similar to "Pandas: remove reverse duplicates from dataframe", but I have an additional requirement: I need to maintain row value pairs.
For example, I have data, where column A corresponds to column C and column B corresponds to column D.
import pandas as pd
# Initial data frame
data = pd.DataFrame({'A': [0, 10, 11, 21, 22, 35, 5, 50],
                     'B': [50, 22, 35, 5, 10, 11, 21, 0],
                     'C': ["a", "b", "r", "x", "c", "w", "z", "y"],
                     'D': ["y", "c", "w", "z", "b", "r", "x", "a"]})
data
# A B C D
#0 0 50 a y
#1 10 22 b c
#2 11 35 r w
#3 21 5 x z
#4 22 10 c b
#5 35 11 w r
#6 5 21 z x
#7 50 0 y a
I would like to remove duplicates that exist in columns A and B, in either order, but I need to preserve their corresponding letter values in columns C and D.
I have a solution here, but is there a more elegant way of doing this?
# Desired data frame
new_data = pd.DataFrame()
# Concat numbers and corresponding letters
new_data['AC'] = data['A'].astype(str) + ',' + data['C']
new_data['BD'] = data['B'].astype(str) + ',' + data['D']
# Sort each row so reversed pairs become identical, then drop duplicates
# (result_type='expand' keeps the sorted values as two columns)
new_data = new_data.apply(sorted, axis=1, result_type='expand').drop_duplicates()
# Recreate dataframe by splitting the joined strings back apart
new_data = pd.concat([new_data.iloc[:, 0].str.split(',', expand=True),
                      new_data.iloc[:, 1].str.split(',', expand=True)], axis=1)
new_data.columns=['A', 'B', 'C', 'D']
new_data
# A B C D
#0 0 a 50 y
#1 10 b 22 c
#2 11 r 35 w
#3 21 x 5 z
EDIT: technically, the output should look like this:
new_data.columns=['A', 'C', 'B', 'D']
new_data
# A C B D
#0 0 a 50 y
#1 10 b 22 c
#2 11 r 35 w
#3 21 x 5 z
Recommended answer
I think you can do this with stack, drop_duplicates and unstack:
data.set_index(['A','B']).stack().drop_duplicates().unstack().reset_index()
A B C D
0 0 50 a y
1 10 22 b c
2 11 35 r w
3 21 5 x z
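Note that the chain above drops duplicates by letter value, so it relies on each letter belonging to only one pair. If that might not hold, one alternative (not part of the answer above, just a sketch under that assumption) is to build an order-independent key from A and B and keep the first row for each key. The names key and result are illustrative only.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'A': [0, 10, 11, 21, 22, 35, 5, 50],
                     'B': [50, 22, 35, 5, 10, 11, 21, 0],
                     'C': ["a", "b", "r", "x", "c", "w", "z", "y"],
                     'D': ["y", "c", "w", "z", "b", "r", "x", "a"]})

# Sort each (A, B) pair so that (22, 10) and (10, 22) give the same key.
key = pd.DataFrame(np.sort(data[['A', 'B']].to_numpy(), axis=1),
                   index=data.index)

# Keep only the first row seen for each key; the rows themselves are
# untouched, so A stays paired with C and B stays paired with D.
result = data[~key.duplicated()].reset_index(drop=True)
print(result)
```

For the sample data this keeps the first four rows, the same rows the stack/unstack chain returns.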