pandas how to compare rows of 2 dataframes regardless of order
Question
import pandas as pd
df1 = pd.DataFrame(index=[1,2,3,4])
df1['A'] = [1,2,5,4]
df1['B'] = [5,6,9,8]
df1['C'] = [9,10,1,12]
>>> df1
A B C
1 1 5 9
2 2 6 10
3 5 9 1
4 4 8 12
I want to compare the rows of df1 and get the result that row 1 (1, 5, 9) == row 3 (5, 9, 1).
In other words, I only care about which items a row contains, and I want to ignore the order of the items within the row.
Recommended answer
I think you need to sort each row with np.sort:
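import numpy as np
# Sort the values within each row (axis=1) so that the original column order no longer matters
df2 = pd.DataFrame(np.sort(df1.values, axis=1), index=df1.index, columns=df1.columns)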
print (df2)
A B C
1 1 5 9
2 2 6 10
3 1 5 9
4 4 8 12
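Once the rows are sorted, two rows contain the same items exactly when their sorted versions are equal element by element. The following is a minimal sketch of that direct comparison for the rows from the question; the names sorted_rows, rows_1_3_match and rows_1_2_match are only illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {'A': [1, 2, 5, 4], 'B': [5, 6, 9, 8], 'C': [9, 10, 1, 12]},
    index=[1, 2, 3, 4],
)

# Sort the values inside each row so that item order is ignored
sorted_rows = pd.DataFrame(np.sort(df1.values, axis=1), index=df1.index)

# Two rows match if every position of their sorted versions is equal
rows_1_3_match = bool((sorted_rows.loc[1] == sorted_rows.loc[3]).all())
rows_1_2_match = bool((sorted_rows.loc[1] == sorted_rows.loc[2]).all())

print(rows_1_3_match)  # True
print(rows_1_2_match)  # False
```

Note that this treats each row as a multiset: repeated values within a row must occur the same number of times in both rows for them to match.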
Then remove the duplicates by inverting (~) the boolean mask created by duplicated:
df2 = pd.DataFrame(np.sort(df1.values, axis=1), index=df1.index)
print (df2)
0 1 2
1 1 5 9
2 2 6 10
3 1 5 9
4 4 8 12
df1 = df1[~df2.duplicated()]
print (df1)
A B C
1 1 5 9
2 2 6 10
4 4 8 12
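If the goal is to see which rows match each other rather than to drop them, one possible variation (an assumption about the use case, not part of the original answer) is to pass keep=False to duplicated, which marks every member of a duplicate group instead of only the later occurrences. The names has_match and matching_index are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {'A': [1, 2, 5, 4], 'B': [5, 6, 9, 8], 'C': [9, 10, 1, 12]},
    index=[1, 2, 3, 4],
)

# Sort the values inside each row so that item order is ignored
sorted_rows = pd.DataFrame(np.sort(df1.values, axis=1), index=df1.index)

# keep=False flags every row that has at least one order-insensitive match
has_match = sorted_rows.duplicated(keep=False)

# Index labels of the rows that share their items with another row
matching_index = df1.index[has_match.to_numpy()].tolist()

print(matching_index)  # [1, 3]
```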