pandas: how to compare rows of 2 DataFrames regardless of order


Question

import pandas as pd

df1 = pd.DataFrame(index=[1, 2, 3, 4])
df1['A'] = [1, 2, 5, 4]
df1['B'] = [5, 6, 9, 8]
df1['C'] = [9, 10, 1, 12]

>>> df1
   A  B   C
1  1  5   9
2  2  6  10
3  5  9   1
4  4  8  12

I want to compare the rows of df1 and get the result that row1 (1, 5, 9) == row3 (5, 9, 1).

That is, I only care about which items a row contains and want to ignore the order of those items.
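A minimal sketch of the idea for just two rows, assuming the df1 defined above: sorting each row's values turns "same items in any order" into plain element-wise equality. Sorting (rather than converting to a set) keeps repeated values, so rows like (1, 1, 5) and (1, 5, 5) still compare as different. The helper name rows_equal_unordered is hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 5, 4],
                    'B': [5, 6, 9, 8],
                    'C': [9, 10, 1, 12]}, index=[1, 2, 3, 4])

def rows_equal_unordered(df, i, j):
    """Hypothetical helper: True if rows i and j hold the same values
    (with the same multiplicities), ignoring column order."""
    a = np.sort(df.loc[i].to_numpy())
    b = np.sort(df.loc[j].to_numpy())
    return bool(np.array_equal(a, b))

print(rows_equal_unordered(df1, 1, 3))  # True
print(rows_equal_unordered(df1, 1, 2))  # False
```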

Recommended answer

I think you need to sort each row with np.sort:

import numpy as np

# Sort the values inside each row (axis=1) so row order no longer matters
df2 = pd.DataFrame(np.sort(df1.values, axis=1), index=df1.index, columns=df1.columns)
print(df2)
   A  B   C
1  1  5   9
2  2  6  10
3  1  5   9
4  4  8  12
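If the goal is to see which rows match each other rather than drop them, one option (a sketch, not part of the original answer) is duplicated(keep=False), which flags every member of a group of equal sorted rows. Grouping by the sorted columns then lists the index labels that share the same values.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 5, 4],
                    'B': [5, 6, 9, 8],
                    'C': [9, 10, 1, 12]}, index=[1, 2, 3, 4])

# Sort each row so that rows with the same values in any order become identical
df2 = pd.DataFrame(np.sort(df1.to_numpy(), axis=1), index=df1.index)

# keep=False marks every row that has at least one order-insensitive match
has_match = df2.duplicated(keep=False)
print(has_match.tolist())  # [True, False, True, False]

# Group by all sorted columns and keep only groups with more than one row
groups = [idx.tolist() for idx in df2.groupby(list(df2.columns)).groups.values()]
matching = [g for g in groups if len(g) > 1]
print(matching)  # [[1, 3]]
```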

Then remove the duplicates with the inverted (~) boolean mask created by duplicated:

# Same sorted values, but with default integer column labels
df2 = pd.DataFrame(np.sort(df1.values, axis=1), index=df1.index)
print(df2)
   0  1   2
1  1  5   9
2  2  6  10
3  1  5   9
4  4  8  12

# duplicated() is True for repeats; ~ inverts it to keep the first occurrence
df1 = df1[~df2.duplicated()]
print(df1)
   A  B   C
1  1  5   9
2  2  6  10
4  4  8  12
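The title mentions two DataFrames. A hedged extension of the same sorting trick, assuming a second frame df_other (hypothetical, made up for illustration) with the same number of columns: sort each row of both frames, turn every row into a hashable tuple, and test membership. This flags the rows of df1 whose values appear, in any order, as some row of df_other.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 5, 4],
                    'B': [5, 6, 9, 8],
                    'C': [9, 10, 1, 12]}, index=[1, 2, 3, 4])

# Hypothetical second frame, for illustration only
df_other = pd.DataFrame({'X': [9, 7], 'Y': [1, 6], 'Z': [5, 3]})

def sorted_row_keys(df):
    """Sort each row and return one hashable tuple per row."""
    return [tuple(row) for row in np.sort(df.to_numpy(), axis=1).tolist()]

# Rows of df1 whose sorted values match any sorted row of df_other
other_keys = set(sorted_row_keys(df_other))
in_other = pd.Series([k in other_keys for k in sorted_row_keys(df1)], index=df1.index)
print(in_other.tolist())  # [True, False, True, False]
print(df1[in_other])
```

Using tuples of sorted values as keys keeps the comparison sensitive to repeated values, matching the behaviour of np.sort in the answer above.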
