How to remove rows that appear the same in two columns simultaneously in a DataFrame?

Views: 151 Posted: 2020/10/16 22:50:48 Tags: python pandas dataframe


Problem description

I have a DataFrame, DF1:

   Id1   Id2  
0  286   409 
1  286   257  
2  409   286    
3  257   183

In this DataFrame, I consider the rows 286,409 and 409,286 to be the same, and I want to keep only one of them. I am doing all this in order to build a network graph with the NetworkX Python library.

I tried to achieve this by creating another DataFrame with the columns swapped, DF2:

   Id2   Id1
0  409   286
1  257   286
2  286   409
3  183   257

Then I used the isin function:

DF1[DF1[['Id1', 'Id2']].isin(DF2[['Id2', 'Id1']])]

but it prints DF1 unchanged.

Expected output:

   Id1   Id2  
0  286   409 
1  286   257     
3  257   183

Thanks for your help.

Recommended answer

I believe you need to sort the two columns within each row using np.sort, and then filter with DataFrame.duplicated and an inverted mask:

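import numpy as np
import pandas as pd

# Sort each pair within its row so (409, 286) and (286, 409) compare equal
df1 = pd.DataFrame(np.sort(DF1[['Id1', 'Id2']].to_numpy(), axis=1), index=DF1.index)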

df = DF1[~df1.duplicated()]
print (df)
   Id1  Id2
0  286  409
1  286  257
3  257  183
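
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes DF1 is rebuilt by hand from the table in the question and holds plain integers; the names sorted_pairs and result are only illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

# DF1 rebuilt from the question's table (assumed to hold plain integers)
DF1 = pd.DataFrame({'Id1': [286, 286, 409, 257],
                    'Id2': [409, 257, 286, 183]})

# Sort each row's pair so that (409, 286) becomes (286, 409)
sorted_pairs = pd.DataFrame(np.sort(DF1[['Id1', 'Id2']].to_numpy(), axis=1),
                            index=DF1.index)

# Keep the first occurrence of every unordered pair
result = DF1[~sorted_pairs.duplicated()]
print(result)
```

Passing index=DF1.index matters when DF1 does not have a default 0..n-1 index: the boolean mask is aligned with DF1 by index label when it is used for filtering.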

Detail: numpy.sort with axis=1 sorts the values within each row, so the first and third rows become identical:

print (np.sort(DF1[['Id1', 'Id2']].to_numpy(), axis=1))
[[286 409]
 [257 286]
 [286 409]
 [183 257]]
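
To see why axis=1 is the right choice here, it may help to compare it with axis=0. This is only a small sketch: the array pairs is typed in by hand from the question's data, and by_row and by_col are illustrative names.

```python
import numpy as np

pairs = np.array([[286, 409],
                  [286, 257],
                  [409, 286],
                  [257, 183]])

# axis=1 sorts within each row: every pair ends up as (smaller, larger)
by_row = np.sort(pairs, axis=1)

# axis=0 sorts each column independently, which breaks the pairs apart
by_col = np.sort(pairs, axis=0)

print(by_row)
print(by_col)
```

With axis=0 the two ids of a row no longer stay together, so the result cannot be used to detect swapped pairs.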

Then use the DataFrame.duplicated function (it works on a DataFrame, which is why the DataFrame constructor is used):

df1 = pd.DataFrame(np.sort(DF1[['Id1', 'Id2']].to_numpy(), axis=1), index=DF1.index)
print (df1)
     0    1
0  286  409
1  257  286
2  286  409
3  183  257

The third row is flagged as a duplicate:

print (df1.duplicated())
0    False
1    False
2     True
3    False
dtype: bool
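
Which of the two matching rows gets flagged depends on the keep parameter of DataFrame.duplicated, whose default is 'first'. The sketch below is not part of the original answer; it types the row-sorted frame in by hand and shows the three options, in case you would rather keep the later row or drop both.

```python
import pandas as pd

# The row-sorted pairs from the step above, typed in by hand
df1 = pd.DataFrame([[286, 409], [257, 286], [286, 409], [183, 257]])

first = df1.duplicated()             # default keep='first': flags row 2
last = df1.duplicated(keep='last')   # flags row 0, so row 2 would survive
both = df1.duplicated(keep=False)    # flags rows 0 and 2, dropping the pair entirely

print(first.tolist(), last.tolist(), both.tolist())
```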

Finally, invert the mask so that the duplicates are removed, and filter with boolean indexing:

print (DF1[~df1.duplicated()])
   Id1  Id2
0  286  409
1  286  257
3  257  183
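
As for why the isin attempt in the question printed DF1 unchanged: when DataFrame.isin is given another DataFrame, it compares values at matching index and column labels, so changing the column order of DF2 has no effect. A small sketch, with DF1 and DF2 rebuilt by hand from the question (mask and unchanged are illustrative names):

```python
import pandas as pd

DF1 = pd.DataFrame({'Id1': [286, 286, 409, 257],
                    'Id2': [409, 257, 286, 183]})
# DF2 as in the question: same data, columns listed in the other order
DF2 = DF1[['Id2', 'Id1']]

# isin aligns on labels: DF1['Id1'] is still compared with DF2['Id1']
mask = DF1[['Id1', 'Id2']].isin(DF2[['Id2', 'Id1']])
print(mask.all().all())  # every cell is compared with itself

# Indexing with an all-True boolean frame keeps every value
unchanged = DF1[mask]
print(unchanged)
```

Because the comparison is label-based, swapping columns cannot detect reversed pairs; sorting the values within each row, as in the answer above, avoids the problem.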
