Find symmetric pairs quickly in numpy

Problem description

from itertools import product
import pandas as pd

df = pd.DataFrame.from_records(product(range(10), range(10)))
df = df.sample(90)
df.columns = "c1 c2".split()
df = df.sort_values(df.columns.tolist()).reset_index(drop=True)
#     c1  c2
# 0    0   0
# 1    0   1
# 2    0   2
# 3    0   3
# 4    0   4
# ..  ..  ..
# 85   9   4
# 86   9   5
# 87   9   7
# 88   9   8
# 89   9   9
# 
# [90 rows x 2 columns]

How do I quickly find, identify, and remove the last duplicate of every symmetric pair in this data frame?

As an example of a symmetric pair, '(0, 1)' is considered equal to '(1, 0)'. The latter should be removed.

The algorithm must be fast, so using numpy is recommended. Converting to Python objects is not allowed.

Recommended answer

You can sort the values within each row, so that both members of a symmetric pair become identical, and then group by the sorted values:

import numpy as np
a = np.sort(df.to_numpy(), axis=1)  # sort within each row, so (1, 0) becomes (0, 1)
df.groupby([a[:, 0], a[:, 1]], as_index=False, sort=False).first()  # first row of each pair
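
To make the effect of the row-wise sort concrete, here is a minimal sketch on a tiny hand-made frame. The names `small` and `is_repeat` are illustrative assumptions and do not come from the original question. It also covers the "identify" part of the question by building a boolean mask of the rows that would be removed.

```python
import numpy as np
import pandas as pd

# Hypothetical five-row example: (1, 0) mirrors (0, 1), and (3, 1) mirrors (1, 3).
small = pd.DataFrame({"c1": [0, 1, 2, 1, 3], "c2": [1, 0, 2, 3, 1]})

# After sorting within each row, mirrored pairs are identical rows.
a = np.sort(small.to_numpy(), axis=1)

# duplicated() marks every occurrence after the first as True.
is_repeat = pd.DataFrame(a).duplicated().to_numpy()

# Keep only the first occurrence of each unordered pair.
result = small[~is_repeat]
print(is_repeat)
print(result)
```

The mask can be kept around if the removed rows need to be inspected separately, for example with `small[is_repeat]`.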

Option 2: If there are many (c1, c2) pairs, groupby can be slow. In that case, we can assign the sorted values as new columns and filter with drop_duplicates:

a = np.sort(df.to_numpy(), axis=1)

(df.assign(one=a[:, 0], two=a[:, 1])  # helper columns; any unused names work
   .drop_duplicates(['one', 'two'])   # keep the first row of each sorted pair
   .reindex(df.columns, axis=1)       # drop the helper columns again
)
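
Since the question asks for a numpy-based approach, the same idea can also be sketched without groupby or drop_duplicates. This is not the method from the answer above but a variation on it: each unordered pair is encoded as a single integer key, and `np.unique` with `return_index=True` gives the position of the first occurrence of each key. The function name `drop_symmetric_pairs` is hypothetical, and the sketch assumes a non-empty frame whose `c1` and `c2` columns hold non-negative integers small enough that the key does not overflow int64.

```python
from itertools import product

import numpy as np
import pandas as pd


def drop_symmetric_pairs(df):
    """Keep the first occurrence of each unordered (c1, c2) pair.

    Assumes a non-empty frame with non-negative integer columns c1 and c2.
    """
    values = df[["c1", "c2"]].to_numpy().astype(np.int64)
    lo = values.min(axis=1)  # smaller element of each pair
    hi = values.max(axis=1)  # larger element of each pair
    # Encode (lo, hi) as one integer; the encoding is unique because hi < base.
    base = int(hi.max()) + 1
    key = lo * base + hi
    # return_index yields the index of the first occurrence of each key.
    _, first_idx = np.unique(key, return_index=True)
    # Sort the indices to preserve the original row order.
    return df.iloc[np.sort(first_idx)]


# Usage on the full 10 x 10 grid of pairs.
grid = pd.DataFrame(list(product(range(10), range(10))), columns=["c1", "c2"])
out = drop_symmetric_pairs(grid)
print(len(out))
```

Working on a one-dimensional integer key avoids comparing rows as tuples, which is the main reason to expect it to scale well, though actual timings depend on the data and should be measured.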
