矢量化搜索排序 numpy [英] Vectorized searchsorted numpy

查看:47
本文介绍了矢量化搜索排序 numpy的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我有两个数组 AB,其中 AB 都是 mxn.我现在的目标是,对于 AB 的每一行,找到我应该在哪里插入 A 的 i 行的元素B对应的行.也就是说,我希望将 np.digitizenp.searchsorted 应用于 AB 的每一行.

Assume that I have two arrays A and B, where both A and B are m x n. My goal is now, for each row of A and B, to find where I should insert the elements of row i of A in the corresponding row of B. That is, I wish to apply np.digitize or np.searchsorted to each row of A and B.

我天真的解决方案是简单地遍历行.但是,这对我的应用程序来说太慢了.因此,我的问题是:是否存在我尚未找到的任一算法的矢量化实现?

My naive solution is to simply iterate over the rows. However, this is far too slow for my application. My question is therefore: is there a vectorized implementation of either algorithm that I haven't managed to find?

推荐答案

与前一行相比,我们可以为每一行添加一些偏移量.我们将对两个数组使用相同的偏移量.这个想法是在此后在输入数组的扁平版本上使用 np.searchsorted,因此 b 中的每一行都将被限制为在 中的相应行中查找排序位置一个.此外,为了让它也适用于负数,我们只需要对最小数进行偏移.

We can add each row some offset as compared to the previous row. We would use the same offset for both arrays. The idea is to use np.searchsorted on flattened version of input arrays thereafter and thus each row from b would be restricted to find sorted positions in the corresponding row in a. Additionally, to make it work for negative numbers too, we just need to offset for the minimum numbers as well.

所以,我们会有一个像这样的矢量化实现 -

So, we would have a vectorized implementation like so -

def searchsorted2d(a,b):
    m,n = a.shape
    max_num = np.maximum(a.max() - a.min(), b.max() - b.min()) + 1
    r = max_num*np.arange(a.shape[0])[:,None]
    p = np.searchsorted( (a+r).ravel(), (b+r).ravel() ).reshape(m,-1)
    return p - n*(np.arange(m)[:,None])

运行时测试 -

In [173]: def searchsorted2d_loopy(a,b):
     ...:     out = np.zeros(a.shape,dtype=int)
     ...:     for i in range(len(a)):
     ...:         out[i] = np.searchsorted(a[i],b[i])
     ...:     return out
     ...: 

In [174]: # Setup input arrays
     ...: a = np.random.randint(11,99,(10000,20))
     ...: b = np.random.randint(11,99,(10000,20))
     ...: a = np.sort(a,1)
     ...: b = np.sort(b,1)
     ...: 

In [175]: np.allclose(searchsorted2d(a,b),searchsorted2d_loopy(a,b))
Out[175]: True

In [176]: %timeit searchsorted2d_loopy(a,b)
10 loops, best of 3: 28.6 ms per loop

In [177]: %timeit searchsorted2d(a,b)
100 loops, best of 3: 13.7 ms per loop

这篇关于矢量化搜索排序 numpy的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆