Slicing repeatedly with the same slice in NumPy


Question

I have several one-dimensional NumPy arrays (around 5 million elements each).

I have to slice them repeatedly with the same slice. I have a collection of arrays (all of the same size), and I want to index all of them with the same index array (which has the same size as the arrays).

Is there a way to compute A[index] for all the different arrays A that is more efficient than the naive way?

Maybe there’s a way to use Cython to speed things up?

Thanks!

Edit

To make things clearer, this is my setting: I have one array A of several million elements. To perform a certain operation on this array A, I first need to sort it; but then I want to recover the original order, so I un-sort it. I need to repeat this several times. So in short:

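import numpy as np

def fancy_function(x):
    # hypothetical placeholder for the real operation; must return an
    # array with the same shape as its input
    return np.cumsum(x)

A = np.random.rand(5_000_000)      # 1-D array; rand() takes integer sizes
indices = np.argsort(A)
sortedA = A[indices]
inv_indices = np.argsort(indices)  # inverse permutation: un-sorts

results = []
for _ in range(100):
    fancy_A = fancy_function(sortedA)  # same shape as sortedA
    res = fancy_A[inv_indices]
    results.append(res)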
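`inv_indices = np.argsort(indices)` runs a second O(n log n) sort. Because `indices` is a permutation, its inverse can also be built in O(n) with one scatter assignment. A minimal sketch; the helper name `inverse_permutation` is ours, not from the question or from NumPy:

```python
import numpy as np

def inverse_permutation(perm):
    """Return inv such that inv[perm[i]] == i for every i (O(n) scatter)."""
    inv = np.empty_like(perm)
    inv[perm] = np.arange(perm.size, dtype=perm.dtype)
    return inv

rng = np.random.default_rng(0)
A = rng.random(1_000)
indices = np.argsort(A)
inv_indices = inverse_permutation(indices)

# Same result as the argsort-of-argsort approach in the question.
assert np.array_equal(inv_indices, np.argsort(indices))
# Un-sorting the sorted array recovers the original order.
assert np.array_equal(A[indices][inv_indices], A)
```

This only changes the one-time setup cost, not the per-iteration indexing inside the loop.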

I want to optimize the code inside the loop. As you can see, inv_indices is always the same, and I thought that there may be a more efficient way of doing that.
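One thing that may be worth trying inside the loop is to stop allocating a fresh result array on every pass. `np.take` accepts an `out=` argument, so each reordered result can be written into a row of a buffer allocated once up front; NumPy's documentation notes that `out` is always buffered with the default `mode="raise"`, so the sketch uses `mode="clip"`, which is safe here because every index is in range. The scatter form `out[indices] = fancy_A` gives the same reordering without computing `inv_indices` at all. Whether either beats plain `fancy_A[inv_indices]` depends on array size and NumPy version, so time it on your data. `fancy_function` below is a hypothetical stand-in (a cumulative sum) for the question's unspecified operation:

```python
import numpy as np

def fancy_function(x):
    # Hypothetical stand-in for the question's unspecified operation:
    # any function that returns an array of the same shape will do.
    return np.cumsum(x)

rng = np.random.default_rng(0)
A = rng.random(100_000)
indices = np.argsort(A)
sortedA = A[indices]
inv_indices = np.argsort(indices)

n_iter = 10
# Allocate every result row once, up front.
results = np.empty((n_iter, A.size), dtype=sortedA.dtype)

for k in range(n_iter):
    fancy_A = fancy_function(sortedA)
    # Gather into a preallocated row; mode="clip" avoids the buffering the
    # default mode="raise" does, and is safe because all indices are valid.
    np.take(fancy_A, inv_indices, out=results[k], mode="clip")

# Equivalent scatter form that uses `indices` directly, no inverse needed.
check = np.empty_like(sortedA)
check[indices] = fancy_function(sortedA)
assert np.array_equal(check, results[-1])
```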

Thanks!

Answer

Since inv_indices reorders the whole array rather than selecting a subset, it is probably just as fast, and just as space-efficient, to collect the fancy_A arrays into one big array and index that once:

results = []
for _ in range(100):
    fancy_A = fancy_function(sortedA) #returns an array with the same dimensions
    #res = fancy_A[inv_indices]
    results.append(fancy_A)

bigA = np.stack(results)
bigA = bigA[:, inv_indices]    # assumes inv_indices is a list or array
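A self-contained sketch of this stack-then-index-once pattern, checked against indexing each result inside the loop. The two-argument `fancy_function` is a hypothetical stand-in whose output differs per iteration so the comparison is meaningful. With the question's sizes (100 results of 5 million float64 values), the stacked array alone is about 4 GB, and `bigA[:, inv_indices]` creates a second array of that size, so this trades memory for fewer Python-level indexing calls rather than doing less work per element:

```python
import numpy as np

def fancy_function(x, k):
    # Hypothetical stand-in for the question's operation; `k` just makes
    # each iteration's output different so the check below is meaningful.
    return x * (k + 1)

rng = np.random.default_rng(1)
A = rng.random(10_000)
indices = np.argsort(A)
sortedA = A[indices]
inv_indices = np.argsort(indices)

results = []
for k in range(5):
    results.append(fancy_function(sortedA, k))   # keep sorted order for now

bigA = np.stack(results)          # shape (5, 10_000)
bigA = bigA[:, inv_indices]       # one fancy-index call un-sorts every row

# Same answer as indexing each row inside the loop.
expected = np.stack([fancy_function(sortedA, k)[inv_indices] for k in range(5)])
assert np.array_equal(bigA, expected)
```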


If fancy_A is 1-D and inv_indices is a simple list (or integer array), applying it to the stack is straightforward:

In [849]: A = np.random.randint(0,10,10)
In [850]: A
Out[850]: array([0, 1, 5, 7, 4, 4, 0, 6, 9, 1])
In [851]: idx = np.argsort(A)
In [852]: idx
Out[852]: array([0, 6, 1, 9, 4, 5, 2, 7, 3, 8], dtype=int32)
In [853]: A[idx]
Out[853]: array([0, 0, 1, 1, 4, 4, 5, 6, 7, 9])
In [854]: res = [A for _ in range(5)]
In [855]: res = np.stack([A for _ in range(5)])
In [856]: res
Out[856]: 
array([[0, 1, 5, 7, 4, 4, 0, 6, 9, 1],
       [0, 1, 5, 7, 4, 4, 0, 6, 9, 1],
       [0, 1, 5, 7, 4, 4, 0, 6, 9, 1],
       [0, 1, 5, 7, 4, 4, 0, 6, 9, 1],
       [0, 1, 5, 7, 4, 4, 0, 6, 9, 1]])
In [857]: res[:,idx]
Out[857]: 
array([[0, 0, 1, 1, 4, 4, 5, 6, 7, 9],
       [0, 0, 1, 1, 4, 4, 5, 6, 7, 9],
       [0, 0, 1, 1, 4, 4, 5, 6, 7, 9],
       [0, 0, 1, 1, 4, 4, 5, 6, 7, 9],
       [0, 0, 1, 1, 4, 4, 5, 6, 7, 9]])
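The session above only applies the sorting permutation `idx`. A short sketch of the full round trip on the same data: `np.take(..., axis=1)` is equivalent to `res[:, idx]`, and applying the inverse permutation (`np.argsort(idx)`) restores every row. It uses `kind="stable"` so the tie-breaking in `idx` is deterministic; the session's default sort may order equal values differently:

```python
import numpy as np

A = np.array([0, 1, 5, 7, 4, 4, 0, 6, 9, 1])
idx = np.argsort(A, kind="stable")
inv = np.argsort(idx)

res = np.stack([A for _ in range(5)])   # shape (5, 10)

sorted_rows = np.take(res, idx, axis=1)  # same as res[:, idx]
assert np.array_equal(sorted_rows, res[:, idx])

restored = sorted_rows[:, inv]            # inverse permutation undoes the sort
assert np.array_equal(restored, res)
```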


Here is how long it takes to index a whole array with a permutation, compared with a plain copy:

In [860]: A = np.random.randint(0,1000,100000)
In [861]: idx = np.argsort(A)
In [862]: timeit A.copy()
31.8 µs ± 394 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [863]: timeit A[idx]
332 µs ± 9.69 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
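In these numbers the permutation gather costs roughly ten times a plain copy (332 µs vs 31.8 µs), so the reordering itself is where the time goes. A small timing sketch for comparing a few ways of applying the permutation on your own machine; the function name `benchmark` and its parameters are ours, and no particular ranking is implied, since results vary with array size, dtype, and NumPy version:

```python
import timeit
import numpy as np

def benchmark(n=100_000, number=200, seed=0):
    """Return mean seconds per call for a few ways of applying a permutation."""
    rng = np.random.default_rng(seed)
    A = rng.integers(0, 1000, n)
    idx = np.argsort(A)
    out = np.empty_like(A)
    candidates = {
        "copy": lambda: A.copy(),
        "fancy index A[idx]": lambda: A[idx],
        "np.take": lambda: np.take(A, idx),
        "np.take, out=, mode='clip'": lambda: np.take(A, idx, out=out, mode="clip"),
    }
    return {name: timeit.timeit(fn, number=number) / number
            for name, fn in candidates.items()}

if __name__ == "__main__":
    for name, secs in benchmark().items():
        print(f"{name:28s} {secs * 1e6:8.1f} µs")
```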
