How can I define multiple slices of a numpy array based on pairs of start/end indices without iterating?

Views: 70 | Published: 2020/5/18 22:57:32 | Tags: python arrays numpy

Problem description

I have a numpy array of integers.

I have two other arrays holding the start and length (or start and end) indices into this array. They identify the sequences of integers that I need to process. The sequences are of variable length.

import numpy

x = numpy.array([2, 3, 5, 7, 9, 12, 15, 21, 27, 101, 250])  # can have a length in the millions

starts = numpy.array([2, 7])  # can have lengths in the thousands
ends = numpy.array([5, 9])

# required output is x[2:5], x[7:9] in a flat 1D array
# [5, 7, 9, 21, 27]

I can do this easily with for loops, but the application is performance sensitive, so I'm looking for a way to do it without Python iteration.

Any help will be gratefully received!

Doug

Recommended answer

Approach 1

One vectorized approach is to build a boolean mask with broadcasting:

In [16]: r = np.arange(len(x))

In [18]: x[((r>=starts[:,None]) & (r<ends[:,None])).any(0)]
Out[18]: array([ 5,  7,  9, 21, 27])
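
The one-liner above can be unpacked step by step. The following is a minimal, self-contained sketch using the sample data from the question; the names in_range and mask are introduced here for illustration only. Note that broadcasting builds an intermediate boolean array of shape (len(starts), len(x)), so for the sizes mentioned in the question (thousands of pairs, millions of elements) that intermediate could run to gigabytes.

```python
import numpy as np

x = np.array([2, 3, 5, 7, 9, 12, 15, 21, 27, 101, 250])
starts = np.array([2, 7])
ends = np.array([5, 9])

r = np.arange(len(x))  # positions 0 .. len(x)-1, shape (11,)

# starts[:, None] has shape (2, 1); comparing it with r broadcasts to (2, 11).
# Row k is True exactly at the positions inside [starts[k], ends[k]).
in_range = (r >= starts[:, None]) & (r < ends[:, None])

# Collapse the rows: a position is kept if any pair covers it.
mask = in_range.any(axis=0)
out = x[mask]

print(in_range.shape)  # (2, 11)
print(out)             # [ 5  7  9 21 27]
```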

Approach 2

Another vectorized way is to mark each start with 1 and each end with -1, then take a cumulative sum so that the positions inside a slice become 1 and all others 0. This should be better with many start-end pairs:

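# Assumes no index occurs more than once across starts and ends.
idx = np.zeros(len(x),dtype=int)
idx[starts] = 1                      # +1 where a slice opens
idx[ends[ends<len(x)]] = -1          # -1 where it closes; an end equal to len(x) needs no marker
out = x[idx.cumsum().astype(bool)]   # running sum is nonzero inside a slice, 0 outside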
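
The plain assignments above rely on every index being marked at most once: if an end index coincides with a start index (touching slices), or two pairs share a start or an end, one marker overwrites or merges with another and the running sum goes wrong. A possible variant, sketched here and not part of the original answer, accumulates the markers with np.add.at instead. The helper name union_of_ranges is hypothetical, and the sketch assumes starts <= ends with all indices in [0, len(x)]. It returns each covered element once, in array order, so for overlapping pairs it yields the union rather than a concatenation with repeats.

```python
import numpy as np

def union_of_ranges(x, starts, ends):
    """Hypothetical helper: elements of x covered by any [start, end) range."""
    # One extra slot so that an end equal to len(x) needs no special-casing.
    delta = np.zeros(len(x) + 1, dtype=int)
    np.add.at(delta, starts, 1)   # +1 per range opening here (repeats accumulate)
    np.add.at(delta, ends, -1)    # -1 per range closing here
    covered = delta[:-1].cumsum() > 0  # positive count: inside at least one range
    return x[covered]

x = np.array([2, 3, 5, 7, 9, 12, 15, 21, 27, 101, 250])

print(union_of_ranges(x, np.array([2, 7]), np.array([5, 9])))  # [ 5  7  9 21 27]

# Touching ranges [2, 5) and [5, 7): the markers at index 5 cancel out,
# so the two ranges merge into [2, 7).
print(union_of_ranges(x, np.array([2, 5]), np.array([5, 7])))  # [ 5  7  9 12 15]
```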

Approach 3

A loop-based alternative that is more memory-efficient, since it fills a single boolean mask of length len(x) rather than a 2D one. It could be the better choice when starts and ends have many entries:

mask = np.zeros(len(x),dtype=bool)
for (i,j) in zip(starts,ends):
    mask[i:j] = True
out = x[mask]

Approach 4

For completeness, here is another loop-based version that selects each slice and assigns it into a preallocated output array. It should do well when the slices are taken from a large array:

lens = ends-starts
out = np.empty(lens.sum(),dtype=x.dtype)
start = 0
for (i,j,l) in zip(starts,ends,lens):
    out[start:start+l] = x[i:j]
    start += l

If there are many iterations, a minor optimization is possible: precompute the output offsets with a cumulative sum to reduce the work done per iteration:

lens = ends-starts
lims = np.r_[0,lens].cumsum()
out = np.empty(lims[-1],dtype=x.dtype)
for (i,j,s,t) in zip(starts,ends,lims[:-1],lims[1:]):
    out[s:t] = x[i:j]
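
Both loops above still run one Python iteration per start/end pair. As an alternative that is not part of the original answer, the flat list of indices to gather can be built with np.repeat and np.arange, which removes the Python loop altogether. The sketch below is one way to do that; the helper name concat_slices is hypothetical, and it assumes ends >= starts elementwise. Unlike the mask-based approaches, it keeps the slices in the order given and repeats elements when slices overlap, matching what concatenating x[i:j] for each pair would produce. The trade-off is an integer index array as long as the output.

```python
import numpy as np

def concat_slices(x, starts, ends):
    """Hypothetical helper: concatenation of x[s:e] for each pair, without a loop."""
    lens = ends - starts
    # Offset of each slice within the flat output: 0, lens[0], lens[0]+lens[1], ...
    offsets = np.cumsum(lens) - lens
    # For output position p in slice k, the source index is
    # starts[k] + (p - offsets[k]); repeat the per-slice shift lens[k] times.
    idx = np.arange(lens.sum()) + np.repeat(starts - offsets, lens)
    return x[idx]

x = np.array([2, 3, 5, 7, 9, 12, 15, 21, 27, 101, 250])
starts = np.array([2, 7])
ends = np.array([5, 9])

print(concat_slices(x, starts, ends))  # [ 5  7  9 21 27]
```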
