numpy - selecting elements from an array with spacing

Question

I have a numpy array with a bunch of monotonically increasing values. Say,

import numpy as np

a = [1, 2, 3, 4, 6, 10, 10, 11, 14]
a_arr = np.array(a)

Also, say

thresh = 4

I want to create an array that contains the indices of a subset of a_arr which steps through the array, selecting elements but ignoring elements that aren't spaced at least thresh away from the last selection. This might be more easily described with an algorithm:

def select_idx(a, thresh):
    ret = []
    for idx, elt in enumerate(a):
        if len(ret) == 0 or elt >= a[ret[-1]] + thresh:
            ret.append(idx)
    return ret

Obviously I could do this using exactly this function, but that seems slow. Any way to vectorize this in numpy?

Thanks.

P.S. In this example, select_idx(a, thresh) = [0, 4, 5, 8]

Edit: An approximate version of this problem might be easier to vectorize: divide the number line into buckets of size thresh, I guess starting from the first value in a. So the bucket dividers in this example would be 0, 4, 8, 12, 16, .... Select the indices of the numbers that are the first element in their bucket. (Yes, I realize this isn't the same thing as what I wrote before.)
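To make the difference between the two rules concrete, here is a minimal loop-based sketch of both. The name select_idx_buckets and the second test list b are made up for illustration, and the buckets are assumed to start at a[0]:

```python
def select_idx(a, thresh):
    """Exact rule: keep an element only if it is at least `thresh`
    above the last element that was kept."""
    ret = []
    for idx, elt in enumerate(a):
        if not ret or elt >= a[ret[-1]] + thresh:
            ret.append(idx)
    return ret


def select_idx_buckets(a, thresh):
    """Approximate rule (hypothetical helper): keep the first element of
    each bucket of width `thresh`, with buckets assumed to start at a[0]."""
    ret = []
    last_bucket = None
    for idx, elt in enumerate(a):
        bucket = (elt - a[0]) // thresh
        if bucket != last_bucket:
            ret.append(idx)
            last_bucket = bucket
    return ret


a = [1, 2, 3, 4, 6, 10, 10, 11, 14]
b = [1, 4, 6, 9]  # illustrative input where the two rules disagree

exact_a, approx_a = select_idx(a, 4), select_idx_buckets(a, 4)
exact_b, approx_b = select_idx(b, 4), select_idx_buckets(b, 4)
```

On a the two rules happen to agree, but on b the bucket rule also keeps the 9: it opens a new bucket even though it is only 3 above the previously kept 6.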

Recommended answer

Here's a vectorized solution to your approximate problem:

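# a must be a NumPy array with an integer dtype (a_arr in the question);
# // is floor division, which np.bincount needs under Python 3
idx = np.cumsum(np.bincount((a - a[0]) // thresh))[:-1]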

This gives you all the indices except for the leading 0, which always belongs in the result and has to be added back. Here's the explanation:
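As a runnable sketch of the one-liner on the question's data, with the leading 0 added back: the names buckets and full_idx are only illustrative, and floor division is assumed because np.bincount needs integer input.

```python
import numpy as np

a_arr = np.array([1, 2, 3, 4, 6, 10, 10, 11, 14])
thresh = 4

# Floor division keeps the bucket numbers integral, which np.bincount requires.
buckets = (a_arr - a_arr[0]) // thresh

# Cumulative bucket sizes are the start indices of the following buckets;
# the last entry is len(a_arr), so it is dropped.
idx = np.cumsum(np.bincount(buckets))[:-1]

# Index 0 is always the first element of the first bucket, so prepend it.
full_idx = np.concatenate(([0], idx))
```

For this input full_idx matches the [0, 4, 5, 8] given in the question.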

  1. (a - a[0]) // thresh does integer (floor) division to bin the values into groups thresh wide. This assumes a is a NumPy array with an integer dtype. In Python 3 a plain / would produce floats, which np.bincount rejects.

  2. cumsum(bincount(...)) counts the size of each group and converts the counts into indices. Note that if a bucket contains no values, bincount reports 0 for it, so the resulting array may contain repeated indices.
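The repeats are easy to see on a small input with gaps. The sketch below uses a made-up three-element array and shows each intermediate step; removing the repeats with np.unique is an addition here, not part of the original answer.

```python
import numpy as np

a_arr = np.array([1, 2, 14])  # illustrative input: buckets 1 and 2 are empty
thresh = 4

buckets = (a_arr - a_arr[0]) // thresh   # [0, 0, 3]
counts = np.bincount(buckets)            # [2, 0, 0, 1]
starts = np.cumsum(counts)[:-1]          # [2, 2, 2]: empty buckets repeat the index

# One way to drop the repeats (np.unique also sorts the result).
unique_starts = np.unique(starts)        # [2]
```

The surviving index 2 points at 14, the first element of the only non-empty bucket after the first one.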

Finally, we discard the last index, which equals the size of a. Alternatively, if the order of the indices doesn't matter, you can exploit that last value to get your zero index back, because len(a) modulo len(a) is 0:

idx = np.cumsum(np.bincount((a - a[0]) // thresh)) % len(a)
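A short sketch of the modulo variant on the question's data. Sorting afterwards with np.sort is an optional extra step, added here in case the original order is wanted after all.

```python
import numpy as np

a_arr = np.array([1, 2, 3, 4, 6, 10, 10, 11, 14])
thresh = 4

# The final cumulative sum equals len(a_arr), so the modulo wraps it to 0.
idx = np.cumsum(np.bincount((a_arr - a_arr[0]) // thresh)) % len(a_arr)

# The 0 ends up last; sort if ascending order is needed.
ordered = np.sort(idx)
```

Here idx comes out as [4, 5, 8, 0] and ordered as [0, 4, 5, 8].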
