numpy - selecting elements from an array with spacing
Question
I have a numpy array with a bunch of monotonically increasing values. Say,
import numpy as np
a = [1, 2, 3, 4, 6, 10, 10, 11, 14]
a_arr = np.array(a)
Also, say
thresh = 4
I want to create an array that contains the indices of a subset of a_arr which steps through the array, selecting elements but ignoring elements that aren't spaced at least thresh away from the last selection. This might be more easily described with an algorithm:
def select_idx(a, thresh):
    ret = []
    for idx, elt in enumerate(a):
        # Keep the first element, then any element at least thresh above the last kept one
        if len(ret) == 0 or elt >= a[ret[-1]] + thresh:
            ret.append(idx)
    return ret
Obviously I could do this using exactly this function, but that seems slow. Any way to vectorize this in numpy?
Thanks.
P.S. In this example, select_idx(a, thresh) = [0, 4, 5, 8]
Edit: An approximate version of this problem might be easier to vectorize: divide the number line into buckets of size thresh, I guess starting from the first value in a. So the bucket dividers in this example would be 0, 4, 8, 12, 16, .... Select the indices of the numbers that are the first element in their bucket. (Yes, I realize this isn't the same thing as what I wrote before.)
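To make the approximate version concrete, here is a minimal loop sketch of it. The function name first_in_bucket_loop is hypothetical, and the buckets are assumed to be anchored at a[0] (the divider list above starts at 0 instead, so treat the anchor as an assumption). It relies on the input being monotonically increasing, so that a change of bucket always marks the first element of a new bucket.

```python
import numpy as np

def first_in_bucket_loop(a, thresh):
    """Hypothetical loop reference: index of the first element in each bucket.

    Buckets have width thresh and are anchored at a[0] (an assumption).
    The input is assumed to be monotonically increasing.
    """
    ret = []
    last_bucket = None
    for idx, elt in enumerate(a):
        bucket = int((elt - a[0]) // thresh)  # bucket number of this element
        if last_bucket is None or bucket != last_bucket:
            ret.append(idx)                   # first element seen in this bucket
            last_bucket = bucket
    return ret

a_arr = np.array([1, 2, 3, 4, 6, 10, 10, 11, 14])
approx_idx = first_in_bucket_loop(a_arr, 4)
print(approx_idx)
```

On this particular input the result happens to coincide with select_idx, but the two can differ in general: for [1, 4, 6, 9] with thresh = 4, the bucket version also selects the 9, which the original rule would skip.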
Recommended answer
Here's a vectorized solution to your approximate problem:
# a must be a NumPy array with an integer dtype (a_arr in the question)
idx = np.cumsum(np.bincount((a - a[0]) // thresh))[:-1]
This gives you all the indices except for the first zero, which is always present. Here's the explanation:
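As a sketch of how the pieces fit together on the question's example, the following spells out each intermediate array and prepends the always-present zero. The names bins, counts, and full_idx are introduced here for illustration only, and floor division (//) is used so that it runs under Python 3.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 6, 10, 10, 11, 14])
thresh = 4

bins = (a - a[0]) // thresh             # bucket number per element: [0 0 0 0 1 2 2 2 3]
counts = np.bincount(bins)              # elements per bucket:       [4 1 3 1]
idx = np.cumsum(counts)[:-1]            # start of buckets 1, 2, 3:  [4 5 8]
full_idx = np.concatenate(([0], idx))   # prepend index 0:           [0 4 5 8]
print(full_idx)
```

For this input the result matches select_idx(a, thresh) from the question, although the two definitions are not equivalent in general.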
(a-a[0])//thresh does integer (floor) division, assuming a has an integer dtype, to bin the values into groups thresh wide. Note that // is required in Python 3: there, / performs true division and yields floats, which np.bincount does not accept.
cumsum(bincount(...)) counts the size of each group and converts the counts into indices. Note that if there are no values in a bucket, bincount will report 0 for it, so there may be repeats in this array.
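The repeats are easiest to see on a small input with an empty bucket. The sketch below uses a made-up array (not from the question) and shows one way to drop the repeats with np.unique. It also shows an alternative that is not part of the answer: np.unique with return_index=True returns the first index of each occupied bucket directly, including index 0.

```python
import numpy as np

a = np.array([1, 2, 10])     # hypothetical input: bucket 1 is empty
thresh = 4

bins = (a - a[0]) // thresh                  # [0 0 2]
raw = np.cumsum(np.bincount(bins))[:-1]      # [2 2], repeated because bucket 1 is empty
deduped = np.unique(raw)                     # [2]

# Alternative (not from the answer): first occurrence of each distinct bucket number
_, first_idx = np.unique(bins, return_index=True)   # [0 2]
print(raw, deduped, first_idx)
```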
Finally, we discard the last index, which corresponds to the size of a. Alternatively, if the order of indices doesn't matter, you could exploit this to get your zero index back:
idx = np.cumsum(np.bincount((a - a[0]) // thresh)) % len(a)
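A quick check of the modulo trick on the question's example, as a sketch: the last cumulative sum equals len(a), so taking it modulo len(a) wraps it around to 0, which lands at the end of the array. The names idx_mod and sorted_idx are illustrative; sorting is only needed if the order matters.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 6, 10, 10, 11, 14])
thresh = 4

# cumsum gives [4 5 8 9]; 9 == len(a), so the modulo turns it into 0
idx_mod = np.cumsum(np.bincount((a - a[0]) // thresh)) % len(a)   # [4 5 8 0]
sorted_idx = np.sort(idx_mod)                                      # [0 4 5 8]
print(idx_mod, sorted_idx)
```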