Repeat but in variable sized chunks in numpy
Question
I have an array that is the concatenation of different chunks:
a = np.array([0, 1, 2, 10, 11, 20, 21, 22, 23])
# > < > < > <
chunks = np.array([3, 2, 4])
repeats = np.array([1, 3, 2])
Each segment starting with a new decade in the example above is a separate "chunk" that I would like to repeat. The chunk sizes and the number of repetitions are known for each. I can't do a reshape followed by kron or repeat because the chunks are different sizes.
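To see why a plain np.repeat cannot express this directly: np.repeat works element-wise, so giving every element a count duplicates elements in place rather than repeating whole chunks. A minimal illustration:

```python
import numpy as np

a = np.array([0, 1, 2, 10, 11, 20, 21, 22, 23])

# np.repeat duplicates each element in place, so the copies of a chunk's
# elements end up interleaved rather than the chunk repeating as a unit.
elementwise = np.repeat(a[:3], 2)  # first chunk, each element twice
# gives [0, 0, 1, 1, 2, 2], not the chunk-level [0, 1, 2, 0, 1, 2]
```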
The result I would like is
np.array([0, 1, 2, 10, 11, 10, 11, 10, 11, 20, 21, 22, 23, 20, 21, 22, 23])
# repeats:> 1 < > 3 < > 2 <
This is easy to do in a loop:
in_offset = np.r_[0, np.cumsum(chunks[:-1])]
out_offset = np.r_[0, np.cumsum(chunks[:-1] * repeats[:-1])]
output = np.zeros((chunks * repeats).sum(), dtype=a.dtype)
for c in range(len(chunks)):
    for r in range(repeats[c]):
        for i in range(chunks[c]):
            output[out_offset[c] + r * chunks[c] + i] = a[in_offset[c] + i]
This leads to the following vectorization:
regions = chunks * repeats
index = np.arange(regions.sum())
segments = np.repeat(chunks, repeats)            # length of each repeated copy
resets = np.cumsum(segments[:-1])                # output positions where a copy ends
offsets = np.zeros_like(index)
offsets[resets] = segments[:-1]                  # jump back one copy length at each repeat boundary...
offsets[np.cumsum(regions[:-1])] -= chunks[:-1]  # ...but not at chunk boundaries, where the index moves on
index -= np.cumsum(offsets)
output = a[index]
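Putting the pieces together with the example inputs gives a self-contained check that the index arithmetic reproduces the target array:

```python
import numpy as np

a = np.array([0, 1, 2, 10, 11, 20, 21, 22, 23])
chunks = np.array([3, 2, 4])
repeats = np.array([1, 3, 2])

regions = chunks * repeats
index = np.arange(regions.sum())
segments = np.repeat(chunks, repeats)
resets = np.cumsum(segments[:-1])
offsets = np.zeros_like(index)
offsets[resets] = segments[:-1]
offsets[np.cumsum(regions[:-1])] -= chunks[:-1]
index -= np.cumsum(offsets)
output = a[index]

expected = np.array([0, 1, 2, 10, 11, 10, 11, 10, 11,
                     20, 21, 22, 23, 20, 21, 22, 23])
assert np.array_equal(output, expected)
```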
Is there a more efficient way to vectorize this problem? Just so we are clear, I am not asking for a code review. I am happy with how these function calls work together. I would like to know if there is an entirely different (more efficient) combination of function calls I could use to achieve the same result.
This question was inspired by my answer to this question.
An even more "numpythonic" way of solving this than the other answer is -
np.concatenate(np.repeat(np.split(a, np.cumsum(chunks))[:-1], repeats))
array([ 0, 1, 2, 10, 11, 10, 11, 10, 11, 20, 21, 22, 23, 20, 21, 22, 23])
Notice, no explicit for-loops (np.split has an implicit loop, as pointed out by @Divakar).
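One caveat: newer NumPy versions disallow implicitly building a ragged object array from unequal-length pieces, so the np.repeat step of the one-liner may raise. A list-based equivalent of the same split-and-concatenate idea (a sketch, not part of the original answer) avoids that:

```python
import numpy as np

a = np.array([0, 1, 2, 10, 11, 20, 21, 22, 23])
chunks = np.array([3, 2, 4])
repeats = np.array([1, 3, 2])

# Split into variable-sized chunks, repeat each piece at the Python level,
# and concatenate -- no implicit ragged object array is ever created.
pieces = np.split(a, np.cumsum(chunks)[:-1])
output = np.concatenate([p for p, r in zip(pieces, repeats) for _ in range(r)])
```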
EDIT: Benchmarks (MacBook Pro 13) -
Divakar's solution scales better for larger arrays, chunks and repeats as @Mad Physicist pointed out in his post.
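The benchmark plot from the original post is not reproduced here; a minimal timeit harness along these lines (sizes and function names are illustrative, not from the original) can compare the two approaches on larger inputs:

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
chunks = rng.integers(1, 10, size=1000)
repeats = rng.integers(1, 5, size=1000)
a = rng.integers(0, 100, size=int(chunks.sum()))

def split_concat():
    # split-and-concatenate approach
    pieces = np.split(a, np.cumsum(chunks)[:-1])
    return np.concatenate([p for p, r in zip(pieces, repeats) for _ in range(r)])

def index_arithmetic():
    # cumulative-offset indexing approach from the question
    regions = chunks * repeats
    index = np.arange(regions.sum())
    segments = np.repeat(chunks, repeats)
    resets = np.cumsum(segments[:-1])
    offsets = np.zeros_like(index)
    offsets[resets] = segments[:-1]
    offsets[np.cumsum(regions[:-1])] -= chunks[:-1]
    index -= np.cumsum(offsets)
    return a[index]

# Both approaches must agree before timing them.
assert np.array_equal(split_concat(), index_arithmetic())
t_split = timeit.timeit(split_concat, number=100)
t_index = timeit.timeit(index_arithmetic, number=100)
print(f"split/concat: {t_split:.4f}s  index arithmetic: {t_index:.4f}s")
```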