Repeat but in variable sized chunks in numpy


Problem description


I have an array that is the concatenation of different chunks:

import numpy as np

a = np.array([0, 1, 2, 10, 11, 20, 21, 22, 23])
#             >     <  >    <  >            <
chunks = np.array([3, 2, 4])
repeats = np.array([1, 3, 2])

Each segment starting with a new decade in the example above is a separate "chunk" that I would like to repeat. The chunk sizes and number of repetitions are known for each. I can't do a reshape followed by kron or repeat because the chunks are different sizes.
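
To see why the uniform-size trick fails here: with equal chunk sizes the whole problem reduces to a row repeat, which is exactly what ragged chunks of sizes 3, 2, and 4 cannot express. A minimal illustration on hypothetical equal-sized data (not the example above):

```python
import numpy as np

# With *equal* chunk sizes, reshape to one chunk per row, repeat rows,
# and flatten. Ragged chunk sizes rule out the reshape step.
b = np.array([0, 1, 10, 11, 20, 21])   # three chunks, each of size 2
reps = np.array([1, 3, 2])
out = np.repeat(b.reshape(-1, 2), reps, axis=0).ravel()
print(out)  # [ 0  1 10 11 10 11 10 11 20 21 20 21]
```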

The result I would like is

np.array([0, 1, 2, 10, 11, 10, 11, 10, 11, 20, 21, 22, 23, 20, 21, 22, 23])
# repeats:>  1  <  >         3          <  >              2             <

This is easy to do in a loop:

in_offset = np.r_[0, np.cumsum(chunks[:-1])]                  # start of each chunk in a
out_offset = np.r_[0, np.cumsum(chunks[:-1] * repeats[:-1])]  # start of each repeated block in output
output = np.zeros((chunks * repeats).sum(), dtype=a.dtype)
for c in range(len(chunks)):         # each chunk
    for r in range(repeats[c]):      # each repetition of that chunk
        for i in range(chunks[c]):   # each element within the chunk
            output[out_offset[c] + r * chunks[c] + i] = a[in_offset[c] + i]
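
Made self-contained with the example arrays, the loop reproduces the desired output (a quick sanity check of the above, nothing new):

```python
import numpy as np

a = np.array([0, 1, 2, 10, 11, 20, 21, 22, 23])
chunks = np.array([3, 2, 4])
repeats = np.array([1, 3, 2])

# Same triple loop as above: copy chunk c to its r-th repeated position.
in_offset = np.r_[0, np.cumsum(chunks[:-1])]
out_offset = np.r_[0, np.cumsum(chunks[:-1] * repeats[:-1])]
output = np.zeros((chunks * repeats).sum(), dtype=a.dtype)
for c in range(len(chunks)):
    for r in range(repeats[c]):
        for i in range(chunks[c]):
            output[out_offset[c] + r * chunks[c] + i] = a[in_offset[c] + i]

print(output)  # [ 0  1  2 10 11 10 11 10 11 20 21 22 23 20 21 22 23]
```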

This leads to the following vectorization:

regions = chunks * repeats           # output length contributed by each chunk
index = np.arange(regions.sum())

# One segment per repeated copy. At each segment boundary, rewind the index
# by one copy length; at a chunk boundary, the previous chunk's length is
# subtracted back out, so the index keeps advancing into the next chunk.
segments = np.repeat(chunks, repeats)
resets = np.cumsum(segments[:-1])
offsets = np.zeros_like(index)
offsets[resets] = segments[:-1]
offsets[np.cumsum(regions[:-1])] -= chunks[:-1]

index -= np.cumsum(offsets)

output = a[index]

Is there a more efficient way to vectorize this problem? Just so we are clear, I am not asking for a code review. I am happy with how these function calls work together. I would like to know if there is an entirely different (more efficient) combination of function calls I could use to achieve the same result.

This question was inspired by my answer to this question.

Solution

An even more "numpythonic" way of solving this than the other answer is -

np.concatenate(np.repeat(np.split(a, np.cumsum(chunks))[:-1], repeats))

array([ 0,  1,  2, 10, 11, 10, 11, 10, 11, 20, 21, 22, 23, 20, 21, 22, 23])

Notice, no explicit for-loops.

(np.split has an implicit loop as pointed out by @Divakar).
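
One caveat with the one-liner: `np.repeat` on a list of unequal-length pieces implicitly builds a ragged object array, which NumPy 1.24+ rejects with a ValueError. A list-comprehension variant keeps the same split-and-concatenate idea without that step (a sketch, not the answerer's original code):

```python
import numpy as np

a = np.array([0, 1, 2, 10, 11, 20, 21, 22, 23])
chunks = np.array([3, 2, 4])
repeats = np.array([1, 3, 2])

# Split into ragged pieces, replicate each piece in a plain Python list,
# then concatenate once; no intermediate object array is created.
pieces = np.split(a, np.cumsum(chunks)[:-1])
out = np.concatenate([p for p, r in zip(pieces, repeats) for _ in range(r)])
print(out)  # [ 0  1  2 10 11 10 11 10 11 20 21 22 23 20 21 22 23]
```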


EDIT: Benchmarks (MacBook Pro 13) -

Divakar's solution scales better for larger arrays, chunks, and repeats, as @Mad Physicist pointed out in his post.
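
A comparison like this can be reproduced with a small harness (a sketch; the sizes and random data are arbitrary, `indexed` is the questioner's vectorized version, and `split_concat` is the object-array-free variant of the one-liner):

```python
import numpy as np
from timeit import timeit

rng = np.random.default_rng(0)
chunks = rng.integers(1, 10, size=1000)
repeats = rng.integers(1, 5, size=1000)
a = np.arange(chunks.sum())

def indexed():
    # The questioner's cumulative-offset index construction.
    regions = chunks * repeats
    index = np.arange(regions.sum())
    segments = np.repeat(chunks, repeats)
    resets = np.cumsum(segments[:-1])
    offsets = np.zeros_like(index)
    offsets[resets] = segments[:-1]
    offsets[np.cumsum(regions[:-1])] -= chunks[:-1]
    index -= np.cumsum(offsets)
    return a[index]

def split_concat():
    # Split into ragged pieces and concatenate the replicated list.
    pieces = np.split(a, np.cumsum(chunks)[:-1])
    return np.concatenate([p for p, r in zip(pieces, repeats)
                           for _ in range(r)])

assert np.array_equal(indexed(), split_concat())
print(timeit(indexed, number=100), timeit(split_concat, number=100))
```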
