Numpy repeat for 2d array

Question

Given two arrays, for example:

arr = array([10, 24, 24, 24,  1, 21,  1, 21,  0,  0], dtype=int32)
rep = array([3, 2, 2, 0, 0, 0, 0, 0, 0, 0], dtype=int32)

np.repeat(arr, rep) returns

array([10, 10, 10, 24, 24, 24, 24], dtype=int32)

Is there any way to replicate this functionality for a set of 2D arrays?

For example, given:

arr = array([[10, 24, 24, 24,  1, 21,  1, 21,  0,  0],
            [10, 24, 24,  1, 21,  1, 21, 32,  0,  0]], dtype=int32)
rep = array([[3, 2, 2, 0, 0, 0, 0, 0, 0, 0],
            [2, 2, 2, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)

Is it possible to write a vectorized function for this?

PS: The number of repeats in each row need not be the same. I'm padding each result row so that they all have the same size:

import numpy as np

def repeat2d(arr, rep):
    # Find the max length of repetitions in all the rows.
    max_len = rep.sum(axis=-1).max()
    # Create a common array to hold all results. Since each repeated array will have
    # a different size, shorter rows are padded with zeros (np.zeros, not np.empty).
    ret_val = np.zeros((arr.shape[0], max_len), dtype=arr.dtype)
    for i in range(arr.shape[0]):
        # The repeated row may have fewer columns than ret_val.
        temp = np.repeat(arr[i], rep[i])
        ret_val[i, :temp.size] = temp
    return ret_val

I do know about np.vectorize, and I know that it gives no performance benefit over the plain loop version.

Recommended answer

So you have a different repeat array for each row? But the total number of repeats per row is the same?

Just do the repeat on the flattened arrays, and reshape back to the correct number of rows.

In [529]: np.repeat(arr,rep.flat)
Out[529]: array([10, 10, 10, 24, 24, 24, 24, 10, 10, 24, 24, 24, 24,  1])
In [530]: np.repeat(arr,rep.flat).reshape(2,-1)
Out[530]: 
array([[10, 10, 10, 24, 24, 24, 24],
       [10, 10, 24, 24, 24, 24,  1]])
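
The flatten-and-reshape step above only works when every row repeats to the same total length. Here is a minimal sketch that checks this first. The function name `repeat_rows_equal_total` is made up for illustration, and the second row of `rep_eq` is an assumption: the session output above is consistent with `[2, 2, 2, 1, 0, ...]` rather than the `rep` from the question, whose rows sum to 7 and 6.

```python
import numpy as np

def repeat_rows_equal_total(arr, rep):
    """Row-wise np.repeat, valid only if every row of rep has the same sum."""
    totals = rep.sum(axis=1)
    if not np.all(totals == totals[0]):
        # Rows would come out with different lengths, so a plain reshape cannot work.
        raise ValueError("rows repeat to different lengths; padding is needed")
    flat = np.repeat(arr.ravel(), rep.ravel())
    return flat.reshape(arr.shape[0], totals[0])

arr = np.array([[10, 24, 24, 24, 1, 21, 1, 21, 0, 0],
                [10, 24, 24, 1, 21, 1, 21, 32, 0, 0]], dtype=np.int32)
# Assumed repeat counts: both rows sum to 7.
rep_eq = np.array([[3, 2, 2, 0, 0, 0, 0, 0, 0, 0],
                   [2, 2, 2, 1, 0, 0, 0, 0, 0, 0]], dtype=np.int32)

out = repeat_rows_equal_total(arr, rep_eq)
print(out.tolist())
# -> [[10, 10, 10, 24, 24, 24, 24], [10, 10, 24, 24, 24, 24, 1]]
```

With the question's original `rep`, the check raises instead of silently producing a misaligned reshape.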

If the total repetitions per row vary, we have the problem of padding variable-length rows. That has come up in other SO questions. I don't recall all the details, but I think the solution is along these lines:

Change rep so that the row totals differ:

In [547]: rep
Out[547]: 
array([[3, 2, 2, 0, 0, 0, 0, 0, 0, 0],
       [2, 2, 2, 1, 0, 2, 0, 0, 0, 0]])
In [548]: lens=rep.sum(axis=1)
In [549]: lens
Out[549]: array([7, 9])
In [550]: m=np.max(lens)
In [551]: m
Out[551]: 9

Create the target array:

In [552]: res = np.zeros((arr.shape[0],m),arr.dtype)

Create an indexing array (the details still need to be worked out):

In [553]: idx=np.r_[0:7,m:m+9]
In [554]: idx
Out[554]: array([ 0,  1,  2,  3,  4,  5,  6,  9, 10, 11, 12, 13, 14, 15, 16, 17])
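
The hard-coded np.r_[0:7, m:m+9] above could be generalized to any row lengths. The following is one possible sketch, not necessarily what the answer had in mind; the helper name `padded_flat_index` and its parameters are hypothetical. It shifts each position of the flat repeat output by the gap between where its row starts in the padded target and where it starts in the unpadded output.

```python
import numpy as np

def padded_flat_index(lens, width):
    """Flat indices into an (n_rows, width) array for rows of the given lengths."""
    lens = np.asarray(lens)
    # Where each row's values start in the unpadded, concatenated repeat output.
    starts = np.cumsum(lens) - lens
    # Where each row starts in the flattened padded target.
    row_offsets = np.arange(lens.size) * width
    # Per-element shift from unpadded position to padded position.
    shift = np.repeat(row_offsets - starts, lens)
    return shift + np.arange(lens.sum())

idx = padded_flat_index([7, 9], 9)
print(idx.tolist())
# -> [0, 1, 2, 3, 4, 5, 6, 9, 10, 11, 12, 13, 14, 15, 16, 17]
```

For lens = [7, 9] and width 9 this reproduces the index array shown in the session.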

Assign through the flat index:

In [555]: res.flat[idx]=np.repeat(arr,rep.flat)
In [556]: res
Out[556]: 
array([[10, 10, 10, 24, 24, 24, 24,  0,  0],
       [10, 10, 24, 24, 24, 24,  1,  1,  1]])
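
Putting the steps together, a loop-free version might look like the sketch below. It swaps the explicit flat index for a boolean mask, which selects the same positions in row-major order. The name `repeat2d_padded` and the `fill` parameter are assumptions for illustration, and the function assumes at least one row.

```python
import numpy as np

def repeat2d_padded(arr, rep, fill=0):
    """Row-wise np.repeat with short rows padded on the right by `fill`."""
    arr = np.asarray(arr)
    rep = np.asarray(rep)
    lens = rep.sum(axis=1)          # output length of each row
    m = int(lens.max())             # padded width (assumes at least one row)
    res = np.full((arr.shape[0], m), fill, dtype=arr.dtype)
    # True where a row has real data, False where it is padding.
    mask = np.arange(m) < lens[:, None]
    # Boolean assignment fills True positions in row-major order,
    # which matches the order of the flattened repeat.
    res[mask] = np.repeat(arr.ravel(), rep.ravel())
    return res

arr = np.array([[10, 24, 24, 24, 1, 21, 1, 21, 0, 0],
                [10, 24, 24, 1, 21, 1, 21, 32, 0, 0]], dtype=np.int32)
rep = np.array([[3, 2, 2, 0, 0, 0, 0, 0, 0, 0],
                [2, 2, 2, 1, 0, 2, 0, 0, 0, 0]], dtype=np.int32)

res = repeat2d_padded(arr, rep)
print(res.tolist())
# -> [[10, 10, 10, 24, 24, 24, 24, 0, 0], [10, 10, 24, 24, 24, 24, 1, 1, 1]]
```

The per-row work is replaced by a single np.repeat call and one masked assignment; whether that is actually faster than the loop depends on the array sizes, so it is worth timing on real data.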
