How to split a numpy array and perform certain actions on the split arrays [Python]


Question

Only part of this question has been asked before ([1] [2]), where it was explained how to split numpy arrays. I am quite new to Python. I have an array containing 262144 items and want to split it into small arrays of length 512, sort them individually and sum up the first five values of each, but I am unsure how to proceed beyond this line:

np.array_split(vector, 512)

How do I access and analyse each array? Would it be a good idea to continue using numpy arrays, or should I revert to a dictionary instead?

Recommended answer

Splitting as such won't be an efficient solution. Instead, we could reshape, which effectively creates the subarrays as rows of a 2D array. These rows would be views into the input array, so there is no additional memory requirement. Then we would get the argsort indices, select the first five indices per row, index the values at those positions and finally sum them up for the desired output.
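The view claim can be checked directly. The following is a minimal sketch, assuming a stand-in input built with np.arange (the real data is not shown in the question); it confirms that the reshaped array shares memory with the input and that its rows match the pieces np.array_split would produce.

```python
import numpy as np

# Hypothetical stand-in for the 262144-item input array
a = np.arange(512 * 512)

b = a.reshape(-1, 512)   # 512 rows, each holding 512 consecutive items

print(b.shape)                  # (512, 512)
print(np.shares_memory(a, b))   # True: b is a view, not a copy

# Row i of b holds the same data as the i-th piece from np.array_split
pieces = np.array_split(a, 512)
same = all(np.array_equal(b[i], pieces[i]) for i in range(512))
print(same)                     # True
```

Because b is a view, writing into b would also change a, so copy first if the input must stay untouched.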

Putting this together, an implementation would look like this -

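import numpy as np

N = 512  # Number of elements in each split array
M = 5    # Number of smallest elements to sum per split array

# a is the 1D input array; its length must be a multiple of N
b = a.reshape(-1, N)  # each row is one split array (a view, not a copy)
# Pick the M smallest values per row via argsort, then sum along each row
out = b[np.arange(b.shape[0])[:, None], b.argsort(1)[:, :M]].sum(1)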
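As a sanity check, the vectorized expression can be compared with the literal split, sort and sum that the question describes. This is only a sketch: the function names sum_smallest_per_chunk and sum_smallest_loop are hypothetical, and the explicit length check is an added assumption rather than part of the answer.

```python
import numpy as np

def sum_smallest_per_chunk(a, n, m):
    """Sum the m smallest values in each consecutive chunk of n items."""
    a = np.asarray(a)
    if a.size % n != 0:
        raise ValueError("array length must be a multiple of n for reshape")
    b = a.reshape(-1, n)                    # one chunk per row
    rows = np.arange(b.shape[0])[:, None]   # column of row indices
    idx = b.argsort(1)[:, :m]               # positions of the m smallest
    return b[rows, idx].sum(1)

def sum_smallest_loop(a, n, m):
    """Reference version: split, sort each piece, sum its first m values."""
    pieces = np.array_split(a, len(a) // n)
    return np.array([np.sort(p)[:m].sum() for p in pieces])

a = np.array([45, 19, 71, 53, 20, 33, 31, 20, 41, 19, 38, 31, 86, 34])
print(sum_smallest_per_chunk(a, 7, 5))   # [148 142]
print(sum_smallest_loop(a, 7, 5))        # [148 142]
```

The loop version is easier to read and is fine for small inputs; the reshape version avoids a Python-level loop over the pieces.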

Step-by-step sample run -

In [217]: a   # Input array
Out[217]: array([45, 19, 71, 53, 20, 33, 31, 20, 41, 19, 38, 31, 86, 34])

In [218]: N = 7 # 512 for original case, 7 for sample

In [219]: M = 5

# Reshape into a 2D array with N columns
In [220]: b = a.reshape(-1,N)

In [224]: b
Out[224]: 
array([[45, 19, 71, 53, 20, 33, 31],
       [20, 41, 19, 38, 31, 86, 34]])

# Get argsort indices per row
In [225]: b.argsort(1)
Out[225]: 
array([[1, 4, 6, 5, 0, 3, 2],
       [2, 0, 4, 6, 3, 1, 5]])

# Select first M ones
In [226]: b.argsort(1)[:,:M]
Out[226]: 
array([[1, 4, 6, 5, 0],
       [2, 0, 4, 6, 3]])

# Use fancy-indexing to select those M ones per row
In [227]: b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]]
Out[227]: 
array([[19, 20, 31, 33, 45],
       [19, 20, 31, 34, 38]])

# Finally sum along each row
In [228]: b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]].sum(1)
Out[228]: array([148, 142])

Alternatively, with np.argpartition, which avoids fully sorting each row -

out = b[np.arange(b.shape[0])[:,None], np.argpartition(b,M,axis=1)[:,:M]].sum(1)
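np.argpartition only guarantees that the M smallest values land in the first M positions, not that they are ordered, which is enough when they are only summed. The sketch below checks this on the two-row sample from this answer. It also shows np.partition, which is a swapped-in variant and not part of the original answer: it returns the values directly, so the fancy-indexing step is not needed.

```python
import numpy as np

b = np.array([[45, 19, 71, 53, 20, 33, 31],
              [20, 41, 19, 38, 31, 86, 34]])
M = 5
rows = np.arange(b.shape[0])[:, None]

via_argsort = b[rows, b.argsort(1)[:, :M]].sum(1)
via_argpartition = b[rows, np.argpartition(b, M, axis=1)[:, :M]].sum(1)

# Assumption: np.partition is swapped in here to skip the index step
via_partition = np.partition(b, M, axis=1)[:, :M].sum(1)

print(via_argsort)        # [148 142]
print(via_argpartition)   # [148 142]
print(via_partition)      # [148 142]
```

Note that passing M as the partition index requires M to be smaller than the row length; M - 1 would also place the M smallest values first.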

Runtime test -

In [236]: a = np.random.randint(11,99,(512*512))

In [237]: N = 512

In [238]: M = 5

In [239]: b = a.reshape(-1,N)

In [240]: %timeit b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]].sum(1)
100 loops, best of 3: 14.2 ms per loop

In [241]: %timeit b[np.arange(b.shape[0])[:,None], \
                np.argpartition(b,M,axis=1)[:,:M]].sum(1)
100 loops, best of 3: 3.57 ms per loop
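The %timeit lines above only work inside IPython. As a rough sketch for reproducing the comparison in a plain script, the standard-library timeit module can be used instead; the helper names with_argsort and with_argpartition are hypothetical. Timings depend on the machine and NumPy version, so none are quoted here.

```python
import timeit
import numpy as np

N, M = 512, 5
a = np.random.randint(11, 99, 512 * 512)
b = a.reshape(-1, N)
rows = np.arange(b.shape[0])[:, None]

def with_argsort():
    return b[rows, b.argsort(1)[:, :M]].sum(1)

def with_argpartition():
    return b[rows, np.argpartition(b, M, axis=1)[:, :M]].sum(1)

# Both approaches must agree before their speed is compared
assert np.array_equal(with_argsort(), with_argpartition())

timings = {}
for fn in (with_argsort, with_argpartition):
    # Best of 3 repeats, 10 calls each, reported per call in milliseconds
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    timings[fn.__name__] = best
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```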
