What are the efficient ways to loop over vectors along a specified axis in a numpy ndarray?


Question

I'm processing data by looping over the vectors along one axis (which could be any axis) of a numpy ndarray (which could have any number of dimensions).

I don't operate on the whole array directly because the data are not perfect: each vector needs a quality-control check. If a vector fails the check, it is filled with zeros (or nan) and is not actually processed.

I found this question similar, but my problem is much more difficult because

  1. ndim is arbitrary.

For a 3D array, I can take vectors along axis 1 like this:

 x = np.arange(24).reshape(2,3,4)
 for i in range(x.shape[0]):
     for k in range(x.shape[2]):
         process(x[i,:,k])

But if ndim and the chosen axis are not fixed, how do I take the vectors?

  2. The axis along which the vectors are taken is arbitrary.

One possible way I'm considering is:

 y = x.swapaxes(ax,-1)
 # loop over vectors along last axis
 for i in np.ndindex(y.shape[:-1]):
     process(y[i+(slice(None),)])
 # then swap back
 z = y.swapaxes(ax,-1)

But I doubt the efficiency of this method.

Recommended answer

The best way to test efficiency is to run timing tests on realistic examples. But %timeit (ipython) tests on toy examples are a start.

Based on experience from answering similar 'if you must iterate' questions, there isn't much difference in times. np.frompyfunc has a modest speed edge, but its pyfunc takes scalars, not arrays or slices. (np.vectorize is a nicer API to this function, and a bit slower.)

But here you want to pass a 1d slice of an array to your function while iterating over all the other dimensions. I don't think there's much difference between the alternative iteration methods.

Actions like swapaxes, transpose and ravel are fast, often just creating a new view with a different shape and strides.
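A minimal sketch of how one might check the view behaviour described above. The array and the variable names are illustrative only; np.shares_memory is used as the test for "view versus copy".

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

# swapaxes and transpose return views: same buffer, permuted shape and strides
y = x.swapaxes(1, -1)
t = x.transpose(2, 0, 1)
views_share = np.shares_memory(x, y) and np.shares_memory(x, t)

# ravel returns a view when the data are already contiguous in the requested
# order, otherwise it has to copy
flat_from_x = x.ravel()   # x is C-contiguous, so this is a view
flat_from_y = y.ravel()   # y is not C-contiguous, so this is a copy
ravel_shares = (np.shares_memory(x, flat_from_x),
                np.shares_memory(x, flat_from_y))

print(y.shape, y.strides)
print(views_share, ravel_shares)
```

This is why "often" matters in the sentence above: the swap itself is cheap, but a later operation that needs contiguous data may trigger a copy.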

np.ndindex uses np.nditer (with the multi_index flag) to iterate over a range of dimensions. nditer is fast when used in C code, but isn't anything special when used in Python code.

np.apply_along_axis creates an (i,j,:,k) indexing tuple and steps through the index variables. It's a nice general approach, but it isn't doing anything special to speed things up. itertools.product is another way of generating the indices.
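As a rough illustration of the apply_along_axis route for the quality-control use case in the question: the function process_or_blank, its min_valid threshold and the mean-removal step are all hypothetical stand-ins, not the asker's actual processing.

```python
import numpy as np

def process_or_blank(v, min_valid=2):
    """Hypothetical per-vector routine: quality check first, then the real work."""
    if np.count_nonzero(np.isfinite(v)) < min_valid:
        return np.full(v.shape, np.nan)   # vector failed QC: fill and skip
    return v - np.nanmean(v)              # stand-in for the real processing

x = np.arange(24, dtype=float).reshape(2, 3, 4)
x[0, :, 1] = np.nan                       # make one vector along axis 1 bad

# func1d is called once per 1d slice along axis 1; ndim can be anything
out = np.apply_along_axis(process_or_blank, 1, x)
print(out.shape)
```

Because the function returns a vector of the same length, the result has the same shape as the input, with the failed vector filled by nan.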

But usually it isn't the iteration mechanism that slows things down; it's the repeated call to your function. You can test the iteration mechanism by using a trivial function, e.g.

def foo(x):
   return x
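One way such a comparison could be set up outside ipython, using the standard timeit module and a trivial function. The three loop_* helpers, the array size and the repeat count are arbitrary choices for this sketch, and np.moveaxis is used here in place of swapaxes; actual timings will vary by machine and numpy version.

```python
import itertools
import timeit
import numpy as np

def foo(v):
    return v

def loop_ndindex(x, axis):
    y = np.moveaxis(x, axis, -1)          # view with the target axis last
    for idx in np.ndindex(y.shape[:-1]):
        foo(y[idx])

def loop_product(x, axis):
    y = np.moveaxis(x, axis, -1)
    for idx in itertools.product(*map(range, y.shape[:-1])):
        foo(y[idx])

def loop_apply(x, axis):
    np.apply_along_axis(foo, axis, x)

x = np.zeros((20, 30, 40))
timings = {
    f.__name__: timeit.timeit(lambda: f(x, 1), number=5)
    for f in (loop_ndindex, loop_product, loop_apply)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```

Swapping foo for the real per-vector function then shows how much of the total time the iteration mechanism actually accounts for.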

==================

You don't need swapaxes to use ndindex; you can use it to iterate over any combination of axes.

For example, make a 3d array and sum along the middle dimension:

In [495]: x=np.arange(2*3*4).reshape(2,3,4)

In [496]: N=np.ndindex(2,4)

In [497]: [x[i,:,k].sum() for i,k in N]
Out[497]: [12, 15, 18, 21, 48, 51, 54, 57]

In [498]: x.sum(1)
Out[498]: 
array([[12, 15, 18, 21],
       [48, 51, 54, 57]])

I don't think it makes a difference in speed; the code's just simpler.
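The same idea can be sketched for an arbitrary ndim and axis by inserting a slice into each index tuple. The helper name iter_vectors is made up for this example.

```python
import numpy as np

def iter_vectors(x, axis):
    """Yield (index tuple, 1d view) for every vector of x along `axis`."""
    axis = axis % x.ndim                          # allow negative axes
    other = x.shape[:axis] + x.shape[axis + 1:]   # shape of the remaining axes
    for idx in np.ndindex(other):
        full = idx[:axis] + (slice(None),) + idx[axis:]
        yield full, x[full]

x = np.arange(24).reshape(2, 3, 4)
sums = np.array([v.sum() for _, v in iter_vectors(x, 1)]).reshape(2, 4)
print(np.array_equal(sums, x.sum(1)))
```

Each yielded vector is a view, so for a writable array the index tuple can also be used to assign a processed (or nan-filled) vector into an output array of the same shape, e.g. out[full] = result.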

==================

Another possible tool is np.ma, masked arrays. With those you mark individual elements as masked (because they are nan or 0). It has code that evaluates things like sum, mean and product in such a way that the masked values don't harm the solution.

Again with a 3d array:

In [517]: x=np.arange(2*3*4).reshape(2,3,4)

Add some bad values:

In [518]: x[1,1,2]=99    
In [519]: x[0,0,:]=99

Those values mess up the normal sum:

In [520]: x.sum(axis=1)
Out[520]: 
array([[111, 113, 115, 117],
       [ 48,  51, 135,  57]])

But if we mask them, they are 'filtered out' of the solution (in this case, they are temporarily set to 0):

In [521]: xm=np.ma.masked_greater(x,50)

In [522]: xm
Out[522]: 
masked_array(data =
 [[[-- -- -- --]
  [4 5 6 7]
  [8 9 10 11]]

 [[12 13 14 15]
  [16 17 -- 19]
  [20 21 22 23]]],
             mask =
 [[[ True  True  True  True]
 ...
  [False False False False]]],
       fill_value = 999999)

In [523]: xm.sum(1)
Out[523]: 
masked_array(data =
 [[12 14 16 18]
 [48 51 36 57]],
 ...)
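If the bad values are nan rather than large numbers, a similar sketch could use np.ma.masked_invalid. The "fewer than 3 valid samples" rule below is an assumed quality-control threshold, chosen only to show how whole vectors might be rejected without a Python loop.

```python
import numpy as np

x = np.arange(24, dtype=float).reshape(2, 3, 4)
x[1, 1, 2] = np.nan
x[0, 0, :] = np.nan

xm = np.ma.masked_invalid(x)        # mask nan and inf entries
col_sum = xm.sum(axis=1)            # masked entries are ignored
col_mean = xm.mean(axis=1)
n_valid = xm.count(axis=1)          # unmasked samples per vector

# reject vectors with too few valid samples (threshold is an assumption)
bad = n_valid < 3
result = np.ma.masked_where(bad, col_mean).filled(np.nan)
print(result)
```

Here the reductions run over the whole array at once, and the per-vector quality decision becomes a boolean array computed from the counts.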
