Python向量化嵌套循环 [英] Python vectorizing nested for loops

查看:127
本文介绍了Python向量化嵌套循环的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在帮助您发现和理解以Python方式优化嵌套for循环中的以下数组操作的方法方面,我将不胜感激.

I'd appreciate some help in finding and understanding a pythonic way to optimize the following array manipulations in nested for loops:

def _func(a, b, radius):
    "Return 0 if a>b, otherwise return 1"
    if distance.euclidean(a, b) < radius:
        return 1
    else:
        return 0

def _make_mask(volume, roi, radius):
    mask = numpy.zeros(volume.shape)
    for x in range(volume.shape[0]):
        for y in range(volume.shape[1]):
            for z in range(volume.shape[2]):
                mask[x, y, z] = _func((x, y, z), roi, radius)
    return mask

其中volume.shape(182、218、200)和roi.shape(3,)都是ndarray类型;而radiusint

Where volume.shape (182, 218, 200) and roi.shape (3,) are both ndarray types; and radius is an int

推荐答案

方法1

这是一种矢量化方法-

m,n,r = volume.shape
x,y,z = np.mgrid[0:m,0:n,0:r]
X = x - roi[0]
Y = y - roi[1]
Z = z - roi[2]
mask = X**2 + Y**2 + Z**2 < radius**2

可能的改进:我们可以使用numexpr模块-

Possible improvement : We can probably speedup the last step with numexpr module -

import numexpr as ne

mask = ne.evaluate('X**2 + Y**2 + Z**2 < radius**2')

方法2

我们还可以逐步构建与形状参数相对应的三个范围,并即时对roi的三个元素进行减法运算,而无需像先前使用np.mgrid那样实际创建网格.出于效率目的,使用 broadcasting 将会受益.实现看起来像这样-

We can also gradually build the three ranges corresponding to the shape parameters and perform the subtraction against the three elements of roi on the fly without actually creating the meshes as done earlier with np.mgrid. This would be benefited by the use of broadcasting for efficiency purposes. The implementation would look like this -

m,n,r = volume.shape
vals = ((np.arange(m)-roi[0])**2)[:,None,None] + \
       ((np.arange(n)-roi[1])**2)[:,None] + ((np.arange(r)-roi[2])**2)
mask = vals < radius**2

简化版:感谢@Bi Rico在这里提出改进建议,因为我们可以使用np.ogrid以更简洁的方式执行这些操作,就像这样-

Simplified version : Thanks to @Bi Rico for suggesting an improvement here as we can use np.ogrid to perform those operations in a bit more concise manner, like so -

m,n,r = volume.shape    
x,y,z = np.ogrid[0:m,0:n,0:r]-roi
mask = (x**2+y**2+z**2) < radius**2


运行时测试

函数定义-

def vectorized_app1(volume, roi, radius):
    m,n,r = volume.shape
    x,y,z = np.mgrid[0:m,0:n,0:r]
    X = x - roi[0]
    Y = y - roi[1]
    Z = z - roi[2]
    return X**2 + Y**2 + Z**2 < radius**2

def vectorized_app1_improved(volume, roi, radius):
    m,n,r = volume.shape
    x,y,z = np.mgrid[0:m,0:n,0:r]
    X = x - roi[0]
    Y = y - roi[1]
    Z = z - roi[2]
    return ne.evaluate('X**2 + Y**2 + Z**2 < radius**2')

def vectorized_app2(volume, roi, radius):
    m,n,r = volume.shape
    vals = ((np.arange(m)-roi[0])**2)[:,None,None] + \
           ((np.arange(n)-roi[1])**2)[:,None] + ((np.arange(r)-roi[2])**2)
    return vals < radius**2

def vectorized_app2_simplified(volume, roi, radius):
    m,n,r = volume.shape    
    x,y,z = np.ogrid[0:m,0:n,0:r]-roi
    return (x**2+y**2+z**2) < radius**2

时间-

In [106]: # Setup input arrays  
     ...: volume = np.random.rand(90,110,100) # Half of original input sizes 
     ...: roi = np.random.rand(3)
     ...: radius = 3.4
     ...: 

In [107]: %timeit _make_mask(volume, roi, radius)
1 loops, best of 3: 41.4 s per loop

In [108]: %timeit vectorized_app1(volume, roi, radius)
10 loops, best of 3: 62.3 ms per loop

In [109]: %timeit vectorized_app1_improved(volume, roi, radius)
10 loops, best of 3: 47 ms per loop

In [110]: %timeit vectorized_app2(volume, roi, radius)
100 loops, best of 3: 4.26 ms per loop

In [139]: %timeit vectorized_app2_simplified(volume, roi, radius)
100 loops, best of 3: 4.36 ms per loop

因此,broadcasting一如既往地展示了其疯狂的魔力,使 10,000x 速度比原始代码快,并且比 10x 更好,比通过使用即时广播的操作!

So, as always broadcasting showing its magic for a crazy almost 10,000x speedup over the original code and more than 10x better than creating meshes by using on-the-fly broadcasted operations!

这篇关于Python向量化嵌套循环的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆