Python: Nested for loop is extremely slow - reading RLE compressed 3D data

Problem description

I need to read 3D data compressed with run-length encoding (RLE) into a 3D NumPy array in Python. In Matlab this takes around a second using a nested loop. In Python, however, it takes 48 seconds!

Here is my code:

import numpy as np

# Preallocate 3D voxel grid
vox_size = [200, 150, 200]
voxelGrid3D = np.zeros(vox_size, dtype=np.uint32)

# Get values from RLE encoded 3D scene:
# Example:
# 0, (3), 4, (2) --> corresponds to --> 00044
# --> value == [0, 4]
# --> value_reps == [3, 2]
value = labelsRleCompressed[::2]
value_reps = labelsRleCompressed[1::2]

vox_idx_all = 0
num_elements = value_reps.size  # Number of elements to convert
for m in np.arange(0, num_elements):
    numReps = value_reps[m]
    currentValue = value[m]
    for l in np.arange(0, numReps):
        # Compute respective grid indices (cast to int: floats cannot index an array)
        i = int(np.floor(vox_idx_all / (vox_size[0] * vox_size[1])) % vox_size[2])
        j = int(np.floor(vox_idx_all / vox_size[0]) % vox_size[1])
        k = int(np.floor(vox_idx_all) % vox_size[0])

        # Fill grid with label value
        voxelGrid3D[i, j, k] = currentValue
        vox_idx_all = vox_idx_all + 1

Even if I remove the inner loop and replace it with a flat array plus a reshape call, the whole process still takes 10 seconds!

# Total number of voxels in the grid
num_voxels = vox_size[0] * vox_size[1] * vox_size[2]
voxelGrid = np.zeros(num_voxels, dtype=np.uint32)
repIter = 0
numReps = 0
vox_idx = 0
for counter in np.arange(0, num_voxels):
    if repIter == numReps:
        # Current run is exhausted: fetch the next (value, repetitions) pair
        numReps = value_reps[vox_idx]
        currentValue = value[vox_idx]
        vox_idx = vox_idx + 1
        voxelGrid[counter] = currentValue
        repIter = 1
    else:
        voxelGrid[counter] = currentValue
        repIter = repIter + 1
voxelGrid3D = np.reshape(voxelGrid, (vox_size[0], vox_size[1], vox_size[2]))

This is much too slow for my application. Does anyone have an idea how to make it faster?

Recommended answer

Speeding up the loop

First, don't use np.arange to create an iterator; it builds a whole array that you then iterate over. Use range (Python 3) or xrange (Python 2) instead. This should improve performance by a few percent, but it isn't the real bottleneck here.
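The difference between the two loop styles can be measured with a small micro-benchmark. This is only a sketch: the loop body, the iteration count `n`, and the helper names `loop_arange` and `loop_range` are made up for illustration, and the measured times will vary by machine and NumPy version.

```python
import timeit

import numpy as np

n = 200_000  # hypothetical iteration count, chosen only to keep the run short


def loop_arange():
    # Iterates over a NumPy array: every element is boxed into a NumPy scalar
    total = 0
    for m in np.arange(0, n):
        total += 1
    return total


def loop_range():
    # Iterates over a lazy range object yielding plain Python ints
    total = 0
    for m in range(n):
        total += 1
    return total


t_arange = timeit.timeit(loop_arange, number=5)
t_range = timeit.timeit(loop_range, number=5)
print(f"np.arange: {t_arange:.3f} s, range: {t_range:.3f} s")
```

Both functions do the same work, so any gap between the two timings comes from the iteration mechanism alone.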

Matlab has a just-in-time compiler, so loops perform relatively well there; CPython doesn't have one by default. There is, however, a just-in-time compiler for Python called numba (http://numba.pydata.org/). Its documentation lists the supported functions that can be compiled to native machine code. When using numba I would also recommend writing explicit loops rather than vectorised code, because this is easier for the compiler to handle.

I have modified your code a bit.

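import numpy as np

def decompress_RLE(labelsRleCompressed, vox_size):
    # Flat output buffer; every element is overwritten, so np.empty is enough
    res = np.empty(vox_size[0] * vox_size[1] * vox_size[2], np.uint32)

    ii = 0  # write position in the flat buffer
    # Input is interleaved: value, repetitions, value, repetitions, ...
    for i in range(0, labelsRleCompressed.size, 2):
        value = labelsRleCompressed[i]
        rep = labelsRleCompressed[i + 1]
        for j in range(0, rep):
            res[ii] = value
            ii = ii + 1

    res = res.reshape((vox_size[0], vox_size[1], vox_size[2]))

    return res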

Creating data for the benchmark

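vox_size = np.array((300, 300, 300), dtype=np.int32)
# Create some data: random values, each repeated 4 times
# (integer division keeps the size an int in Python 3)
labelsRleCompressed = np.random.randint(
    0, 500, size=vox_size[0] * vox_size[1] * vox_size[2] // 2, dtype=np.uint32)
labelsRleCompressed[1::2] = 4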

Simply calling the function with the generated data results in a runtime of 7.5 seconds, which is rather poor performance.
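A timing like this could be reproduced along the following lines. This is a minimal sketch, not the original benchmark script: it assumes a much smaller 40x40x40 grid so that the pure-Python loop finishes quickly, redefines the function as `decompress_rle` to stay self-contained, and uses `time.perf_counter`, which the original answer does not mention.

```python
import time

import numpy as np


def decompress_rle(rle, vox_size):
    # Pure-Python RLE decoder: rle holds value, count, value, count, ...
    res = np.empty(vox_size[0] * vox_size[1] * vox_size[2], np.uint32)
    pos = 0
    for i in range(0, rle.size, 2):
        value = rle[i]
        for _ in range(rle[i + 1]):
            res[pos] = value
            pos += 1
    return res.reshape(vox_size)


vox_size = (40, 40, 40)  # assumed small grid, far below the 300^3 benchmark
n_pairs = vox_size[0] * vox_size[1] * vox_size[2] // 4  # each run has length 4

rng = np.random.default_rng(0)
rle = np.empty(2 * n_pairs, dtype=np.uint32)
rle[0::2] = rng.integers(0, 500, size=n_pairs, dtype=np.uint32)  # values
rle[1::2] = 4                                                    # run lengths

start = time.perf_counter()
grid = decompress_rle(rle, vox_size)
elapsed = time.perf_counter() - start
print(f"decoded {grid.size} voxels in {elapsed:.4f} s")
```

Scaling the grid back up to 300x300x300 multiplies the number of loop iterations by roughly 420, which is why the pure-Python version ends up in the range of seconds.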

Now let's use numba.

import numba
nb_decompress_RLE = numba.jit("uint32[:,:,:](uint32[:],int32[:])",nopython=True)(decompress_RLE) #stick to the datatypes written in the decorator
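As a variation on the explicit signature, numba can also infer the types at the first call when `njit` is used as a plain decorator. The sketch below is an assumption-laden illustration rather than the answer's own code: the function name `decompress_rle_lazy` and the `n_total` parameter are hypothetical, and the `try`/`except` fallback only exists so the snippet still runs (uncompiled) where numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # optional dependency
except ImportError:
    def njit(func):
        # Fallback (assumption): run as plain Python if numba is missing
        return func


@njit
def decompress_rle_lazy(rle, n_total):
    # Same loop structure as the answer; types are inferred on first call
    out = np.empty(n_total, np.uint32)
    pos = 0
    for i in range(0, rle.size, 2):
        value = rle[i]
        for _ in range(rle[i + 1]):
            out[pos] = value
            pos += 1
    return out


# Tiny example from the question: 0, (3), 4, (2) corresponds to 0 0 0 4 4
rle = np.array([0, 3, 4, 2], dtype=np.uint32)
decoded = decompress_rle_lazy(rle, 5)
print(decoded)
```

With lazy compilation the first call includes the compile time, so a benchmark should time a second call instead.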

Calling the compiled nb_decompress_RLE with the test data results in a runtime of 0.0617 seconds. A nice speedup by a factor of 119! For comparison, simply copying an array of the same size with np.copy is only 3 times faster.
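For cross-checking the decoded grid, or where numba is not available, a NumPy-only variant based on `np.repeat` is one possible alternative. Note that this is a different technique from the compiled loop recommended here, it was not part of the timings quoted in this answer, and the function name `decompress_rle_repeat` is made up for this sketch.

```python
import numpy as np


def decompress_rle_repeat(rle, vox_size):
    # rle is interleaved: value, count, value, count, ...
    values = rle[0::2]
    reps = rle[1::2].astype(np.intp)  # np.repeat expects integer repeat counts
    flat = np.repeat(values, reps)    # expand every value by its run length
    return flat.reshape(vox_size)


# 0 x3, 4 x2, 7 x1 fills a 1 x 2 x 3 grid
rle = np.array([0, 3, 4, 2, 7, 1], dtype=np.uint32)
grid = decompress_rle_repeat(rle, (1, 2, 3))
print(grid)
```

The reshape fails with a ValueError if the run lengths do not add up to the grid size, which doubles as a basic sanity check on the input.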
