Applying functions to multidimensional numpy arrays without loops


Question

I am working with raster data in numpy (read via GDAL) that represents elevation. My goal is to calculate the water flow direction for every pixel in the array, determined primarily by the difference in elevation between a given pixel and its 8 neighbours.

I have already implemented a rolling window technique that generates a multidimensional array holding each pixel together with its neighbours. It works as follows:

import numpy as np

def rolling_window(array, window_size):
    # One window per interior position, each of size window_size x window_size.
    shape = (array.shape[0] - window_size + 1,
             array.shape[1] - window_size + 1,
             window_size, window_size)
    # Reuse the array's own strides so non-contiguous inputs also work.
    strides = array.strides + array.strides
    return np.lib.stride_tricks.as_strided(array, shape=shape, strides=strides)

array = np.arange(100)
array = array.reshape(10, 10)
w = rolling_window(array, 3)

# produces array with shape (8, 8, 3, 3) - edge cases are not currently dealt with.
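
For comparison, NumPy 1.20 and later provide `np.lib.stride_tricks.sliding_window_view`, which builds the same kind of view without hand-computed strides. A minimal sketch, assuming that NumPy version is available:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

array = np.arange(100).reshape(10, 10)

# Read-only view of every 3 x 3 neighbourhood; no data is copied.
w = sliding_window_view(array, (3, 3))

print(w.shape)   # (8, 8, 3, 3)
print(w[0, 0])   # the window centred on array[1, 1]
```

Because the view is read-only, it also guards against accidentally writing through the window into the source array.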

The result is a series of 3 x 3 arrays, each centred on a study pixel at position 1,1 and nested inside further dimensions that correspond to the raster rows and columns. For example, the array representing one pixel of the input could look like the one below, where the pixel valued 4 is the study pixel and the other values are its immediate neighbours.

array([[[[ 0,  1,  2],
         [ 3,  4,  5],
         [ 6,  7,  8]]]])

A simplified version of my current method for working with this multidimensional array is the following function:

def flow_dir(array):

    # Value to assign output based on element index.
    flow_idx_dict = {0: 32,
                     1: 64,
                     2: 128,
                     3: 16,
                     5: 1,
                     6: 8,
                     7: 4,
                     8: 2}

    # Generates the rolling window array as mentioned above.
    w = rolling_window(array, 3)

    # Write results to a copy: w is a view of array, so updating array
    # in place would change the values seen by later windows.
    out = array.copy()

    # Iterate through each pixel array.
    for x, i in enumerate(w, 1):
        for y, j in enumerate(i, 1):
            j = j.flatten()

            # Centre pixel value after flattening.
            centre = j[4]

            # Some default values.
            idx = 4
            max_drop = 0

            # Iterate over pixel values in array.
            for count, px in enumerate(j):

                # Calculate difference between centre pixel and neighbour.
                drop = centre - px

                # Find the maximum difference pixel index.
                if count != 4:
                    if drop > max_drop:
                        max_drop = drop
                        idx = count

            # Assign a value from a dict, matching index to flow direction category.
            # idx stays 4 when no neighbour is lower; 0 is an assumed
            # placeholder for that case (the dict has no entry for 4).
            value = flow_idx_dict.get(idx, 0)

            # Update each pixel in the output array with the flow direction.
            out[x, y] = value
    return out

Understandably, all these for loops and if statements are very slow. I know there must be a vectorized numpy way to do this, but I am struggling to find the exact function(s) I need, or perhaps I have not understood how to use them properly. I have tried np.apply_along_axis, np.where, np.nditer, and others, but to no avail so far. What I think I need is:

  1. A way to apply a function to each of the pixel arrays produced by the rolling window without using for loops to access them.

  2. A way to find the index of the maximum drop without using if statements and enumerate.

  3. A way to write the results in a batch rather than one element at a time.
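
One way to meet all three requirements while keeping the rolling window is sketched below. It flattens each 3 x 3 window to 9 values, computes every drop at once, masks out the centre, and uses `argmax` plus a lookup array in place of the dict. The name `flow_dir_windows`, the `CODES` array, and the output value 0 for pixels with no lower neighbour are assumptions for illustration, not part of the original code. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Flow code for each flattened window position (index 4 is the centre).
CODES = np.array([32, 64, 128, 16, 0, 1, 8, 4, 2])

def flow_dir_windows(array):
    """Vectorized flow direction for interior pixels (hypothetical helper)."""
    w = sliding_window_view(array, (3, 3))
    # Flatten each window to 9 values; astype returns a writable copy.
    flat = w.reshape(w.shape[0], w.shape[1], 9).astype(float)
    # Drop from the centre pixel to every position in the window.
    drop = flat[:, :, 4:5] - flat
    drop[:, :, 4] = -np.inf            # never select the centre itself
    idx = drop.argmax(axis=2)          # first position of the largest drop
    max_drop = drop.max(axis=2)
    # Assumption: 0 marks pixels with no strictly lower neighbour.
    codes = np.where(max_drop > 0, CODES[idx], 0)
    out = array.copy()
    out[1:-1, 1:-1] = codes            # one batch assignment
    return out
```

Like the loop version, this compares raw elevation differences and does not scale diagonal neighbours by distance. `argmax` returns the first maximum, which matches the strict `>` comparison in the loop.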

Recommended answer

I think rolling windows can be avoided here. It is easier and more readable to vectorize over the NxN array than over the NxNx3x3 one.

Consider the following data:

array = np.array([[78, 72, 69, 71, 58, 49],
       [74, 67, 56, 49, 46, 50],
       [69, 53, 44, 37, 38, 48],
       [64, 58, 55, 22, 33, 24],
       [68, 61, 47, 21, 16, 19],
       [74, 53, 34, 12, 11, 12]])
N=6

First, compute the 8 gradients and the codes this way:

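# np.float and np.int were removed in NumPy 1.24; use the builtins instead.
gradient = np.empty((8, N-2, N-2), dtype=float)
code = np.empty(8, dtype=int)
for k in range(8):
    theta = -k*np.pi/4
    code[k] = 2**k
    # Column (j) and row (i) offsets of the k-th neighbour: k=0 is east,
    # then clockwise (south-east, south, ...).
    j, i = int(1.5*np.cos(theta)), -int(1.5*np.sin(theta))
    # Distance to the neighbour: 1 for edges, sqrt(2) for diagonals.
    d = np.linalg.norm([i, j])
    gradient[k] = (array[1+i: N-1+i, 1+j: N-1+j] - array[1: N-1, 1: N-1])/d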

It is fast because the only Python-level loop runs 8 times, once per direction, while the work over the pixels happens inside numpy. (-gradient).argmax(axis=0) gives, for each pixel, the direction of steepest descent, i.e. the direction of the flow.

Finally, take the codes:

direction = (-gradient).argmax(axis=0)
result = code.take(direction)

Result:

array([[  2,   2,   4,   4],
       [  1,   2,   4,   8],
       [128,   1,   2,   4],
       [  2,   1,   4,   4]])
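
The steps above can be collected into one function. The sketch below replaces the trigonometric offset computation with an explicit offset table, accepts non-square arrays, and divides diagonal drops by sqrt(2) as the answer's code does. The name `flow_direction` and the `OFFSETS` and `CODES` tables are illustrative, not from the original answer, and pixels with no lower neighbour are not treated specially here.

```python
import numpy as np

# (row offset, column offset) for codes 1, 2, 4, ..., 128:
# E, SE, S, SW, W, NW, N, NE.
OFFSETS = [(0, 1), (1, 1), (1, 0), (1, -1),
           (0, -1), (-1, -1), (-1, 0), (-1, 1)]
CODES = 2 ** np.arange(8)

def flow_direction(array):
    """Flow code for each interior pixel, by steepest descent."""
    a = np.asarray(array, dtype=float)
    rows, cols = a.shape
    centre = a[1:-1, 1:-1]
    drop = np.empty((8, rows - 2, cols - 2))
    for k, (i, j) in enumerate(OFFSETS):
        # Shifted view of the array aligned with the interior pixels.
        neighbour = a[1 + i:rows - 1 + i, 1 + j:cols - 1 + j]
        # Elevation drop per unit distance towards neighbour k.
        drop[k] = (centre - neighbour) / np.hypot(i, j)
    return CODES[drop.argmax(axis=0)]
```

Called on the 6 x 6 sample array, `flow_direction(array)` returns the 4 x 4 result shown above.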
