大型3D阵列上的快速1D线性np.NaN插值 [英] Fast 1D linear np.NaN interpolation over large 3D array

查看：71 发布时间：2020/5/24 1:01:56 python numpy pandas scipy interpolation

本文介绍了大型3D阵列上的快速1D线性np.NaN插值的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个带有shape=(92, 4800, 4800)的3D数组(z, y, x)，其中axis 0上的每个值代表一个不同的时间点.在某些情况下，时域中的值获取失败，导致某些值变为np.NaN.在其他情况下，未获取任何值，并且z上的所有值均为np.NaN.

I have a 3D array (z, y, x) with shape=(92, 4800, 4800) where each value along axis 0 represents a different point in time. The acquisition of values in the time domain failed in a few instances causing some values to be np.NaN. In other instances no values have been acquired and all values along z are np.NaN.

不考虑所有值均为np.NaN的实例，使用线性插值法沿axis 0填充np.NaN的最有效方法是什么?

What is the most efficient way to use linear interpolation to fill np.NaN along axis 0 disregarding instances where all values are np.NaN?

这是我正在做的一个工作示例，该示例将pandas包装器应用于scipy.interpolate.interp1d.在原始数据集上，每个切片大约需要2秒钟，这意味着整个阵列将在2.6小时内处理完毕.尺寸减小的示例数据集大约需要9.5秒.

Here is a working example of what I'm doing that employs pandas wrapper to scipy.interpolate.interp1d. This takes around 2 seconds per slice on the original dataset meaning the whole array is processed in 2.6 hours. The example dataset with reduced size takes around 9.5 seconds.

import numpy as np
import pandas as pd

# create example data, original is (92, 4800, 4800)
test_arr = np.random.randint(low=-10000, high=10000, size=(92, 480, 480))
test_arr[1:90:7, :, :] = -32768  # NaN fill value in original data
test_arr[:, 1:90:6, 1:90:8] = -32768

def interpolate_nan(arr, method="linear", limit=3):
    """return array interpolated along time-axis to fill missing values"""
    result = np.zeros_like(arr, dtype=np.int16)

    for i in range(arr.shape[1]):
        # slice along y axis, interpolate with pandas wrapper to interp1d
        line_stack = pd.DataFrame(data=arr[:,i,:], dtype=np.float32)
        line_stack.replace(to_replace=-37268, value=np.NaN, inplace=True)
        line_stack.interpolate(method=method, axis=0, inplace=True, limit=limit)
        line_stack.replace(to_replace=np.NaN, value=-37268, inplace=True)
        result[:, i, :] = line_stack.values.astype(np.int16)
    return result

使用示例数据集在我的计算机上的性能:

Performance on my machine with the example dataset:

%timeit interpolate_nan(test_arr)
1 loops, best of 3: 9.51 s per loop

我应该澄清代码正在产生预期的结果.问题是-如何优化此过程?

I should clarify that the code is producing my expected outcome. The question is - how can I optimize this process?

推荐答案

我最近在 numba 的帮助下针对我的特定用例解决了此问题，并且

I recently solved this problem for my particular use case with the help of numba and also did a little writeup on it.

from numba import jit

@jit(nopython=True)
def interpolate_numba(arr, no_data=-32768):
    """return array interpolated along time-axis to fill missing values"""
    result = np.zeros_like(arr, dtype=np.int16)

    for x in range(arr.shape[2]):
        # slice along x axis
        for y in range(arr.shape[1]):
            # slice along y axis
            for z in range(arr.shape[0]):
                value = arr[z,y,x]
                if z == 0:  # don't interpolate first value
                    new_value = value
                elif z == len(arr[:,0,0])-1:  # don't interpolate last value
                    new_value = value

                elif value == no_data:  # interpolate

                    left = arr[z-1,y,x]
                    right = arr[z+1,y,x]
                    # look for valid neighbours
                    if left != no_data and right != no_data:  # left and right are valid
                        new_value = (left + right) / 2

                    elif left == no_data and z == 1:  # boundary condition left
                        new_value = value
                    elif right == no_data and z == len(arr[:,0,0])-2:  # boundary condition right
                        new_value = value

                    elif left == no_data and right != no_data:  # take second neighbour to the left
                        more_left = arr[z-2,y,x]
                        if more_left == no_data:
                            new_value = value
                        else:
                            new_value = (more_left + right) / 2

                    elif left != no_data and right == no_data:  # take second neighbour to the right
                        more_right = arr[z+2,y,x]
                        if more_right == no_data:
                            new_value = value
                        else:
                            new_value = (more_right + left) / 2

                    elif left == no_data and right == no_data:  # take second neighbour on both sides
                        more_left = arr[z-2,y,x]
                        more_right = arr[z+2,y,x]
                        if more_left != no_data and more_right != no_data:
                            new_value = (more_left + more_right) / 2
                        else:
                            new_value = value
                    else:
                        new_value = value
                else:
                    new_value = value
                result[z,y,x] = int(new_value)
    return result

这比我的初始代码快 20倍.

这篇关于大型3D阵列上的快速1D线性np.NaN插值的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

大型3D阵列上的快速1D线性np.NaN插值 [英] Fast 1D linear np.NaN interpolation over large 3D array

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

大型3D阵列上的快速1D线性np.NaN插值 [英] Fast 1D linear np.NaN interpolation over large 3D array

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭