Fast 1D linear np.NaN interpolation over large 3D array
Question
I have a 3D array (z, y, x) with shape=(92, 4800, 4800) where each value along axis 0 represents a different point in time. The acquisition of values in the time domain failed in a few instances, causing some values to be np.NaN. In other instances no values have been acquired and all values along z are np.NaN.
What is the most efficient way to use linear interpolation to fill np.NaN along axis 0, disregarding instances where all values are np.NaN?
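Per pixel, the task reduces to ordinary 1D linear interpolation over the valid time points. A minimal sketch with made-up values, using np.interp rather than the routine from the question:

```python
import numpy as np

# hypothetical time series for a single (y, x) pixel, NaN = missing
z = np.array([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])

valid = ~np.isnan(z)
idx = np.arange(z.size)

# linearly interpolate the missing positions from the valid ones
filled = np.interp(idx, idx[valid], z[valid])
# filled -> [1. 2. 3. 4. 5. 6.]
```

Columns where every value is NaN have no valid points to interpolate from and have to be skipped, which is exactly the case the question sets aside.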
Here is a working example of what I'm doing that employs the pandas wrapper to scipy.interpolate.interp1d. This takes around 2 seconds per slice on the original dataset, meaning the whole array is processed in 2.6 hours. The example dataset with reduced size takes around 9.5 seconds.
import numpy as np
import pandas as pd

# create example data, original is (92, 4800, 4800)
test_arr = np.random.randint(low=-10000, high=10000, size=(92, 480, 480))
test_arr[1:90:7, :, :] = -32768  # NaN fill value in original data
test_arr[:, 1:90:6, 1:90:8] = -32768

def interpolate_nan(arr, method="linear", limit=3):
    """return array interpolated along time-axis to fill missing values"""
    result = np.zeros_like(arr, dtype=np.int16)
    for i in range(arr.shape[1]):
        # slice along y axis, interpolate with pandas wrapper to interp1d
        line_stack = pd.DataFrame(data=arr[:, i, :], dtype=np.float32)
        line_stack.replace(to_replace=-32768, value=np.NaN, inplace=True)
        line_stack.interpolate(method=method, axis=0, inplace=True, limit=limit)
        line_stack.replace(to_replace=np.NaN, value=-32768, inplace=True)
        result[:, i, :] = line_stack.values.astype(np.int16)
    return result
Performance on my machine with the example dataset:
%timeit interpolate_nan(test_arr)
1 loops, best of 3: 9.51 s per loop
I should clarify that the code is producing my expected outcome. The question is - how can I optimize this process?
Answer
I recently solved this problem for my particular use case with the help of numba and also did a little writeup on it.
import numpy as np
from numba import jit

@jit(nopython=True)
def interpolate_numba(arr, no_data=-32768):
    """return array interpolated along time-axis to fill missing values"""
    result = np.zeros_like(arr, dtype=np.int16)
    for x in range(arr.shape[2]):
        # slice along x axis
        for y in range(arr.shape[1]):
            # slice along y axis
            for z in range(arr.shape[0]):
                value = arr[z, y, x]
                if z == 0:  # don't interpolate first value
                    new_value = value
                elif z == len(arr[:, 0, 0]) - 1:  # don't interpolate last value
                    new_value = value
                elif value == no_data:  # interpolate
                    left = arr[z - 1, y, x]
                    right = arr[z + 1, y, x]
                    # look for valid neighbours
                    if left != no_data and right != no_data:  # left and right are valid
                        new_value = (left + right) / 2
                    elif left == no_data and z == 1:  # boundary condition left
                        new_value = value
                    elif right == no_data and z == len(arr[:, 0, 0]) - 2:  # boundary condition right
                        new_value = value
                    elif left == no_data and right != no_data:  # take second neighbour to the left
                        more_left = arr[z - 2, y, x]
                        if more_left == no_data:
                            new_value = value
                        else:
                            new_value = (more_left + right) / 2
                    elif left != no_data and right == no_data:  # take second neighbour to the right
                        more_right = arr[z + 2, y, x]
                        if more_right == no_data:
                            new_value = value
                        else:
                            new_value = (more_right + left) / 2
                    elif left == no_data and right == no_data:  # take second neighbour on both sides
                        more_left = arr[z - 2, y, x]
                        more_right = arr[z + 2, y, x]
                        if more_left != no_data and more_right != no_data:
                            new_value = (more_left + more_right) / 2
                        else:
                            new_value = value
                    else:
                        new_value = value
                else:
                    new_value = value
                result[z, y, x] = int(new_value)
    return result
This is 20 times faster than my initial code.
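The decision tree inside the loop is easier to verify in isolation. Here is a plain-Python sketch of the same neighbour-averaging rule applied to a single pixel's time series; fill_pixel is a hypothetical helper written for illustration, not part of the answer above, and like the jitted version it reads only the original values, never already-filled ones:

```python
def fill_pixel(series, no_data=-32768):
    """Fill missing values in one pixel's time series by averaging the
    nearest valid neighbours within two steps, reading original values only."""
    n = len(series)
    out = list(series)
    for z in range(1, n - 1):  # first and last values are never interpolated
        if series[z] != no_data:
            continue
        left, right = series[z - 1], series[z + 1]
        if left != no_data and right != no_data:
            out[z] = int((left + right) / 2)
        elif left == no_data and right != no_data and z >= 2:
            if series[z - 2] != no_data:  # second neighbour to the left
                out[z] = int((series[z - 2] + right) / 2)
        elif left != no_data and right == no_data and z <= n - 3:
            if series[z + 2] != no_data:  # second neighbour to the right
                out[z] = int((left + series[z + 2]) / 2)
        elif left == no_data and right == no_data and 2 <= z <= n - 3:
            if series[z - 2] != no_data and series[z + 2] != no_data:
                out[z] = int((series[z - 2] + series[z + 2]) / 2)
    return out
```

For example, fill_pixel([10, -32768, 20, 30]) fills the gap with the neighbour mean, 15; when a direct neighbour is also missing, the rule falls back to the neighbour two steps away, and a gap with no valid neighbour in reach keeps the no_data value.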