Resize a 2D numpy array excluding NaN


Question

I'm trying to resize a 2D numpy array by a given factor, obtaining a smaller array as output.

The array is read from an image file, and some of the values should be NaN (Not a Number, np.nan in numpy): it is the result of remote sensing measurements from a satellite, and some pixels simply weren't measured.

The suitable function I found for this is scipy.misc.imresize, but every output pixel whose source pixels include a NaN is set to NaN, even if there are some valid data among the original pixels that are interpolated together.

My solution is appended below. What I've done is essentially:


  • create a new array based on the original array shape and the desired reduction factor
  • create an index array to address all the pixels of the original array to be averaged for each pixel in the new array
  • cycle through the new array pixels and average all the non-NaN pixels to obtain the new pixel value; if there are only NaN, the output will be NaN.

I'm planning to add a keyword to choose between different outputs (average, median, standard deviation of the input pixels, and so on).
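One possible way to expose such a keyword, as a sketch only: reshape the array into non-overlapping blocks and dispatch to numpy's NaN-aware reducers. The function name `block_reduce_nan`, the `stat` keyword, and the `_REDUCERS` table are hypothetical names chosen for illustration; they are not part of the original code.

```python
import warnings
import numpy as np

# Hypothetical mapping from keyword to NaN-aware reducer (illustrative names).
_REDUCERS = {"mean": np.nanmean, "median": np.nanmedian, "std": np.nanstd}

def block_reduce_nan(array, factor_x, factor_y, stat="mean"):
    """Reduce non-overlapping factor_x by factor_y blocks, ignoring NaN."""
    rows, cols = array.shape
    if rows % factor_x or cols % factor_y:
        raise ValueError("array shape must be an integer multiple of the factors")
    # Split each axis into (number of blocks, block size).
    blocks = array.reshape(rows // factor_x, factor_x, cols // factor_y, factor_y)
    with warnings.catch_warnings():
        # Blocks that contain only NaN raise a RuntimeWarning and yield NaN,
        # which is the desired result here, so the warning is silenced.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return _REDUCERS[stat](blocks, axis=(1, 3))

# Usage: one block with a single valid pixel, one block with no NaN at all.
a = np.array([[1.0, np.nan, 3.0, 4.0],
              [np.nan, np.nan, 5.0, 6.0]])
print(block_reduce_nan(a, 2, 2, stat="mean"))
print(block_reduce_nan(a, 2, 2, stat="median"))
```

A dictionary lookup keeps the set of supported statistics in one place; an unknown keyword fails with a KeyError rather than silently falling back to the mean.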

It works as expected, but on a ~1 Mpx image it takes around 3 seconds. Due to my lack of experience in Python, I'm looking for improvements.

Does anyone have suggestions on how to do it better and more efficiently?

Does anyone know a library that already implements all of this?

Thanks.

Here is an example output for a random pixel input, generated with the code below:

import numpy as np
import pylab as plt
from scipy import misc

def resize_2d_nonan(array,factor):
    """
    Resize a 2D array by a different factor on each axis, skipping NaN values.
    If a new pixel contains only NaN, it will be set to NaN


    Parameters
    ----------

    array : 2D np array

    factor : int or tuple. If int, the x and y factors will be the same

    Returns
    -------
    array : 2D np array scaled by factor

    Created on Mon Jan 27 15:21:25 2014

    @author: damo_ma
    """
    xsize, ysize = array.shape

    if isinstance(factor,int):
        factor_x = factor
        factor_y = factor
    elif isinstance(factor,tuple):
        factor_x , factor_y = factor[0], factor[1]
    else:
        raise NameError('Factor must be a tuple (x,y) or an integer')

    if xsize % factor_x != 0 or ysize % factor_y != 0:
        raise NameError('Array shape must be an integer multiple of the factors')

    new_xsize, new_ysize = xsize // factor_x, ysize // factor_y

    new_array = np.empty([new_xsize, new_ysize])
    new_array[:] = np.nan # this saves us an assignment in the loop below

    # submatrix indexes : is the average box on the original matrix
    subrow, subcol  = np.indices((factor_x, factor_y))

    # new matrix indexes
    row, col  = np.indices((new_xsize, new_ysize))

    # some output for testing
    #for i, j, ind in zip(row.reshape(-1), col.reshape(-1),range(row.size)) :
    #    print '----------------------------------------------'
    #    print 'i: %i, j: %i, ind: %i ' % (i, j, ind)    
    #    print 'subrow+i*new_ysize, subcol+j*new_xsize :'    
    #    print i,'*',new_xsize,'=',i*factor_x
    #    print j,'*',new_ysize,'=',j*factor_y
    #    print subrow+i*factor_x,subcol+j*factor_y
    #    print '---'
    #    print 'array[subrow+i*factor_x,subcol+j*factor_y] : '    
    #    print array[subrow+i*factor_x,subcol+j*factor_y]

    for i, j, ind in zip(row.reshape(-1), col.reshape(-1),range(row.size)) :
        # extract the sub-matrix of input pixels that map to this output pixel
        sub_matrix = array[subrow+i*factor_x,subcol+j*factor_y]
        # modified from any(a) and all(a) to a.any() and a.all()
        # see http://stackoverflow.com/a/10063039/1435167
        if not (np.isnan(sub_matrix)).all(): # if not all values are NaN
            if (np.isnan(sub_matrix)).any(): # if some values are NaN, mask them
                msub_matrix = np.ma.masked_array(sub_matrix,np.isnan(sub_matrix))
                (new_array.reshape(-1))[ind] = np.mean(msub_matrix)
            else: # if there are no NaN at all
                (new_array.reshape(-1))[ind] = np.mean(sub_matrix)
        # no explicit NaN assignment is needed for all-NaN windows,
        # because new_array was initialised with NaN

    return new_array


row , cols = 6, 4

a = 10*np.random.random_sample((row , cols))
a[0:3,0:2] = np.nan
a[0,2] = np.nan

factor_x = 2
factor_y = 2
a_misc = misc.imresize(a, .5, interp='nearest', mode='F')
a_2d_nonan = resize_2d_nonan(a,(factor_x,factor_y))

print a
print
print a_misc
print
print a_2d_nonan

plt.subplot(131)
plt.imshow(a,interpolation='nearest')
plt.title('original')
plt.xticks(np.arange(a.shape[1]))
plt.yticks(np.arange(a.shape[0]))
plt.subplot(132)
plt.imshow(a_misc,interpolation='nearest')
plt.title('scipy.misc')
plt.xticks(np.arange(a_misc.shape[1]))
plt.yticks(np.arange(a_misc.shape[0]))
plt.subplot(133)
plt.imshow(a_2d_nonan,interpolation='nearest')
plt.title('my.func')
plt.xticks(np.arange(a_2d_nonan.shape[1]))
plt.yticks(np.arange(a_2d_nonan.shape[0]))

EDIT

I added some modifications to address ChrisProsser's comment.

If I substitute the NaN with some other value, say the average of the non-NaN pixels, it affects all the subsequent calculations: the difference between the resampled original array and the resampled array with NaN substituted shows that 2 pixels changed their values.

My goal is simply to skip all the NaN pixels.

# substitute NaN with the average value 

ind_nonan , ind_nan = np.where(np.isnan(a) == False), np.where(np.isnan(a) == True)
a_substitute = np.copy(a)

a_substitute[ind_nan] = np.mean(a_substitute[ind_nonan]) # substitute the NaN with average on the not-Nan

a_substitute_misc = misc.imresize(a_substitute, .5, interp='nearest', mode='F')
a_substitute_2d_nonan = resize_2d_nonan(a_substitute,(factor_x,factor_y))

print a_2d_nonan-a_substitute_2d_nonan

[[        nan -0.02296697]
 [ 0.23143208  0.        ]
 [ 0.          0.        ]]
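A tiny worked example may make the effect concrete. The block values and the `global_mean` below are made up for illustration and are not taken from the arrays above: filling a NaN with a global average pulls the block mean towards that average, whereas skipping the NaN uses only the measured pixels.

```python
import numpy as np

# One 2x2 block with a single missing pixel (illustrative values).
block = np.array([[np.nan, 2.0],
                  [4.0, 6.0]])
global_mean = 10.0  # hypothetical mean of the valid pixels in the whole image

# Skipping NaN: (2 + 4 + 6) / 3
skipped = np.nanmean(block)
# Substituting NaN first: (10 + 2 + 4 + 6) / 4
filled = np.where(np.isnan(block), global_mean, block).mean()

print(skipped, filled)  # 4.0 5.5
```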

** 2nd EDIT**

To address Hooked's answer (http://stackoverflow.com/questions/21527770/resize-a-2d-numpy-array-excluding-nan/21531433#21531433) I added some code. It is an interesting idea, but sadly it interpolates new values over pixels that should be "empty" (NaN), and for my small example it generates more NaN than good values.

X , Y  = np.indices((row , cols))
X_new , Y_new  = np.indices((row//factor_x , cols//factor_y))

from scipy.interpolate import CloughTocher2DInterpolator as intp
C = intp((X[ind_nonan],Y[ind_nonan]),a[ind_nonan])

a_interp = C(X_new , Y_new)

print a
print
print a_interp

[[        nan,         nan],
 [        nan,         nan],
 [        nan,  6.32826577]])

Recommended answer

You are operating on small windows of the array. Instead of looping through the array to make the windows, the array can be efficiently restructured by manipulating its strides. The numpy library provides the as_strided() function to help with that. An example is provided in the SciPy CookBook, Stride tricks for the Game of Life.
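As a minimal sketch of the stride idea, assuming the window size divides the array shape evenly, non-overlapping 2x2 windows can be exposed as a 4D view without copying any data. The variable names here are illustrative only.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(24, dtype=float).reshape(6, 4)
wr, wc = 2, 2          # window size; assumed to divide a.shape evenly
sr, sc = a.strides     # bytes to step one row / one column in a

# Axes 0 and 1 step from window to window, axes 2 and 3 step inside a window.
blocks = as_strided(
    a,
    shape=(a.shape[0] // wr, a.shape[1] // wc, wr, wc),
    strides=(sr * wr, sc * wc, sr, sc),
    writeable=False,   # guard against accidental writes through the view
)

print(blocks.shape)    # (3, 2, 2, 2)
print(blocks[2, 1])    # same values as a[4:6, 2:4]
```

as_strided does no bounds checking, so a wrong shape or stride can read memory outside the array; computing both from a.shape and a.strides, as above, is one way to keep that risk low.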

The following uses a generalized sliding window function found at Efficient Overlapping Windows with Numpy; I include it at the end.

Determine the shape of the new array:

rows, cols = a.shape
new_shape = rows // 2, cols // 2

Restructure the array into the windows you need, and create an indexing array identifying the non-NaN values:

# 2x2 windows of the original array
windows = sliding_window(a, (2,2))
# make a windowed boolean array for indexing
notNan = sliding_window(np.logical_not(np.isnan(a)), (2,2))

The new array can be made using a list comprehension or a generator expression.

# using a list comprehension
# make a list of the means of the windows, disregarding the Nan's
means = [window[index].mean() for window, index in zip(windows, notNan)]
new_array = np.array(means).reshape(new_shape)

# generator expression
# produces the means of the windows, disregarding the Nan's
means = (window[index].mean() for window, index in zip(windows, notNan))
new_array = np.fromiter(means, dtype = np.float32).reshape(new_shape)

The generator expression should conserve memory. Using itertools.izip() instead of zip() should also help if memory is a problem. I just used the list comprehension for your solution.

Your function:

def resize_2d_nonan(array,factor):
    """
    Resize a 2D array by different factor on two axis skipping NaN values.
    If a new pixel contains only NaN, it will be set to NaN

    Parameters
    ----------
    array : 2D np array

    factor : int or tuple. If int, the x and y factors will be the same

    Returns
    -------
    array : 2D np array scaled by factor

    Created on Mon Jan 27 15:21:25 2014

    @author: damo_ma
    """
    xsize, ysize = array.shape

    if isinstance(factor,int):
        factor_x = factor
        factor_y = factor
        window_size = factor, factor
    elif isinstance(factor,tuple):
        factor_x , factor_y = factor
        window_size = factor
    else:
        raise NameError('Factor must be a tuple (x,y) or an integer')

    if (xsize % factor_x or ysize % factor_y) :
        raise NameError('Factors must be integer multiple of array shape')

    new_shape = xsize // factor_x, ysize // factor_y

    # non-overlapping windows of the original array
    windows = sliding_window(array, window_size)
    # windowed boolean array for indexing
    notNan = sliding_window(np.logical_not(np.isnan(array)), window_size)

    #list of the means of the windows, disregarding the Nan's
    means = [window[index].mean() for window, index in zip(windows, notNan)]
    # new array
    new_array = np.array(means).reshape(new_shape)

    return new_array

I haven't done any timing comparisons with your original function, but it should be faster.

Many solutions I've seen here on SO vectorize the operations to increase speed/efficiency. I don't quite have a handle on that and don't know if it can be applied to your problem. Searching SO for window, array, moving average, vectorize, and numpy should produce similar questions and answers for reference.
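For what it's worth, one way the block average might be vectorized, offered as a sketch rather than a tested replacement: zero out the NaN, sum each block, and divide by the per-block count of valid pixels. The name `resize_2d_nonan_vectorized` is hypothetical, and the approach assumes the array shape is an integer multiple of the factors.

```python
import numpy as np

def resize_2d_nonan_vectorized(array, factor):
    """Block-average a 2D array, skipping NaN, without a Python-level loop."""
    factor_x, factor_y = (factor, factor) if isinstance(factor, int) else factor
    rows, cols = array.shape
    if rows % factor_x or cols % factor_y:
        raise ValueError("array shape must be an integer multiple of the factors")

    # Each axis is split into (number of blocks, block size).
    shape4d = (rows // factor_x, factor_x, cols // factor_y, factor_y)
    valid = ~np.isnan(array)

    # Sum of the valid pixels and number of valid pixels in every block.
    sums = np.where(valid, array, 0.0).reshape(shape4d).sum(axis=(1, 3))
    counts = valid.reshape(shape4d).sum(axis=(1, 3))

    # Blocks without any valid pixel keep the NaN they were initialised with.
    out = np.full(sums.shape, np.nan)
    np.divide(sums, counts, out=out, where=counts > 0)
    return out

# Usage with the same NaN pattern as the example in the question.
a = 10 * np.random.random_sample((6, 4))
a[0:3, 0:2] = np.nan
a[0, 2] = np.nan
print(resize_2d_nonan_vectorized(a, (2, 2)))
```

Dividing only where counts > 0 avoids both the division-by-zero warning and a separate pass to restore NaN in the empty blocks.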

sliding_window() from Efficient Overlapping Windows with Numpy:

import numpy as np
from numpy.lib.stride_tricks import as_strided as ast
from itertools import product

def norm_shape(shape):
    '''
    Normalize numpy array shapes so they're always expressed as a tuple, 
    even for one-dimensional shapes.

    Parameters
        shape - an int, or a tuple of ints

    Returns
        a shape tuple
    '''
    try:
        i = int(shape)
        return (i,)
    except TypeError:
        # shape was not a number
        pass

    try:
        t = tuple(shape)
        return t
    except TypeError:
        # shape was not iterable
        pass

    raise TypeError('shape must be an int, or a tuple of ints')


def sliding_window(a,ws,ss = None,flatten = True):
    '''
    Return a sliding window over a in any number of dimensions

    Parameters:
        a  - an n-dimensional numpy array
        ws - an int (a is 1D) or tuple (a is 2D or greater) representing the size 
             of each dimension of the window
        ss - an int (a is 1D) or tuple (a is 2D or greater) representing the 
             amount to slide the window in each dimension. If not specified, it
             defaults to ws.
        flatten - if True, all slices are flattened, otherwise, there is an 
                  extra dimension for each dimension of the input.

    Returns
        an array containing each n-dimensional window from a
    '''

    if None is ss:
        # ss was not provided. the windows will not overlap in any direction.
        ss = ws
    ws = norm_shape(ws)
    ss = norm_shape(ss)

    # convert ws, ss, and a.shape to numpy arrays so that we can do math in every 
    # dimension at once.
    ws = np.array(ws)
    ss = np.array(ss)
    shape = np.array(a.shape)


    # ensure that ws, ss, and a.shape all have the same number of dimensions
    ls = [len(shape),len(ws),len(ss)]
    if 1 != len(set(ls)):
        raise ValueError(\
        'a.shape, ws and ss must all have the same length. They were %s' % str(ls))

    # ensure that ws is smaller than a in every dimension
    if np.any(ws > shape):
        raise ValueError(\
        'ws cannot be larger than a in any dimension.\
 a.shape was %s and ws was %s' % (str(a.shape),str(ws)))

    # how many slices will there be in each dimension?
    newshape = norm_shape(((shape - ws) // ss) + 1)
    # the shape of the strided array will be the number of slices in each dimension
    # plus the shape of the window (tuple addition)
    newshape += norm_shape(ws)
    # the strides tuple will be the array's strides multiplied by step size, plus
    # the array's strides (tuple addition)
    newstrides = norm_shape(np.array(a.strides) * ss) + a.strides
    strided = ast(a,shape = newshape,strides = newstrides)
    if not flatten:
        return strided

    # Collapse strided so that it has one more dimension than the window.  I.e.,
    # the new array is a flat list of slices.
    meat = len(ws) if ws.shape else 0
    firstdim = (np.prod(newshape[:-meat]),) if ws.shape else ()
    dim = firstdim + (newshape[-meat:])
    # remove any dimensions with size 1
    dim = tuple(i for i in dim if i != 1)
    return strided.reshape(dim)
