NumPy dynamic array slicing based on min/max values

Question

I have a 3-dimensional array of shape (365, x, y), where 365 corresponds to daily data. In some cases, all the elements along the time axis (axis=0) are np.nan.

The time series for each point along axis=0 looks something like this (plot not reproduced here):

I need to find the index at which the maximum value (the peak) occurs, and then the index of the minimum value on each side of the peak.

import numpy as np

a = np.random.random((365, 3, 3)) * 10  # the shape must be passed as a single tuple
a[:, 0, 0] = np.nan

peak_mask = np.ma.masked_array(a, np.isnan(a))
peak_indexes = np.nanargmax(peak_mask, axis=0)

I can find the minimum before the peak using something like this:

early_minimum_indexes = np.full_like(peak_indexes, fill_value=0)

for i in range(peak_indexes.shape[0]):
    for j in range(peak_indexes.shape[1]):
        if peak_indexes[i, j] == 0:
            early_minimum_indexes[i, j] = 0
        else:
            early_mask = np.ma.masked_array(a, np.isnan(a))
            early_loc = np.nanargmin(early_mask[:peak_indexes[i, j], i, j], axis=0)   
            early_minimum_indexes[i, j] = early_loc

The resulting peak and trough were plotted in the original post (plot not reproduced here).

This approach is far too slow for large arrays (1 million+ elements). Is there a better way to do this using NumPy?

Recommended answer

While using masked arrays may not be the most efficient solution in this case, it will allow you to perform masked operations on specific axes while more or less preserving shape, which is a great convenience. Keep in mind that in many cases, the masked functions will still end up copying the masked data.

You have mostly the right idea in your current code, but you missed a couple of tricks, like being able to negate and combine masks. Allocating masks as boolean up front is also more efficient, and there are little nitpicks like np.full(..., 0) -> np.zeros(..., dtype=bool).

Let's work through this backwards. Say you had a well-behaved 1-D array with a single peak, called a1. You can use masking to easily find the maxima and minima (or their indices) like this:

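peak_index = np.nanargmax(a1)
# Boolean mask that is True from the peak onward
mask = np.zeros(a1.size, dtype=bool)
mask[peak_index:] = True
# Masked arrays hide True entries, so ~mask keeps the peak and everything after it
trough_plus = np.nanargmin(np.ma.array(a1, mask=~mask))
# mask keeps only the elements before the peak
trough_minus = np.nanargmin(np.ma.array(a1, mask=mask))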

This respects the fact that masked arrays flip the sense of the mask relative to normal NumPy boolean indexing: a True entry hides an element instead of selecting it. It's also OK that the maximum value appears in the calculation of trough_plus, since it's guaranteed not to be a minimum (unless you have the all-NaN situation, or the peak is the last valid element).
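The flipped mask sense is easy to get wrong, so here is a minimal sketch of the difference. The array `values` and the name `keep_tail` are made up for illustration and are not part of the original answer.

```python
import numpy as np

# Hypothetical data, chosen only to make the indices easy to follow.
values = np.array([5.0, 1.0, 9.0, 3.0, 7.0])
keep_tail = np.arange(values.size) >= 2   # True for indices 2, 3, 4

# Boolean indexing: True means "select this element".
selected = values[keep_tail]

# Masked array: True means "hide this element", so the mask is negated
# to keep the tail visible. argmin still indexes into the full array.
tail_argmin = np.ma.array(values, mask=~keep_tail).argmin()

# Passing the mask un-negated hides the tail and searches the head instead.
head_argmin = np.ma.array(values, mask=keep_tail).argmin()

print(selected, tail_argmin, head_argmin)
```

Because the masked array keeps its original shape, the returned indices refer to positions in `values` itself, with no offset to add back as there would be after slicing.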

Now if a1 were already a masked array (but still 1-D), you could do the same thing, but combine the masks temporarily. For example:

a1 = np.ma.array(a1, mask=np.isnan(a1))
peak_index = a1.argmax()
mask = np.zeros(a1.size, dtype=bool)
mask[peak_index:] = True
# Combine the NaN mask with the positional mask for each side of the peak
trough_plus = np.ma.masked_array(a1, mask=a1.mask | ~mask).argmin()
trough_minus = np.ma.masked_array(a1, mask=a1.mask | mask).argmin()

Again, since masked arrays have reversed masks, it's important to combine the masks using | rather than the & you would use for normal NumPy boolean masks. In this case, there is no need to call the nan versions of argmax and argmin, since all the NaNs are already masked out.
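As a small worked case, the sketch below applies the combined-mask idea to a short series with NaN gaps. The data and variable names (`series`, `nan_mask`, `from_peak`) are invented for illustration.

```python
import numpy as np

# Hypothetical series: NaN gaps, a peak at index 4, troughs at indices 1 and 6.
series = np.array([np.nan, 1.0, 3.0, np.nan, 9.0, 4.0, 2.0, np.nan])

nan_mask = np.isnan(series)
masked = np.ma.array(series, mask=nan_mask)
peak_index = masked.argmax()

# True from the peak onward.
from_peak = np.arange(series.size) >= peak_index

# Hide an element if it is NaN OR lies on the wrong side of the peak.
trough_minus = np.ma.array(series, mask=nan_mask | from_peak).argmin()
trough_plus = np.ma.array(series, mask=nan_mask | ~from_peak).argmin()

print(peak_index, trough_minus, trough_plus)
```

The union is what you want here because an element should be excluded when either condition holds. Using & would hide only elements that are both NaN and on the excluded side, leaving everything else in the search.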

Hopefully, the generalization to multiple dimensions becomes clear from here, given the prevalence of the axis keyword in NumPy functions:

a = np.ma.array(a, mask=np.isnan(a))
peak_indices = a.argmax(axis=0).reshape(1, *a.shape[1:])
mask = np.arange(a.shape[0]).reshape(-1, *(1,) * (a.ndim - 1)) >= peak_indices

trough_plus = np.ma.masked_array(a, mask=~mask | a.mask).argmin(axis=0)
trough_minus = np.ma.masked_array(a, mask=mask | a.mask).argmin(axis=0)

The N-dimensional masking technique comes from the question "Fill mask efficiently based on start indices", which was asked just for this purpose.
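To make the broadcasting step concrete, here is a self-contained sketch of the N-dimensional version on a small random array. The array size, the seed, and the names `time_index`, `from_peak`, and `all_nan` are assumptions for illustration. Flagging all-NaN pixels with `all_nan` is an addition that the original answer does not include.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small stand-in for the (365, x, y) array from the question.
a = rng.random((30, 4, 5)) * 10
a[rng.random(a.shape) < 0.1] = np.nan   # scattered gaps
a[:, 0, 0] = np.nan                     # one pixel with no valid data

nan_mask = np.isnan(a)
peak = np.ma.array(a, mask=nan_mask).argmax(axis=0)   # shape (4, 5)

# Column of time indices, shape (30, 1, 1), broadcast against the
# per-pixel peak index to get a (30, 4, 5) "at or after the peak" mask.
time_index = np.arange(a.shape[0]).reshape(-1, *(1,) * (a.ndim - 1))
from_peak = time_index >= peak[np.newaxis]

trough_minus = np.ma.array(a, mask=nan_mask | from_peak).argmin(axis=0)
trough_plus = np.ma.array(a, mask=nan_mask | ~from_peak).argmin(axis=0)

# Pixels with no valid data get index 0 everywhere, which is meaningless,
# so flag them explicitly.
all_nan = nan_mask.all(axis=0)
peak = np.ma.array(peak, mask=all_nan)
trough_minus = np.ma.array(trough_minus, mask=all_nan)
trough_plus = np.ma.array(trough_plus, mask=all_nan)
```

A pixel whose peak falls at index 0, or whose values before the peak are all NaN, also gets a `trough_minus` of 0, because every candidate is masked. Depending on the application, those pixels may need separate handling.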
