Applying operation to unevenly split portions of numpy array

Problem description

I have three 1D numpy arrays:

  1. A list of times at which some measurements occurred (t).
  2. A list of measurements that occurred at each of the times in t (y).
  3. A (shorter) list of times for some external changes that affected these measurements (b).

Here is an example:


import numpy as np

t = np.array([0.33856697,   1.69615293,   1.70257872,   2.32510279,
              2.37788203,   2.45102176,   2.87518307,   3.60941650,
              3.78275907,   4.37970516,   4.56480259,   5.33306546,
              6.00867792,   7.40217571,   7.46716989,   7.6791613 ,
              7.96938078,   8.41620336,   9.17116349,  10.87530965])
y = np.array([ 3.70209916,  6.31148802,  2.96578172,  3.90036915, 5.11728629,
               2.85788050,  4.50077811,  4.05113322,  3.55551093, 7.58624384,
               5.47249362,  5.00286872,  6.26664832,  7.08640263, 5.28350628,
               7.71646500,  3.75513591,  5.72849991,  5.60717179, 3.99436659])

b = np.array([ 1.7,  3.9,  9.5])

The elements of b fall between elements of t (between indices 1 and 2, 8 and 9, and 18 and 19), breaking it into four unevenly sized segments of lengths 2, 7, 10, and 1.

I would like to apply an operation to each segment of y to get an array of size b.size + 1. Specifically, I want to know if more than half of the values of y within each segment fall above or below a certain bias.

I am currently using a for loop and slicing to apply my test:

bias = 5
categories = np.digitize(t, b)
result = np.empty(b.size + 1, dtype=np.bool_)
for i in range(result.size):
    mask = (categories == i)
    result[i] = (np.count_nonzero(y[mask] > bias) / np.count_nonzero(mask)) > 0.5
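To make the loop's bookkeeping concrete, here is a minimal sketch (using shortened arrays, not the question's full data) of what np.digitize returns:

```python
import numpy as np

# A few of the times from the question and the boundary array b.
t = np.array([0.33856697, 1.69615293, 1.70257872, 3.60941650,
              9.17116349, 10.87530965])
b = np.array([1.7, 3.9, 9.5])

# For each element of t, digitize returns the index of the bin it falls in:
# 0 for t < 1.7, 1 for 1.7 <= t < 3.9, 2 for 3.9 <= t < 9.5, 3 for t >= 9.5.
categories = np.digitize(t, b)
print(categories)  # [0 0 1 1 2 3]
```

Each pass of the loop then masks out one of these bin labels and tests that segment of y.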

This seems extremely inefficient. Unfortunately, np.where won't help much in this situation. Is there a way to vectorize the operation I describe here to avoid the Python for loop?

By the way, here is a plot of y vs t, bias, and the regions delimited by b to show why the expected result is array([False, False, True, False], dtype=bool):

Produced by:

from matplotlib import pyplot as plt
from matplotlib.patches import Rectangle
plt.ion()
f, a = plt.subplots()
a.plot(t, y, label='y vs t')
a.hlines(5, *a.get_xlim(), label='bias')
plt.tight_layout()
a.set_xlim(0, 11)
c = np.concatenate([[0], b, [11]])
for i in range(len(c) - 1):
    a.add_patch(Rectangle((c[i], 2.5), c[i+1] - c[i], 8 - 2.5, alpha=0.2, color=('red' if i % 2 else 'green'), zorder=-i-5))
a.legend()

Answer

Shouldn't this produce the same result?

# Use 0 and t.size as the outer endpoints; searchsorted(t, t[-1]) with the
# default side='left' would make the last bin empty and divide by zero.
split_points = np.r_[0, np.searchsorted(t, b), t.size]
numerator = np.add.reduceat(y > bias, split_points[:-1])
denominator = np.diff(split_points)
result = (numerator / denominator) > 0.5
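Running this approach end to end on the question's data (a self-contained sketch; split_points uses 0 and t.size as the outer endpoints so the last segment is not empty) reproduces the expected result:

```python
import numpy as np

t = np.array([0.33856697, 1.69615293, 1.70257872, 2.32510279,
              2.37788203, 2.45102176, 2.87518307, 3.60941650,
              3.78275907, 4.37970516, 4.56480259, 5.33306546,
              6.00867792, 7.40217571, 7.46716989, 7.6791613,
              7.96938078, 8.41620336, 9.17116349, 10.87530965])
y = np.array([3.70209916, 6.31148802, 2.96578172, 3.90036915, 5.11728629,
              2.85788050, 4.50077811, 4.05113322, 3.55551093, 7.58624384,
              5.47249362, 5.00286872, 6.26664832, 7.08640263, 5.28350628,
              7.71646500, 3.75513591, 5.72849991, 5.60717179, 3.99436659])
b = np.array([1.7, 3.9, 9.5])
bias = 5

# Segment boundaries as indices into t: [0, 2, 9, 19, 20].
split_points = np.r_[0, np.searchsorted(t, b), t.size]
numerator = np.add.reduceat(y > bias, split_points[:-1])   # [1, 1, 9, 0]
denominator = np.diff(split_points)                        # [2, 7, 10, 1]
result = (numerator / denominator) > 0.5
print(result)  # [False False  True False]
```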

A few notes: this approach relies on t being sorted. The bins relative to b are then all contiguous blocks, so we need no mask to describe them, just their endpoints in the form of indices into t. That's what searchsorted finds for us.
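A small illustration of that lookup (toy arrays, not the question's data):

```python
import numpy as np

t = np.array([0.3, 1.7, 1.8, 2.3, 4.0, 9.6])  # sorted measurement times
b = np.array([1.7, 3.9, 9.5])                  # boundary times

# For each boundary, searchsorted returns the index of the first element
# of t that is >= it (side='left', the default); those indices are the
# segment endpoints.
idx = np.searchsorted(t, b)
print(idx)  # [1 4 5]
```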

Since your criterion doesn't appear to depend on the group, we can make one big mask for all of y in one go. Counting nonzeros in a boolean array is the same as summing it, because the True values are coerced to ones. The advantage in this case is that we can use add.reduceat, which takes the array and a list of split points and sums the blocks between the splits, which is precisely what we want.
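For example (a toy sketch of reduceat counting above-bias values per block):

```python
import numpy as np

y = np.array([3.7, 6.3, 2.9, 5.1, 7.5, 5.6, 3.9])
bias = 5

# reduceat sums the boolean mask over [0:2], [2:4], and [4:] in one call,
# i.e. it counts the values above bias in each block.
counts = np.add.reduceat(y > bias, [0, 2, 4])
print(counts)  # [1 1 2]
```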

To normalise we need the total count in each bin, but because the bins are contiguous this is just the difference between the split_points delineating that bin, which is where we use diff.
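With the split points for the question's data, that is simply:

```python
import numpy as np

# Endpoints of the four contiguous bins, as indices into t.
split_points = np.array([0, 2, 9, 19, 20])

# Each bin's size is the gap between consecutive split points.
sizes = np.diff(split_points)
print(sizes)  # [ 2  7 10  1]
```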
