Finding local maxima of xy data point graph with numpy?


Question

I would like the most efficient way to find local maxima in huge data point sets containing thousands of values. The input is two long lists of x and y values.

Consider the following simple example:

xval = [-0.15, -0.02, 0.1, 0.22, 0.36, 0.43, 0.58, 0.67, 0.79, 0.86, 0.96 ]

yval = [-0.09, 0.13, -0.01, -0.1, -0.05, 0.2, 0.56, 0.47, 0.35, 0.43, 0.69]

The desired output is a list with the indexes of the peak points, here locMaxId = [1, 6, 10]. Comparing each point with its closest neighbours is one solution, but what about 10k values?
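For reference, the neighbour comparison mentioned above can be sketched in plain Python. This is only a baseline under stated assumptions: the name `local_maxima_loop` is hypothetical, the y values are assumed to be already ordered by x, and an end point counts as a peak when it is higher than its single neighbour.

```python
def local_maxima_loop(yval):
    """Return indices whose value is strictly greater than every existing neighbour."""
    n = len(yval)
    peaks = []
    for i in range(n):
        # An end point has only one neighbour to beat.
        left_ok = i == 0 or yval[i] > yval[i - 1]
        right_ok = i == n - 1 or yval[i] > yval[i + 1]
        if left_ok and right_ok:
            peaks.append(i)
    return peaks


yval = [-0.09, 0.13, -0.01, -0.1, -0.05, 0.2, 0.56, 0.47, 0.35, 0.43, 0.69]
print(local_maxima_loop(yval))  # [1, 6, 10]
```

The loop is linear in the number of points, so 10k values are not a problem in principle; the cost is the per-element Python overhead.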

Recommended answer

You can let numpy handle the iteration, i.e. vectorize it:

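import numpy as np

def local_maxima(xval, yval):
    xval = np.asarray(xval)
    yval = np.asarray(yval)

    # Order the y values by increasing x; returned indices refer to this order.
    sort_idx = np.argsort(xval)
    yval = yval[sort_idx]
    # Change from each point to the next.
    gradient = np.diff(yval)
    # -1 marks a switch from growing to not growing, +1 the opposite.
    maxima = np.diff((gradient > 0).view(np.int8))
    # Interior peaks, plus the first and last points when they are peaks.
    return np.concatenate((([0],) if gradient[0] < 0 else ()) +
                          (np.where(maxima == -1)[0] + 1,) +
                          (([len(yval)-1],) if gradient[-1] > 0 else ()))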

EDIT: The code first computes the change from each point to the next (gradient). The next step is a little tricky. If you do np.diff(gradient > 0) directly, the resulting boolean array is True wherever there is a transition between growing (> 0) and not growing (<= 0), in either direction. By reinterpreting the boolean array as a signed integer type of the same item size, you can tell transitions from growing to not growing (-1) apart from the opposite ones (+1). Taking a .view(np.int8) avoids copying the data, as would happen with the less hacky .astype(int).

All that's left is taking care of the first and last points and concatenating everything into a single array. One thing I found out today is that if you include an empty list in the tuple you send to np.concatenate, it is converted to an empty array of float dtype, and that ends up being the dtype of the result, hence the more complicated concatenation of empty tuples in the code above.
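The points made in this explanation can be checked on a small array. The following is a minimal sketch; the sample `gradient` values and the variable names `growing`, `signed`, and `idx` are made up for illustration and are not part of the original answer.

```python
import numpy as np

gradient = np.array([0.22, -0.14, -0.09, 0.05, 0.25])
growing = gradient > 0  # True, False, False, True, True

# diff on booleans only reports that a change happened, not its direction.
print(np.diff(growing).tolist())  # [True, False, True, False]

# Viewing the same bytes as int8 keeps the direction of each change.
signed = np.diff(growing.view(np.int8))
print(signed.tolist())  # [-1, 0, 1, 0]

# The view shares memory with the boolean array; astype makes a copy.
print(np.shares_memory(growing, growing.view(np.int8)))  # True
print(np.shares_memory(growing, growing.astype(int)))    # False

# An empty list becomes an empty float64 array and promotes the result.
idx = np.array([1, 6])
print(np.concatenate(([], idx)).dtype)  # float64
```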

It works:

In [2]: local_maxima(xval, yval)
Out[2]: array([ 1,  6, 10], dtype=int64)

And it is reasonably fast:

In [3]: xval = np.random.rand(10000)

In [4]: yval = np.random.rand(10000)

In [5]: local_maxima(xval, yval)
Out[5]: array([   0,    2,    4, ..., 9991, 9995, 9998], dtype=int64)

In [6]: %timeit local_maxima(xval, yval)
1000 loops, best of 3: 1.16 ms per loop

Also, most of the time is spent converting your data from lists to arrays and sorting them. If your data is already sorted and kept in arrays, you can probably improve performance over the above by a factor of about 5.
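If the data is already held in a NumPy array ordered by x, the conversion and sorting steps can be skipped. A minimal sketch under those assumptions follows; the name `local_maxima_sorted` is hypothetical, and the input is assumed to contain at least two values.

```python
import numpy as np


def local_maxima_sorted(yval):
    """Peak indices for a NumPy array (length >= 2) already ordered by x."""
    gradient = np.diff(yval)
    # -1 marks a switch from growing to not growing.
    turns = np.diff((gradient > 0).view(np.int8))
    parts = []
    if gradient[0] < 0:  # first point is higher than the second
        parts.append(np.array([0]))
    parts.append(np.flatnonzero(turns == -1) + 1)
    if gradient[-1] > 0:  # last point is higher than the one before it
        parts.append(np.array([len(yval) - 1]))
    return np.concatenate(parts)


yval = np.array([-0.09, 0.13, -0.01, -0.1, -0.05, 0.2,
                 0.56, 0.47, 0.35, 0.43, 0.69])
print(local_maxima_sorted(yval).tolist())  # [1, 6, 10]
```

Collecting the pieces in a list of integer arrays also sidesteps the empty-list dtype issue, since nothing empty and float-typed is ever passed to np.concatenate.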
