How to find the most frequent values in a numpy ndarray?

Problem description

I have a numpy ndarray with shape (30,480,640). The 1st and 2nd axes represent locations (latitude and longitude), and the 0th axis contains the actual data points. I want to take the most frequent value along the 0th axis at each location, that is, construct a new array with shape (1,480,640). For example:

>>> data
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[40, 40, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]]])

(perform calculation)

>>> new_data 
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]]])

The data points will contain negative and positive floating-point numbers. How can I perform such a calculation? Thanks a lot!

I tried numpy.unique, but I got "TypeError: unique() got an unexpected keyword argument 'return_inverse'". I'm using numpy version 1.2.1 installed on Unix, and it doesn't support return_inverse. I also tried mode, but it takes forever to process such a large amount of data. Is there an alternative way to get the most frequent values? Thanks again.

Recommended answer

To find the most frequent value of a flat array, use unique, bincount and argmax:

import numpy as np

arr = np.array([5, 4, -2, 1, -2, 0, 4, 4, -6, -1])
# u: sorted distinct values; indices: position of each element of arr within u
u, indices = np.unique(arr, return_inverse=True)
u[np.argmax(np.bincount(indices))]
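
The three calls above can be wrapped into a small helper, sketched below. The name most_frequent_flat is made up for illustration and is not part of NumPy. Because unique returns sorted values and argmax returns the first maximum, a tie should resolve to the smallest of the tied values. The approach also works for negative and floating-point data, since bincount only ever sees the non-negative integer indices.

```python
import numpy as np

def most_frequent_flat(values):
    """Return the most frequent value of a flat array (hypothetical helper)."""
    # u holds the sorted distinct values; indices maps each element to its slot in u.
    u, indices = np.unique(np.asarray(values).ravel(), return_inverse=True)
    # bincount counts occurrences per slot; argmax picks the first maximum,
    # so a tie resolves to the smallest of the tied values.
    return u[np.argmax(np.bincount(indices))]

arr = np.array([5, 4, -2, 1, -2, 0, 4, 4, -6, -1])
result = most_frequent_flat(arr)  # 4 occurs three times

# Floats, with a tie between -1.25 and 0.5 (two occurrences each).
floats = most_frequent_flat([0.5, -1.25, 0.5, 2.0, -1.25])
```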

For a multidimensional array, unique needs no special handling (it flattens its input), but bincount only accepts 1-D input, so we apply it along the chosen axis with apply_along_axis:

arr = np.array([[5, 4, -2, 1, -2, 0, 4, 4, -6, -1],
                [0, 1,  2, 2,  3, 4, 5, 6,  7,  8]])
axis = 1
u, indices = np.unique(arr, return_inverse=True)
u[np.argmax(np.apply_along_axis(np.bincount, axis, indices.reshape(arr.shape),
                                None, np.max(indices) + 1), axis=axis)]

With your data:

data = np.array([
   [[ 0,  1,  2,  3,  4],
    [ 5,  6,  7,  8,  9],
    [10, 11, 12, 13, 14],
    [15, 16, 17, 18, 19]],

   [[ 0,  1,  2,  3,  4],
    [ 5,  6,  7,  8,  9],
    [10, 11, 12, 13, 14],
    [15, 16, 17, 18, 19]],

   [[40, 40, 42, 43, 44],
    [45, 46, 47, 48, 49],
    [50, 51, 52, 53, 54],
    [55, 56, 57, 58, 59]]])
axis = 0
u, indices = np.unique(data, return_inverse=True)
u[np.argmax(np.apply_along_axis(np.bincount, axis, indices.reshape(data.shape),
                                None, np.max(indices) + 1), axis=axis)]
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
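
As a sketch, the same steps can be collected into one function that also keeps the reduced axis, which gives the (1,480,640)-style shape asked for in the question. The function name mode_along_axis and its keepdims parameter are assumptions for illustration, not an existing NumPy API.

```python
import numpy as np

def mode_along_axis(a, axis=0, keepdims=False):
    """Most frequent value along `axis` (hypothetical helper, not a NumPy API)."""
    a = np.asarray(a)
    u, indices = np.unique(a, return_inverse=True)
    indices = indices.reshape(a.shape)  # the inverse may come back flattened
    # Count each distinct value along `axis`; minlength gives every slice
    # the same number of bins so the results stack into one array.
    counts = np.apply_along_axis(np.bincount, axis, indices, None, len(u))
    result = u[np.argmax(counts, axis=axis)]
    return np.expand_dims(result, axis) if keepdims else result

# Rebuild the example from the question: two identical layers plus one outlier layer.
base = np.arange(20).reshape(4, 5)
data = np.stack([base, base, base + 40])
data[2, 0, 1] = 40  # matches the 40 in the question's third layer

new_data = mode_along_axis(data, axis=0, keepdims=True)
```

Note that the intermediate counts array has one bin per distinct value at every location, so for floating-point data with many distinct values it may need a lot of memory.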


NumPy 1.2, really? You can approximate np.unique(return_inverse=True) reasonably efficiently using np.searchsorted (it adds an extra O(n log n) step, so it shouldn't change the performance significantly):

u = np.unique(arr)
indices = np.searchsorted(u, arr.flat)
