Group by max or min in a numpy array

Question

I have two equal-length 1D numpy arrays, id and data, where id is a sequence of repeating, ordered integers that define sub-windows on data. For example,

id  data
1     2
1     7
1     3
2     8
2     9
2    10
3     1
3   -10

I would like to aggregate data by grouping on id and taking either the max or the min. In SQL, this would be a typical aggregation query like SELECT MAX(data) FROM tablename GROUP BY id ORDER BY id. Is there a way I can avoid Python loops and do this in a vectorized manner, or do I have to drop down to C?
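
For reference, a plain-Python version of that query (the kind of loop the question wants to avoid) might look like the sketch below. The function name group_max_loop is hypothetical, and the arrays are named ids rather than id to avoid shadowing Python's built-in id.

```python
import numpy as np

def group_max_loop(ids, data):
    """Hypothetical reference (non-vectorized) equivalent of
    SELECT MAX(data) FROM tablename GROUP BY id ORDER BY id."""
    best = {}
    for i, x in zip(ids.tolist(), data.tolist()):
        # keep the largest value seen so far for this id
        if i not in best or x > best[i]:
            best[i] = x
    # ORDER BY id: emit results in ascending id order
    return np.array([best[i] for i in sorted(best)])

ids = np.array([1, 1, 1, 2, 2, 2, 3, 3])
data = np.array([2, 7, 3, 8, 9, 10, 1, -10])
print(group_max_loop(ids, data).tolist())  # [7, 10, 1]
```

Every element passes through the interpreter one at a time, which is the overhead a vectorized approach tries to eliminate.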

Answer

I've been seeing some very similar questions on Stack Overflow over the last few days. The following code closely mirrors the implementation of numpy.unique: it sorts once with np.lexsort (group as the major key, data as the minor key) and then picks out group boundaries with a boolean mask. Because all of the work happens inside NumPy's compiled routines, it will most likely be much faster than anything you can write with a Python loop.
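
The core trick, which is roughly what numpy.unique does internally on sorted input, is a boolean mask that compares each element with its neighbour. The small illustration below uses the question's id column; the variable names are illustrative only. Once the data are also sorted within each group, the element at a group's start is that group's minimum and the element at its end is its maximum.

```python
import numpy as np

# Group labels already sorted, like the id column in the question
groups = np.array([1, 1, 1, 2, 2, 2, 3, 3])

# True where a group starts: the first element, or any element that differs from its predecessor
is_start = np.empty(len(groups), dtype=bool)
is_start[0] = True
is_start[1:] = groups[1:] != groups[:-1]

# True where a group ends: the last element, or any element that differs from its successor
is_end = np.empty(len(groups), dtype=bool)
is_end[-1] = True
is_end[:-1] = groups[1:] != groups[:-1]

print(is_start.astype(int).tolist())  # [1, 0, 0, 1, 0, 0, 1, 0]
print(is_end.astype(int).tolist())    # [0, 0, 1, 0, 0, 1, 0, 1]
print(groups[is_start].tolist())      # [1, 2, 3], the same labels np.unique(groups) returns
```

Selecting with the start mask yields per-group minima and selecting with the end mask yields per-group maxima; that is the only difference between group_min and group_max.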

import numpy as np

def group_min(groups, data):
    """Return the minimum of data within each group, ordered by group label."""
    # sort with major key groups, minor key data
    order = np.lexsort((data, groups))
    groups = groups[order]  # this is only needed if groups is unsorted
    data = data[order]
    # construct an index which marks borders between groups:
    # True at the first (smallest) element of each group
    index = np.empty(len(groups), dtype=bool)
    index[0] = True
    index[1:] = groups[1:] != groups[:-1]
    return data[index]

# max is very similar: same sort, but keep the last (largest) element of each group
def group_max(groups, data):
    """Return the maximum of data within each group, ordered by group label."""
    order = np.lexsort((data, groups))
    groups = groups[order]  # this is only needed if groups is unsorted
    data = data[order]
    # True at the last (largest) element of each group
    index = np.empty(len(groups), dtype=bool)
    index[-1] = True
    index[:-1] = groups[1:] != groups[:-1]
    return data[index]
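
Since the question states that id is already ordered, each group is a contiguous run and the sort can be skipped entirely. Below is a sketch of that variant using NumPy's ufunc.reduceat, which is a different technique from the lexsort answer. The helper name group_reduce_sorted is hypothetical, and it assumes ids are sorted in ascending order; it gives wrong results if equal ids are not contiguous.

```python
import numpy as np

def group_reduce_sorted(ids, data, ufunc=np.maximum):
    """Hypothetical helper: reduce data over runs of equal ids.

    Assumes ids are sorted (each group is contiguous).
    Returns (group_labels, reduced_values). Pass np.maximum or np.minimum as ufunc.
    """
    ids = np.asarray(ids)
    data = np.asarray(data)
    if ids.size == 0:
        return ids[:0], data[:0]
    # Index of the first element of each run of equal ids
    starts = np.flatnonzero(np.concatenate(([True], ids[1:] != ids[:-1])))
    # reduceat reduces data[starts[k]:starts[k + 1]]; the last run extends to the end
    return ids[starts], ufunc.reduceat(data, starts)

ids = np.array([1, 1, 1, 2, 2, 2, 3, 3])
data = np.array([2, 7, 3, 8, 9, 10, 1, -10])

labels, maxima = group_reduce_sorted(ids, data, np.maximum)
_, minima = group_reduce_sorted(ids, data, np.minimum)
print(labels.tolist())  # [1, 2, 3]
print(maxima.tolist())  # [7, 10, 1]
print(minima.tolist())  # [2, 8, -10]
```

On this input the results match group_max and group_min. The reduceat version is O(n) with no sort and also returns the group labels, but if ids can arrive unsorted, the lexsort-based functions are the safer choice.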
