Vectorized groupby with NumPy

Views: 1106  Published: 2020/5/18 19:21:27  Tags: python numpy


Question

Pandas has a widely used groupby facility that splits a DataFrame according to a corresponding mapping, so that you can apply a calculation to each subgroup and recombine the results.

Can this be done flexibly in NumPy without a native Python for-loop? With a Python loop, it would look like this:

>>> import numpy as np

>>> X = np.arange(10).reshape(5, 2)
>>> groups = np.array([0, 0, 0, 1, 1])

# Split up elements (rows) of `X` based on their element wise group
>>> np.array([X[groups==i].sum() for i in np.unique(groups)])
array([15, 30])

Above, 15 is the sum of the first three rows of X, and 30 is the sum of the remaining two.

By "flexibly," I just mean that we aren't focusing on one particular computation such as sum, count, or maximum, but rather passing any computation to the grouped arrays.
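To make "any computation" concrete, the loop above could be wrapped in a helper that takes an arbitrary callable. This is only a sketch of the loop-based baseline; the name `groupby_apply` is hypothetical and does not come from NumPy or pandas.

```python
import numpy as np

def groupby_apply(X, groups, func):
    """Hypothetical helper: apply `func` to the rows of each group, in a Python loop."""
    return np.array([func(X[groups == g]) for g in np.unique(groups)])

X = np.arange(10).reshape(5, 2)
groups = np.array([0, 0, 0, 1, 1])

sums = groupby_apply(X, groups, np.sum)    # array([15, 30])
maxima = groupby_apply(X, groups, np.max)  # array([5, 9])
spans = groupby_apply(X, groups, np.ptp)   # max minus min per group: array([5, 3])
```

Each call builds one boolean mask per group, which is the cost the question hopes to avoid.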

If not, is there a faster approach than the above?

Recommended answer

If you want a more flexible implementation of groupby, one that can group using any of NumPy's ufuncs:

def groupby_np(X, groups, axis=0, uf=np.add, out=None, minlength=0, identity=None):
    # One output slot per group label 0..groups.max() (more if minlength asks for it).
    if minlength < groups.max() + 1:
        minlength = groups.max() + 1
    if identity is None:
        identity = uf.identity
    # Every axis except the grouping axis is reduced away first.
    other_axes = list(range(X.ndim))
    del other_axes[axis]
    other_axes = tuple(other_axes)
    created_out = out is None
    if created_out:
        if identity is None:  # e.g. np.maximum: seed each group from its own data
            assert np.all(np.isin(np.arange(minlength), groups)), "No valid identity for unassigned groups"
            s = [slice(None)] * X.ndim
            for ax in other_axes:
                s[ax] = 0
            out = np.array([uf.reduce(X[tuple(s)][groups == g]) for g in range(minlength)])
        else:
            out = np.full((minlength,), identity, dtype=X.dtype)
    # Unbuffered in-place update, so repeated group labels accumulate.
    uf.at(out, groups, uf.reduce(X, other_axes))
    if created_out:
        return out
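
The function rests on two ufunc methods, `reduce` and `at`. As a minimal sketch of just that core step for the `np.add` case (written out by hand here, not taken from the answer), using the same `X` and `groups` as the question:

```python
import numpy as np

X = np.arange(10).reshape(5, 2)
groups = np.array([0, 0, 0, 1, 1])

# Step 1: collapse every axis except the grouping axis (here, sum each row).
row_sums = np.add.reduce(X, axis=1)   # array([ 1,  5,  9, 13, 17])

# Step 2: start each group's accumulator at the ufunc's identity (0 for add).
out = np.zeros(groups.max() + 1, dtype=X.dtype)

# Step 3: unbuffered scatter; repeated indices accumulate instead of overwriting.
np.add.at(out, groups, row_sums)      # out is now array([15, 30])

# For contrast: buffered fancy-index assignment does not accumulate over
# repeated indices, so it does not produce the group totals.
buffered = np.zeros(groups.max() + 1, dtype=X.dtype)
buffered[groups] += row_sums
```

The contrast with `buffered` is why `uf.at` is used instead of plain indexed assignment.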

>>> groupby_np(X, groups)
array([15, 30])

>>> groupby_np(X, groups, uf=np.multiply)
array([   0, 3024])

>>> groupby_np(X, groups, uf=np.maximum)
array([5, 9])

>>> groupby_np(X, groups, uf=np.minimum)
array([0, 6])
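
The ufunc-based approach does not cover computations that are not ufuncs, such as a median. One possible fallback for those cases, which is not part of the answer above, is to sort the rows by group once and split them with `np.split`. The helper name `groupby_split` is hypothetical, and the sketch still loops over groups in Python; it only avoids building a boolean mask per group.

```python
import numpy as np

def groupby_split(X, groups, func):
    """Hypothetical helper: sort rows by group once, split, then apply `func`."""
    order = np.argsort(groups, kind="stable")   # keeps original row order within a group
    sorted_groups = groups[order]
    # Index of the first row of each group in the sorted array.
    keys, starts = np.unique(sorted_groups, return_index=True)
    pieces = np.split(X[order], starts[1:])     # one sub-array per group
    return keys, np.array([func(p) for p in pieces])

X = np.arange(10).reshape(5, 2)
groups = np.array([1, 0, 1, 0, 0])              # deliberately unsorted labels

keys, medians = groupby_split(X, groups, np.median)
# keys    -> array([0, 1])
# medians -> array([6.5, 2.5])
```

Whether this beats the masked loop depends on the number of groups and the array size, so it is worth timing on real data before adopting it.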
