Summing data from array based on other array in Numpy

Published: 2021/6/15 19:31:30. Tags: python, arrays, performance, numpy, matrix

Question

I have two 2D numpy arrays of identical shape (simplified in this example with respect to size and content).

An ID matrix:

1 1 1 2 2
1 1 2 2 5
1 1 2 5 5
1 2 2 5 5
2 2 5 5 5

And a value matrix:

14.8 17.0 74.3 40.3 90.2
25.2 75.9  5.6 40.0 33.7
78.9 39.3 11.3 63.6 56.7
11.4 75.7 78.4 88.7 58.6
79.6 32.3 35.3 52.5 13.3

My goal is to count and sum the values from the second matrix, grouped by the IDs from the first matrix:

1: (8, 336.8)
2: (9, 453.4)
5: (8, 402.4)

I can do this in a for loop, but when the matrices have sizes in the thousands instead of just 5x5, with thousands of unique IDs, it takes a lot of time to process.

Does numpy have a clever method, or a combination of methods, for doing this?

Recommended answer

Here's a vectorized approach that combines np.unique and np.bincount to get the count for each ID and the sum of value grouped by ID:

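import numpy as np

# idx maps every element of ID to its position in unqID; IDsums holds the per-ID counts
unqID, idx, IDsums = np.unique(ID, return_inverse=True, return_counts=True)
# ravel idx as well: newer NumPy versions may return it with the shape of ID
value_sums = np.bincount(idx.ravel(), weights=value.ravel())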
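To see why this pairing works, here is a minimal sketch on a tiny 1D example. The arrays `ids` and `vals` are made up for illustration and are not part of the original answer. The inverse index returned by np.unique relabels each ID as 0, 1, 2, ... in sorted order, and np.bincount with weights adds each value into the bin of its label.

```python
import numpy as np

# Hypothetical toy data, chosen only to illustrate the mechanics.
ids = np.array([5, 1, 5, 2, 1])
vals = np.array([10.0, 1.0, 20.0, 4.0, 2.0])

# uniq    -> sorted unique IDs:                 [1, 2, 5]
# inverse -> position of each ID inside uniq:   [2, 0, 2, 1, 0]
# counts  -> occurrences of each unique ID:     [2, 1, 2]
uniq, inverse, counts = np.unique(ids, return_inverse=True, return_counts=True)

# Each value is added to the bin named by its label,
# so sums[k] is the total of vals where ids == uniq[k].
sums = np.bincount(inverse.ravel(), weights=vals)
# sums -> [3.0, 4.0, 30.0]
```

Because the labels are always the compact range 0..len(uniq)-1, np.bincount never allocates bins for ID values that do not occur, even when the raw IDs are large or sparse.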

To get the final output as a dictionary, use a dictionary comprehension to pair each unique ID with its count and summed value:

{i:(IDsums[itr],value_sums[itr]) for itr,i in enumerate(unqID)}

Sample run:

In [86]: ID
Out[86]: 
array([[1, 1, 1, 2, 2],
       [1, 1, 2, 2, 5],
       [1, 1, 2, 5, 5],
       [1, 2, 2, 5, 5],
       [2, 2, 5, 5, 5]])

In [87]: value
Out[87]: 
array([[ 14.8,  17. ,  74.3,  40.3,  90.2],
       [ 25.2,  75.9,   5.6,  40. ,  33.7],
       [ 78.9,  39.3,  11.3,  63.6,  56.7],
       [ 11.4,  75.7,  78.4,  88.7,  58.6],
       [ 79.6,  32.3,  35.3,  52.5,  13.3]])

In [88]: unqID,idx,IDsums = np.unique(ID,return_counts=True,return_inverse=True)
    ...: value_sums = np.bincount(idx,value.ravel())
    ...: 

In [89]: {i:(IDsums[itr],value_sums[itr]) for itr,i in enumerate(unqID)}
Out[89]: 
{1: (8, 336.80000000000001),
 2: (9, 453.40000000000003),
 5: (8, 402.40000000000003)}
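For reuse, the steps can be wrapped in a function and cross-checked against a plain loop. This is only a sketch under some assumptions: the function names `group_count_sum` and `group_count_sum_loop` are hypothetical, the IDs are assumed to be integers (they are converted with `int`), and both arrays are assumed to have the same shape.

```python
import numpy as np

def group_count_sum(ids, values):
    """Return {id: (count, sum of values)} using np.unique and np.bincount."""
    ids = np.asarray(ids)
    values = np.asarray(values, dtype=float)
    uniq, inverse, counts = np.unique(ids, return_inverse=True, return_counts=True)
    # ravel both so the call works for any input dimensionality
    sums = np.bincount(inverse.ravel(), weights=values.ravel())
    # Assumption: IDs are integers, so int() gives plain Python keys
    return {int(u): (int(c), float(s)) for u, c, s in zip(uniq, counts, sums)}

def group_count_sum_loop(ids, values):
    """Reference version with one boolean mask per unique ID (slow for many IDs)."""
    ids = np.asarray(ids)
    values = np.asarray(values, dtype=float)
    result = {}
    for u in np.unique(ids):
        mask = ids == u
        result[int(u)] = (int(mask.sum()), float(values[mask].sum()))
    return result

ID = np.array([[1, 1, 1, 2, 2],
               [1, 1, 2, 2, 5],
               [1, 1, 2, 5, 5],
               [1, 2, 2, 5, 5],
               [2, 2, 5, 5, 5]])
value = np.array([[14.8, 17.0, 74.3, 40.3, 90.2],
                  [25.2, 75.9,  5.6, 40.0, 33.7],
                  [78.9, 39.3, 11.3, 63.6, 56.7],
                  [11.4, 75.7, 78.4, 88.7, 58.6],
                  [79.6, 32.3, 35.3, 52.5, 13.3]])

fast = group_count_sum(ID, value)
slow = group_count_sum_loop(ID, value)
```

Both versions should agree on the counts and, up to floating-point rounding, on the sums. The loop builds a full boolean mask for every unique ID, which is likely what makes it slow with thousands of IDs, while the vectorized version makes a fixed number of passes over the data.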
