Calculate partitioned sum efficiently with CuPy or NumPy
Question
I have a very long array* of length L (let's call it `values`) that I want to sum over, and a sorted 1D array of the same length L that contains N different integers (0 to N-1) with which to partition the original array; let's call this array `labels`.
What I'm currently doing is this (`module` being `cupy` or `numpy`):
result = module.empty(N)
for i in range(N):
    result[i] = values[labels == i].sum()
But this can't be the most efficient way of doing it (it should be possible to get rid of the `for` loop, but how?). Since `labels` is sorted, I could easily determine the break points and use those indices as start/stop points, but I don't see how this solves the `for` loop problem.
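For reference, the break-point idea can be vectorized without a Python loop. The following is only a minimal NumPy sketch, not part of the original post: the helper name `partitioned_sum` is hypothetical, and it assumes `labels` is sorted and holds integers in the range 0 to N-1. It locates each label's start index with `np.searchsorted` and takes differences of a running sum, so the extra memory is of size L rather than N x L.

```python
import numpy as np

def partitioned_sum(values, labels, n_labels):
    """Sum `values` per label, assuming `labels` is sorted with ints in [0, n_labels)."""
    # bounds[i] is the first index whose label is >= i; bounds[n_labels] == len(labels)
    bounds = np.searchsorted(labels, np.arange(n_labels + 1))
    # Running sum with a leading zero, so csum[k] == values[:k].sum()
    csum = np.concatenate(([0], np.cumsum(values)))
    # Sum of each run = running sum at its end minus running sum at its start
    return csum[bounds[1:]] - csum[bounds[:-1]]

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
labels = np.array([0, 0, 1, 3, 3, 3])  # label 2 never occurs
result = partitioned_sum(values, labels, 4)

# Cross-check with np.bincount, which does not need sorted labels
check = np.bincount(labels, weights=values, minlength=4)
```

Labels that never occur come out as 0. For very long floating-point arrays the running sum can accumulate rounding error, in which case `np.bincount` with `weights` may be the safer choice. CuPy appears to offer counterparts of these functions, but the sketch was only written against NumPy.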
Note that I would like to avoid creating an array of size N x L along the way, if possible, since L is very large.
I'm working in CuPy, but any NumPy solution is welcome too and could probably be ported. Within CuPy, this seems like a case for a `ReductionKernel`, but I don't quite see how to do it.
* In my case, `values` is 1D, but I assume the solution wouldn't depend on this.
Recommended answer
You are describing a groupby sum aggregation. You could write a CuPy `RawKernel` for this, but it would be much easier to use the existing groupby aggregations implemented in cuDF, the GPU dataframe library. CuPy and cuDF can interoperate without requiring you to copy the data. If you call `.values` on the resulting cuDF Series, it will give you a CuPy array.
If you went back to the CPU, you could do the same thing with pandas.
import cupy as cp
import cudf

N = 100
values = cp.random.randint(0, N, 1000)
labels = cp.sort(cp.random.randint(0, N, 1000))

# Baseline: the loop from the question, one masked sum per label.
result = cp.empty(N)  # one slot per label, not per element
for i in range(N):
    result[i] = values[labels == i].sum()
result[:5]
# array([547., 454., 402., 601., 668.])   (output of one run; the data is random)

# Same aggregation as a cuDF groupby; .values returns a CuPy array.
df = cudf.DataFrame({"values": values, "labels": labels})
df.groupby(["labels"])["values"].sum().values[:5]
# array([547, 454, 402, 601, 668])
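The CPU variant mentioned above could look roughly like this. It is a sketch that mirrors the cuDF call with pandas and NumPy; the seeded generator and the `reindex` step are additions for illustration (they make the run repeatable and give labels that never occur a sum of 0) and are not part of the original answer.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the example is repeatable
N = 100
values = rng.integers(0, N, 1000)
labels = np.sort(rng.integers(0, N, 1000))

df = pd.DataFrame({"values": values, "labels": labels})
sums = df.groupby("labels")["values"].sum()

# groupby only returns labels that occur; reindex to get exactly N entries
result = sums.reindex(np.arange(N), fill_value=0).to_numpy()
```

`result[i]` then holds the sum for label `i`, matching what the loop in the question computes.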