numba cuda does not produce correct result with += (gpu reduction needed?)
Problem description
I am using numba cuda to calculate a function.
The code simply adds up all the values into one result, but numba cuda gives me a different result from numpy.
numba code
import math

def numba_example(number_of_maximum_loop, gs, ts, bs):
    from numba import cuda
    result = cuda.device_array([3,])

    @cuda.jit(device=True)
    def BesselJ0(x):
        return math.sqrt(2/math.pi/x)

    @cuda.jit
    def cuda_kernel(number_of_maximum_loop, result, gs, ts, bs):
        i = cuda.grid(1)
        if i < number_of_maximum_loop:
            result[0] += BesselJ0(i/100+gs)
            result[1] += BesselJ0(i/100+ts)
            result[2] += BesselJ0(i/100+bs)

    # Configure the blocks
    threadsperblock = 128
    blockspergrid = (number_of_maximum_loop + (threadsperblock - 1)) // threadsperblock

    # Start the kernel
    cuda_kernel[blockspergrid, threadsperblock](number_of_maximum_loop, result, gs, ts, bs)
    return result.copy_to_host()

numba_example(1000, 20, 20, 20)
Output:
array([ 0.17770302, 0.34166728, 0.35132036])
numpy code
import math

def numpy_example(number_of_maximum_loop, gs, ts, bs):
    import numpy as np
    result = np.zeros([3,])

    def BesselJ0(x):
        return math.sqrt(2/math.pi/x)

    for i in range(number_of_maximum_loop):
        result[0] += BesselJ0(i/100+gs)
        result[1] += BesselJ0(i/100+ts)
        result[2] += BesselJ0(i/100+bs)
    return result

numpy_example(1000, 20, 20, 20)
Output:
array([ 160.40546935, 160.40546935, 160.40546935])
I don't know where I am going wrong. I suspect I need a reduction, but it seems impossible to do that within a single cuda kernel.
Recommended answer
Yes, a proper parallel reduction is needed to sum data from multiple GPU threads into a single variable.
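To see why a plain += in a kernel can lose contributions, here is a minimal CPU-only sketch, not GPU code. It assumes a deliberately simplified model in which a group of threads all read the same old value and the last write wins; real hardware scheduling is not this regular. The names racy_sum, serial_sum, and threads_per_step are hypothetical and exist only for this illustration.

```python
import math

def bessel_j0(x):
    # Same placeholder formula the question uses for BesselJ0.
    return math.sqrt(2 / math.pi / x)

def racy_sum(n, gs, threads_per_step):
    """Simulate `result[0] += term` run by groups of threads in lock-step.

    Assumption: every thread in a group reads the same old value, adds its
    own term, and writes back, so only the last write of the group survives.
    """
    result = 0.0
    for start in range(0, n, threads_per_step):
        group = range(start, min(start + threads_per_step, n))
        old = result  # all threads in the group observe this value
        writes = [old + bessel_j0(i / 100 + gs) for i in group]
        result = writes[-1]  # last writer wins, the other updates are lost
    return result

def serial_sum(n, gs):
    # Reference: one term added at a time, nothing lost.
    return sum(bessel_j0(i / 100 + gs) for i in range(n))

lost = racy_sum(1000, 20, threads_per_step=32)
full = serial_sum(1000, 20)
print(lost, full)
```

With threads_per_step=1 there is no overlap and the two sums agree. An atomic add gives the same guarantee on the GPU by making each read-modify-write indivisible.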
Here's one trivial example of how it could be done from a single kernel, using atomic adds:
$ cat t23.py
import math

def numba_example(number_of_maximum_loop, gs, ts, bs):
    from numba import cuda

    @cuda.jit(device=True)
    def BesselJ0(x):
        return math.sqrt(2/math.pi/x)

    @cuda.jit
    def cuda_kernel(number_of_maximum_loop, result, gs, ts, bs):
        i = cuda.grid(1)
        if i < number_of_maximum_loop:
            # Atomic adds serialize concurrent updates to the same element
            cuda.atomic.add(result, 0, BesselJ0(i/100+gs))
            cuda.atomic.add(result, 1, BesselJ0(i/100+ts))
            cuda.atomic.add(result, 2, BesselJ0(i/100+bs))

    # Configure the blocks
    threadsperblock = 128
    blockspergrid = (number_of_maximum_loop + (threadsperblock - 1)) // threadsperblock

    # Zero-initialize the accumulators on the device
    # (cuda.device_array alone leaves the memory uninitialized)
    init = [0.0, 0.0, 0.0]
    result = cuda.to_device(init)

    # Start the kernel
    cuda_kernel[blockspergrid, threadsperblock](number_of_maximum_loop, result, gs, ts, bs)
    return result.copy_to_host()

print(numba_example(1000, 20, 20, 20))
$ python t23.py
[ 162.04299487 162.04299487 162.04299487]
$
You can also do a proper reduction in numba directly with the reduce decorator (@cuda.reduce), as described in the numba documentation, although I'm not sure 3 reductions can be done in a single kernel that way.
Finally, you could write an ordinary cuda parallel reduction using numba cuda. I don't think it would be difficult to extend that to performing 3 reductions in a single kernel.
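As a rough illustration of the pattern an ordinary parallel reduction follows, here is a CPU-only simulation, not a numba kernel. It assumes each block fills a scratch buffer (standing in for shared memory), halves the active range at every step, and then contributes one partial sum per block to the global result. The names block_reduce, three_reductions, and TPB are hypothetical; in a real kernel the inner loop over tid would run in parallel with a barrier such as cuda.syncthreads() after each step.

```python
import math

TPB = 128  # threads per block, same value as the launch configuration above

def bessel_j0(x):
    # Same placeholder formula the question uses for BesselJ0.
    return math.sqrt(2 / math.pi / x)

def block_reduce(values):
    """Tree-reduce one block's scratch buffer (length must be a power of two)."""
    buf = list(values)  # stands in for a shared-memory array
    stride = len(buf) // 2
    while stride > 0:
        # On a GPU, threads 0..stride-1 would do this step concurrently,
        # then wait at a barrier before the next, smaller step.
        for tid in range(stride):
            buf[tid] += buf[tid + stride]
        stride //= 2
    return buf[0]

def three_reductions(n, gs, ts, bs):
    offsets = (gs, ts, bs)
    result = [0.0, 0.0, 0.0]
    for block_start in range(0, n, TPB):
        for k, off in enumerate(offsets):
            # Out-of-range threads contribute 0.0 so the buffer stays full.
            scratch = [bessel_j0(i / 100 + off) if i < n else 0.0
                       for i in range(block_start, block_start + TPB)]
            # One partial sum per block reaches the global result
            # (a single atomic add per block in a real kernel).
            result[k] += block_reduce(scratch)
    return result

print(three_reductions(1000, 20, 20, 20))
```

Compared with one atomic add per thread, this shape needs only one global update per block and per sum, which is the usual motivation for writing the reduction by hand.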
These 3 different methods will likely have performance differences, of course.
As an aside, if you're wondering about the discrepancy between the results of my code above and your python code in the question, I can't explain it. When I run your python code I get results matching the numba cuda code in my answer:
$ cat t24.py
import math

def numpy_example(number_of_maximum_loop, gs, ts, bs):
    import numpy as np
    result = np.zeros([3,])

    def BesselJ0(x):
        return math.sqrt(2/math.pi/x)

    for i in range(number_of_maximum_loop):
        result[0] += BesselJ0(i/100+gs)
        result[1] += BesselJ0(i/100+ts)
        result[2] += BesselJ0(i/100+bs)
    return result

print(numpy_example(1000, 20, 20, 20))
$ python t24.py
[ 162.04299487 162.04299487 162.04299487]
$
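One possible explanation for the 160.4 versus 162.0 difference, offered as an assumption because neither post states its Python version: under Python 2, i/100 with integer operands is integer division, while Python 3 performs true division. The sketch below computes the sum both ways so the two figures can be compared. The helper total and its integer_division flag are hypothetical names used only here.

```python
import math

def bessel_j0(x):
    # Same placeholder formula the question uses for BesselJ0.
    return math.sqrt(2 / math.pi / x)

def total(n, gs, integer_division):
    acc = 0.0
    for i in range(n):
        # `//` mimics Python 2's int / int; `/` is Python 3 true division.
        step = i // 100 if integer_division else i / 100
        acc += bessel_j0(step + gs)
    return acc

true_div = total(1000, 20, integer_division=False)
floor_div = total(1000, 20, integer_division=True)
print(true_div)   # compare with the 160.40546935 reported in the question
print(floor_div)  # compare with the 162.04299487 reported in the answer
```

If that assumption holds, both outputs are correct for their interpreter, and writing i/100.0 or using `from __future__ import division` would make the two environments agree.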