CUDA GPU processing: TypeError: compile_kernel() got an unexpected keyword argument 'boundscheck'


Problem description

Today I started working with CUDA and GPU processing. I found this tutorial: https://www.geeksforgeeks.org/running-python-script-on-gpu/

Unfortunately my first attempt to run GPU code failed:

from numba import jit, cuda 
import numpy as np 
# to measure exec time 
from timeit import default_timer as timer 

# normal function to run on cpu 
def func(a):                                 
    for i in range(10000000): 
        a[i]+= 1    

# function optimized to run on gpu 
@jit(target ="cuda")                         
def func2(a): 
    for i in range(10000000): 
        a[i]+= 1
if __name__=="__main__": 
    n = 10000000                            
    a = np.ones(n, dtype = np.float64) 
    b = np.ones(n, dtype = np.float32) 

    start = timer() 
    func(a) 
    print("without GPU:", timer()-start)     

    start = timer() 
    func2(a) 
    print("with GPU:", timer()-start) 

Output:

/home/amu/anaconda3/bin/python /home/amu/PycharmProjects/gpu_processing_base/gpu_base_1.py
without GPU: 4.89985659904778
Traceback (most recent call last):
  File "/home/amu/PycharmProjects/gpu_processing_base/gpu_base_1.py", line 30, in <module>
    func2(a)
  File "/home/amu/anaconda3/lib/python3.7/site-packages/numba/cuda/dispatcher.py", line 40, in __call__
    return self.compiled(*args, **kws)
  File "/home/amu/anaconda3/lib/python3.7/site-packages/numba/cuda/compiler.py", line 758, in __call__
    kernel = self.specialize(*args)
  File "/home/amu/anaconda3/lib/python3.7/site-packages/numba/cuda/compiler.py", line 769, in specialize
    kernel = self.compile(argtypes)
  File "/home/amu/anaconda3/lib/python3.7/site-packages/numba/cuda/compiler.py", line 785, in compile
    **self.targetoptions)
  File "/home/amu/anaconda3/lib/python3.7/site-packages/numba/core/compiler_lock.py", line 32, in _acquire_compile_lock
    return func(*args, **kwargs)
TypeError: compile_kernel() got an unexpected keyword argument 'boundscheck'

Process finished with exit code 1

I have installed the numba and cudatoolkit packages mentioned in the tutorial in an anaconda environment in PyCharm.
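As a side note (not part of the original question), a quick way to confirm that numba can actually see a usable GPU in that environment is numba's built-in detection helpers:

from numba import cuda

# Prints the CUDA driver/runtime that numba found and the detected devices,
# including whether each device is supported.
cuda.detect()

# True if numba can use a CUDA GPU in this environment.
print(cuda.is_available())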

Recommended answer

Adding an answer to get this off the unanswered queue.

The code in that example is broken. It isn't anything wrong with your numba or CUDA installations. There is no way that the code in your question (or the blog you copied it from) can emit the result the blog post claims.

There are many ways this could potentially be modified to work. One would be like this:

from numba import vectorize, jit, cuda 
import numpy as np 
# to measure exec time 
from timeit import default_timer as timer 

# normal function to run on cpu 
def func(a):                                 
    for i in range(10000000): 
        a[i]+= 1    

# function optimized to run on gpu 
@vectorize(['float64(float64)'], target ="cuda")                         
def func2(x): 
    return x+1

if __name__=="__main__": 
    n = 10000000                            
    a = np.ones(n, dtype = np.float64) 

    start = timer() 
    func(a) 
    print("without GPU:", timer()-start)     

    start = timer() 
    func2(a) 
    print("with GPU:", timer()-start) 

Here func2 becomes a ufunc which is compiled for the device. It will then be run over the whole input array on the GPU. Doing so does this:

$ python bogoexample.py 
without GPU: 4.314514834433794
with GPU: 0.21419800259172916

So it is faster, but keep in mind that the GPU time includes the time taken for compilation of the GPU ufunc.
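If you want the timing to exclude compilation, a common approach (not in the original answer) is to call the ufunc once on a small array first, so the JIT compilation happens before the timed call:

import numpy as np
from timeit import default_timer as timer

# Warm-up call: triggers JIT compilation of the CUDA ufunc func2.
func2(np.ones(1, dtype=np.float64))

# Now the measured time covers only execution and data transfer.
start = timer()
func2(a)
print("with GPU (warm):", timer() - start)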

Another alternative would be to actually write a GPU kernel. Like this:

from numba import vectorize, jit, cuda 
import numpy as np 
# to measure exec time 
from timeit import default_timer as timer 

# normal function to run on cpu 
def func(a):                                 
    for i in range(10000000): 
        a[i]+= 1    

# function optimized to run on gpu 
@vectorize(['float64(float64)'], target ="cuda")                         
def func2(x): 
    return x+1

# kernel to run on gpu
@cuda.jit
def func3(a, N):
    tid = cuda.grid(1)
    if tid < N:
        a[tid] += 1


if __name__=="__main__": 
    n = 10000000                            
    a = np.ones(n, dtype = np.float64) 

    for i in range(0,5):
         start = timer() 
         func(a) 
         print(i, " without GPU:", timer()-start)     

    for i in range(0,5):
         start = timer() 
         func2(a) 
         print(i, " with GPU ufunc:", timer()-start) 

    threadsperblock = 1024
    blockspergrid = (a.size + (threadsperblock - 1)) // threadsperblock
    for i in range(0,5):
         start = timer() 
         func3[blockspergrid, threadsperblock](a, n) 
         print(i, " with GPU kernel:", timer()-start) 

which runs like this:

$ python bogoexample.py 
0  without GPU: 4.885275377891958
1  without GPU: 4.748716968111694
2  without GPU: 4.902181145735085
3  without GPU: 4.889955999329686
4  without GPU: 4.881594380363822
0  with GPU ufunc: 0.16726416163146496
1  with GPU ufunc: 0.03758022002875805
2  with GPU ufunc: 0.03580896370112896
3  with GPU ufunc: 0.03530424740165472
4  with GPU ufunc: 0.03579768259078264
0  with GPU kernel: 0.1421878095716238
1  with GPU kernel: 0.04386183246970177
2  with GPU kernel: 0.029975440353155136
3  with GPU kernel: 0.029602501541376114
4  with GPU kernel: 0.029780613258481026

Here you can see that the kernel runs slightly faster than the ufunc, and that caching (and this is caching of the JIT compiled functions, not memoization of the calls) significantly speeds up the call on the GPU.
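Note also that these timings include a host-to-device copy of the array on every call, and kernel launches are asynchronous. A rough sketch (an illustration, not part of the original answer) of timing the kernel on data that already lives on the GPU:

from numba import cuda
from timeit import default_timer as timer
import numpy as np

a = np.ones(10000000, dtype=np.float64)
d_a = cuda.to_device(a)                      # one-time host-to-device copy

threadsperblock = 1024
blockspergrid = (a.size + threadsperblock - 1) // threadsperblock

# Warm-up launch so compilation is not included in the measurement.
func3[blockspergrid, threadsperblock](d_a, a.size)
cuda.synchronize()

start = timer()
func3[blockspergrid, threadsperblock](d_a, a.size)
cuda.synchronize()                           # wait for the kernel to finish
print("kernel only:", timer() - start)

result = d_a.copy_to_host()                  # copy back only when needed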
