Two values can't be printed correctly (python3.5+numba+CUDA8.0)


Problem description

I have an array that I want to do some calculations on in the GPU.

Before the calculation, I need to take subsets of this array.

When I print the subsets, two of the values are wrong.

The code is as follows:

import os,sys,time
import pandas as pd
import numpy as np

from numba import cuda, float32

os.environ['NUMBAPRO_NVVM']=r'D:\NVIDIA GPU Computing Toolkit\CUDA\v8.0\nvvm\bin\nvvm64_31_0.dll'
os.environ['NUMBAPRO_LIBDEVICE']=r'D:\NVIDIA GPU Computing Toolkit\CUDA\v8.0\nvvm\libdevice'

bpg = (3,1)  
tpb = (2,2)  

@cuda.jit
def calcu_TE(D,TE):
    gw = cuda.gridDim.x

    bx = cuda.blockIdx.x

    tx = cuda.threadIdx.x
    bw = cuda.blockDim.x
    ty = cuda.threadIdx.y
    bh = cuda.blockDim.y

    c_num = D.shape[0]
    #print(c_num)
    c_index = bx
    while c_index<c_num*c_num:
        c_x = int(c_index/c_num)
        c_y = c_index%c_num
        if c_x==c_y:
            TE[0] = 0.0
        else:
            X = D[c_x,:]
            Y = D[c_y,:]
            if bx==1 :
                print('c_index,bx,tx,ty,X: ',c_index,bx,tx,ty,'  ',X[0],X[1],X[2],X[3],X[4],X[5],X[6],X[7],X[8],X[9])
                print('c_index,bx,tx,ty,Y: ',c_index,bx,tx,ty,'  ',Y[0],Y[1],Y[2],Y[3],Y[4],Y[5],Y[6],Y[7],Y[8],Y[9])
            #print('c_index,bx,tx,ty,Y: ',c_index,bx,tx,ty,Y[0],Y[1],Y[2],Y[3],Y[4],Y[5],Y[6],Y[7],Y[8],Y[9])
            h = tx
            if h==0:
                Xi = X[1:]
                Xi1 = X[:-1]
                Yi = Y[1:]
                if bx==1 :
                    print('bx,tx,ty: ',bx,tx,ty,'\n Xi',Xi[0],Xi[1],Xi[2],Xi[3],Xi[4],Xi[5],Xi[6],Xi[7],Xi[8],
                          '\n Xi1',Xi1[0],Xi1[1],Xi1[2],Xi1[3],Xi1[4],Xi1[5],Xi1[6],Xi1[7],Xi1[8],
                          '\n Yi',Yi[0],Yi[1],Yi[2],Yi[3],Yi[4],Yi[5],Yi[6],Yi[7],Yi[8])
        c_index +=gw


D = np.array([[ 0.42487645,0.41607881,0.42027071,0.43751907,0.43512794,0.43656972,0.43940639,0.43864551,0.43447691,0.43120232],
              [2.989578,2.834707,2.942902,3.294948,2.868170,2.975180,3.066900,2.712719,2.835360,2.607334]], dtype=np.float32)
TE = np.empty([1,1])
print('D: ',D)

stream = cuda.stream()
with stream.auto_synchronize():
    dD = cuda.to_device(D, stream)
    dTE = cuda.to_device(TE, stream)
    calcu_TE[bpg, tpb, stream](dD,dTE)

The output is:

D:  [[ 0.42487645  0.41607881  0.42027071  0.43751907  0.43512794  0.43656972
   0.43940639  0.43864551  0.43447691  0.43120232]
 [ 2.98957801  2.83470702  2.94290209  3.2949481   2.86817002  2.97517991
   3.06690001  2.71271896  2.83536005  2.6073339 ]]
c_index,bx,tx,ty,X:  1 1 0 0    0.424876 0.416079 0.420271 0.437519 0.435128 0.436570 0.439406 0.438646 0.434477 0.431202
c_index,bx,tx,ty,X:  1 1 1 0    0.424876 0.416079 0.420271 0.437519 0.435128 0.436570 0.439406 0.438646 0.434477 0.431202
c_index,bx,tx,ty,X:  1 1 0 1    0.424876 0.416079 0.420271 0.437519 0.435128 0.436570 0.439406 0.438646 0.434477 0.431202
c_index,bx,tx,ty,X:  1 1 1 1    0.424876 0.416079 0.420271 0.437519 0.435128 0.436570 0.439406 0.438646 0.434477 0.431202
c_index,bx,tx,ty,Y:  1 1 0 0    2.989578 2.834707 2.942902 3.294948 2.868170 2.975180 3.066900 2.712719 2.835360 2.607334
c_index,bx,tx,ty,Y:  1 1 1 0    2.989578 2.834707 2.942902 3.294948 2.868170 2.975180 3.066900 2.712719 2.835360 2.607334
c_index,bx,tx,ty,Y:  1 1 0 1    2.989578 2.834707 2.942902 3.294948 2.868170 2.975180 3.066900 2.712719 2.835360 2.607334
c_index,bx,tx,ty,Y:  1 1 1 1    2.989578 2.834707 2.942902 3.294948 2.868170 2.975180 3.066900 2.712719 2.835360 2.607334

bx,tx,ty:  1 0 0
 Xi 0.416079 0.420271 0.437519 0.435128 0.436570 0.439406 0.438646 0.434477 0.431202
 Xi1 0.424876 0.416079 0.420271 0.437519 0.435128 0.436570 0.439406 0.438646 0.434477
 Yi 2.834707 2.942902 3.294948 2.868170 2.975180 3.066900 2.712719 0.000000 18949972373983835000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000
bx,tx,ty:  1 0 1
 Xi 0.416079 0.420271 0.437519 0.435128 0.436570 0.439406 0.438646 0.434477 0.431202
 Xi1 0.424876 0.416079 0.420271 0.437519 0.435128 0.436570 0.439406 0.438646 0.434477
 Yi 2.834707 2.942902 3.294948 2.868170 2.975180 3.066900 2.712719 0.000000 18949972373983835000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000

It's so strange.

Yi should be Yi 2.834707 2.942902 3.294948 2.868170 2.975180 3.066900 2.712719 2.835360 2.607334.

But it was printed Yi 2.834707 2.942902 3.294948 2.868170 2.975180 3.066900 2.712719 0.000000 18949972373983835000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000.

Two of the values are wrong.

I don't know why this happened. Is there anything I overlooked?

Solution

This would appear to be a problem with the way the Numba compiler is producing code for the very long print statement in your kernel, and nothing to do with the correctness of your kernel. If you change the code like this (i.e. make the print statement shorter):

@cuda.jit
def calcu_TE(D,TE):
    gw = cuda.gridDim.x

    bx = cuda.blockIdx.x

    tx = cuda.threadIdx.x
    bw = cuda.blockDim.x
    ty = cuda.threadIdx.y
    bh = cuda.blockDim.y

    c_num = D.shape[0]
    c_index = bx
    while c_index<c_num*c_num:
        c_x = int(c_index/c_num)
        c_y = c_index%c_num
        if c_x==c_y:
            TE[0] = 0.0
        else:
            X = D[c_x,:]
            Y = D[c_y,:]
            if bx==1 :
                print('c_index,bx,tx,ty,X: ',c_index,bx,tx,ty,'  ',X[0],X[1],X[2],X[3],X[4],X[5],X[6],X[7],X[8],X[9])
                print('c_index,bx,tx,ty,Y: ',c_index,bx,tx,ty,'  ',Y[0],Y[1],Y[2],Y[3],Y[4],Y[5],Y[6],Y[7],Y[8],Y[9])
            h = tx
            if h==0:
                Xi = X[1:]
                Xi1 = X[:-1]
                Yi = Y[1:]
                if bx==1 :
                    print('bx,tx,ty,Yi:',bx,tx,ty,'  ',Yi[0],Yi[1],Yi[2],Yi[3],Yi[4],Yi[5],Yi[6],Yi[7],Yi[8])
        c_index +=gw

You should find that Yi is printed correctly. In general, relying on print statements to instrument kernels in CUDA is a rather poor idea, and often you will only confuse yourself by doing so, as in this case.
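One possible alternative to printing inside the kernel is to copy the values of interest into a small device array and compare them on the host against a NumPy reference. The sketch below is only an illustration of that idea, not part of the original answer: `host_reference` and `copy_yi` are hypothetical helper names, and the device part is skipped when Numba or a CUDA device is not available.

```python
import numpy as np

try:
    from numba import cuda
    HAVE_GPU = cuda.is_available()
except Exception:
    # Numba missing or no usable CUDA driver: only the host reference runs.
    HAVE_GPU = False

# Same input data as in the question.
D = np.array([[0.42487645, 0.41607881, 0.42027071, 0.43751907, 0.43512794,
               0.43656972, 0.43940639, 0.43864551, 0.43447691, 0.43120232],
              [2.989578, 2.834707, 2.942902, 3.294948, 2.868170,
               2.975180, 3.066900, 2.712719, 2.835360, 2.607334]],
             dtype=np.float32)


def host_reference(D, c_x, c_y):
    """Hypothetical helper: the slices the kernel should see, computed on the host."""
    X, Y = D[c_x, :], D[c_y, :]
    return X[1:], X[:-1], Y[1:]


# c_index == 1 in the question's kernel gives c_x == 0 and c_y == 1.
Xi, Xi1, Yi = host_reference(D, 0, 1)

if HAVE_GPU:
    @cuda.jit
    def copy_yi(D, c_y, out):
        # Hypothetical debug kernel: a single thread copies Y[1:] into `out`
        # so it can be inspected on the host instead of being printed.
        if cuda.blockIdx.x == 0 and cuda.threadIdx.x == 0:
            Y = D[c_y, :]
            Y_tail = Y[1:]
            for k in range(Y_tail.shape[0]):
                out[k] = Y_tail[k]

    d_out = cuda.to_device(np.zeros(D.shape[1] - 1, dtype=np.float32))
    copy_yi[1, 1](cuda.to_device(D), 1, d_out)
    print('device Yi matches host:', np.array_equal(d_out.copy_to_host(), Yi))
else:
    print('no CUDA device found; host Yi:', Yi)
```

If the device copy matches the host slice, the kernel's data access is fine and any odd values come from the print path rather than from the slicing itself.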
