Parallel Kronecker tensor product on GPU using CUDA
Problem description
I am working on parallelizing [this file][1] on the GPU using a [PTX file with MATLAB parallel.gpu.CUDAKernel][2]. My problem is with the [kron tensor product][3] in my code. It should multiply two vectors, `kron(a,b)`, by multiplying each element of the first vector `a = <32x1>` by all elements of the other vector `b = <1x32>`, and the output will be of size `k <32x32> = a.*b`.

I tried to write it in C++ and it worked. Since I only care about the sum of all the elements of the 2D array, `m = sum(sum(kron(a,b)))`, I thought I could make it easy by treating it as a 1D array. This is the code I am working on:
```cpp
for (int i = 0; i < 32; i++)
    for (int j = 0; j < 32; j++)
        k[i * 32 + j] = a[i] * b[j];  // row-major: row i, column j
```
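For reference, here is a minimal self-contained C++ sketch of the same CPU loop, assuming `int` vectors of equal length (the helper names `kron_flat`, `sum_all` and `sum_factored` are hypothetical, not from the question). It also checks a fact worth knowing when only the total is needed: every entry is `a[i]*b[j]`, so the double sum factors as `m = (sum of a) * (sum of b)` and the 32x32 array does not have to be built at all.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Row-major flattening of kron(a, b) for a column vector a (n x 1) and a
// row vector b (1 x n): k[i*n + j] = a[i] * b[j], the same loop as above.
std::vector<long long> kron_flat(const std::vector<int>& a,
                                 const std::vector<int>& b) {
    assert(a.size() == b.size());  // equal lengths assumed (32 in the question)
    const std::size_t n = a.size();
    std::vector<long long> k(n * n);
    for (std::size_t i = 0; i < n; i++)
        for (std::size_t j = 0; j < n; j++)
            k[i * n + j] = static_cast<long long>(a[i]) * b[j];
    return k;
}

// MATLAB's sum(sum(kron(a,b))), computed over the flattened 1D array.
long long sum_all(const std::vector<long long>& k) {
    return std::accumulate(k.begin(), k.end(), 0LL);
}

// Same total without building the matrix:
// sum_i sum_j a[i]*b[j] = (sum_i a[i]) * (sum_j b[j]).
long long sum_factored(const std::vector<int>& a, const std::vector<int>& b) {
    return std::accumulate(a.begin(), a.end(), 0LL) *
           std::accumulate(b.begin(), b.end(), 0LL);
}
```

With `a = 1..32` and `b[i] = 2*i - 5`, both `sum_all(kron_flat(a, b))` and `sum_factored(a, b)` give 439296.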
The intent is for the `a[i]`-th element to be multiplied by each element of `b`. I thought I would use 32 blocks with 32 threads each, so the kernel should be:
```cuda
__global__ void myKrom(int* c, int* a, int* b) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    while (i < 32) {
        c[i] = a[blockIdx.x] + b[blockDim.x * blockIdx.x + threadIdx.x];
    }
}
```
I expected that to do the trick, since `blockIdx.x` plays the role of the outer loop, but it didn't. Could anybody tell me where I went wrong? I would also like to ask for a parallel way to do the sum.
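For the parallel sum, a common CUDA pattern is a tree reduction: at each step the number of active threads halves, thread `t` adds element `t + stride` into element `t`, and a `__syncthreads()` barrier separates the steps. The C++ sketch below only emulates that pattern sequentially on the CPU so the indexing can be checked without a GPU; the function name `tree_reduce` is hypothetical, and it assumes a power-of-two length (1024 = 32x32 here, which also fits within the 1024-threads-per-block limit of current NVIDIA GPUs).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential emulation of a shared-memory tree reduction.
// In a CUDA kernel each iteration of the inner loop would be one thread,
// and a __syncthreads() barrier would separate the outer-loop steps.
long long tree_reduce(std::vector<long long> s) {  // by value: reduced in place
    const std::size_t n = s.size();
    assert(n > 0 && (n & (n - 1)) == 0);  // power-of-two length assumed
    for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
        for (std::size_t t = 0; t < stride; t++) {  // "threads" 0..stride-1
            s[t] += s[t + stride];
        }
        // On the GPU, __syncthreads() would go here before the next step.
    }
    return s[0];
}
```

On a GPU, `s` would typically be a `__shared__` array that each thread first fills with one product, and thread 0 would write `s[0]` out at the end.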
Answer

You may actually mean something like this:
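```cuda
__global__ void myKrom(int* c, int* a, int* b)
{
    // Block blockIdx.x is row i and thread threadIdx.x is column j, so the
    // global index is 32*i + j, matching k[i*32+j] in the CPU loop.
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < 32 * 32) {
        c[i] = a[blockIdx.x] * b[threadIdx.x];  // product, as in the CPU loop
    }
}
```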
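when you launch the kernel with `myKrom<<<32, 32>>>(c, a, b);`. Each block then handles one element `a[blockIdx.x]` (the outer loop) and each of its threads one element `b[threadIdx.x]` (the inner loop), so the global index `i` covers all 32*32 = 1024 outputs. In the original version, `while(i<32)` never changes `i`, so threads 0 to 31 (all in block 0) loop forever while every other thread skips the body, and at most 32 of the 1024 outputs are ever written. Indexing `b` with the global index would also run past its 32 elements once the bound is raised to 32*32.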
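To see why this indexing reproduces the CPU loop, the `<<<32, 32>>>` launch can be emulated on the CPU: the kernel body runs once per (block, thread) pair, block `bx` acting as row `i` and thread `tx` as column `j`. This plain C++ sketch (the helper `emulate_myKrom` and its parameter names are hypothetical; no GPU required) uses the product `a[i]*b[j]` from the CPU loop.

```cpp
#include <cassert>
#include <vector>

// CPU emulation of myKrom<<<num_blocks, threads_per_block>>>(c, a, b):
// the kernel body is executed once per (blockIdx.x, threadIdx.x) pair.
std::vector<int> emulate_myKrom(const std::vector<int>& a,
                                const std::vector<int>& b,
                                int num_blocks, int threads_per_block) {
    assert(static_cast<int>(a.size()) >= num_blocks);
    assert(static_cast<int>(b.size()) >= threads_per_block);
    const int total = num_blocks * threads_per_block;
    std::vector<int> c(total);
    for (int bx = 0; bx < num_blocks; bx++) {             // blockIdx.x
        for (int tx = 0; tx < threads_per_block; tx++) {  // threadIdx.x
            int i = threads_per_block * bx + tx;  // blockDim.x*blockIdx.x + threadIdx.x
            if (i < total) {                      // bounds check, as in the kernel
                c[i] = a[bx] * b[tx];             // row bx, column tx
            }
        }
    }
    return c;
}
```

Calling `emulate_myKrom(a, b, 32, 32)` should give the same 1024 values as the `k[i*32+j]` loop, which is a quick way to check the index mapping before moving to the GPU.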