Parallel Kronecker tensor product on GPU using CUDA

Views: 301 Published: 2017/3/4 13:29:22 matlab parallel-processing cuda gpu linear-algebra

Problem description

I am working on parallelising [this file][1] on the GPU using a [PTX file with MATLAB parallel.gpu.CUDAKernel][2]. My problem is with the [kron tensor product][3]. In my code, kron(a,b) should multiply two vectors by multiplying each element of the first vector a = <32x1> by all elements of the other vector b = <1x32>, so the output has size k<32x32> = a.*b. I wrote it in C++ and it worked. Since I only care about summing all the elements of the 2D array, I thought I could simplify it to a 1D array, because m = sum(sum(kron(a,b))) is the code I am working on:

for (i = 0; i < 32; i++)
  for (j = 0; j < 32; j++)
    k[i*32 + j] = a[i] * b[j];

The intent is for each element a[i] to be multiplied by every element of b. I thought of using 32 blocks with 32 threads each, so the code should be:

__global__ void myKrom(int* c,int* a, int*b) {
  int i=blockDim.x*blockIdx.x+threadIdx.x;
  while(i<32) {
    c[i]=a[blockIdx.x]+b[blockDim.x*blockIdx.x+threadIdx.x];
  }

That should do the trick, since blockIdx.x plays the role of the outer loop, but it didn't. Could anybody tell me where it goes wrong? Also, may I ask for a parallel way to do the sum?

Solution

You may actually mean something like this:

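__global__ void myKrom(int* c, int* a, int* b)
{
  // One thread per output element: global index i = block * 32 + thread.
  int i = blockDim.x * blockIdx.x + threadIdx.x;
  if (i < 32 * 32) {
    // Kronecker product entry, matching k[i*32+j] = a[i]*b[j] in the question
    // (a product, not a sum).
    c[i] = a[blockIdx.x] * b[threadIdx.x];
  }
}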

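This assumes the kernel is launched as myKrom<<<32, 32>>>(c, a, b); so that blockIdx.x selects the element of a and threadIdx.x selects the element of b. Compared with the kernel in the question, the bound is 32*32 rather than 32, the while loop (which never updates i, so it cannot terminate once entered) becomes an if, and b is indexed by threadIdx.x alone.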
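To sanity-check the index mapping without a GPU, here is a minimal host-side C++ sketch. It is not CUDA and not the answerer's code: the two loops merely stand in for blockIdx.x and threadIdx.x of a <<<32, 32>>> launch, and the function names (kron_reference, kron_emulated_launch, kron_sum_shortcut) are hypothetical. It uses the product a[i]*b[j] from the question's nested loop. It also illustrates one possible shortcut for the sum asked about in the question: for two vectors, the sum of all entries of kron(a,b) equals sum(a) times sum(b), so the 32x32 product may not need to be materialised at all if only m is required.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

constexpr int N = 32;  // vector length taken from the question

// Reference: the nested loop from the question, k[i*N + j] = a[i] * b[j].
std::vector<int> kron_reference(const std::vector<int>& a,
                                const std::vector<int>& b) {
    assert(static_cast<int>(a.size()) == N && static_cast<int>(b.size()) == N);
    std::vector<int> k(N * N);
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            k[i * N + j] = a[i] * b[j];
    return k;
}

// Host-side emulation of a <<<N, N>>> launch (assumption: 1D grid, 1D blocks).
// The loop variables play the roles of blockIdx.x and threadIdx.x, and the
// body mirrors the indexing used in the kernel.
std::vector<int> kron_emulated_launch(const std::vector<int>& a,
                                      const std::vector<int>& b) {
    assert(static_cast<int>(a.size()) == N && static_cast<int>(b.size()) == N);
    const int gridDimX = N, blockDimX = N;
    std::vector<int> c(N * N);
    for (int blockIdxX = 0; blockIdxX < gridDimX; ++blockIdxX) {
        for (int threadIdxX = 0; threadIdxX < blockDimX; ++threadIdxX) {
            int i = blockDimX * blockIdxX + threadIdxX;  // global thread index
            if (i < N * N)
                c[i] = a[blockIdxX] * b[threadIdxX];
        }
    }
    return c;
}

// sum(sum(kron(a,b))) without forming the product: sum(a) * sum(b).
long long kron_sum_shortcut(const std::vector<int>& a,
                            const std::vector<int>& b) {
    long long sa = std::accumulate(a.begin(), a.end(), 0LL);
    long long sb = std::accumulate(b.begin(), b.end(), 0LL);
    return sa * sb;
}
```

Calling kron_reference and kron_emulated_launch on the same inputs should give identical vectors, and summing either result should match kron_sum_shortcut. On the GPU the same identity would reduce the problem to summing two 32-element vectors, which is a much smaller reduction than summing 1024 products.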
