cuda function application elementwise in cuda


Question


After multiplying a matrix A by a vector x to obtain the result y, I want to apply a function h elementwise to y.


I want to obtain z = h(Ax), where h is applied elementwise to the vector Ax.


I know how to do the matrix/vector multiplication on the GPU (with cublas). Now I want h (which is my own function, coded in C++) to be applied to the resulting vector, also on the GPU. How can I do that?

Answer

Two possible approaches are:

  1. Write your own CUDA kernel to perform the operation
  2. Use thrust (e.g. thrust::for_each()).


Here is a worked example of both approaches:

$ cat t934.cu
#include <iostream>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/for_each.h>
#include <iterator>   // for std::ostream_iterator

#define DSIZE 4

#define nTPB 256

template <typename T>
__host__ __device__ T myfunc(T &d){

  return d + 5;  // define your own function here
}

struct mytfunc
{
template <typename T>
__host__ __device__
 void operator()(T &d){

  d = myfunc(d);
  }
};

template <typename T>
__global__ void mykernel(T *dvec, size_t dsize){

  int idx = threadIdx.x+blockDim.x*blockIdx.x;
  if (idx < dsize) dvec[idx] = myfunc(dvec[idx]);
}

int main(){

  // first using kernel
  float *h_data, *d_data;
  h_data = new float[DSIZE];
  cudaMalloc(&d_data, DSIZE*sizeof(float));
  for (int i = 0; i < DSIZE; i++) h_data[i] = i;
  cudaMemcpy(d_data, h_data, DSIZE*sizeof(float), cudaMemcpyHostToDevice);
  mykernel<<<(DSIZE+nTPB-1)/nTPB,nTPB>>>(d_data, DSIZE);
  cudaMemcpy(h_data, d_data, DSIZE*sizeof(float), cudaMemcpyDeviceToHost);
  for (int i = 0; i < DSIZE; i++) std::cout << h_data[i] << ",";
  std::cout << std::endl;

  // then using thrust
  thrust::host_vector<float>   hvec(h_data, h_data+DSIZE);
  thrust::device_vector<float> dvec = hvec;
  thrust::for_each(dvec.begin(), dvec.end(), mytfunc());
  thrust::copy_n(dvec.begin(), DSIZE, std::ostream_iterator<float>(std::cout, ","));
  std::cout << std::endl;
}

$ nvcc -o t934 t934.cu
$ ./t934
5,6,7,8,
10,11,12,13,
$


Note that in order to provide a complete example, I'm starting with a vector definition in host memory. If you already have the vector in device memory (perhaps as a result of computing y=Ax), then you can work on it directly, by passing that vector to the CUDA kernel, or by using it directly in the thrust function via a thrust::device_ptr wrapper (this method is covered in the thrust quick start guide).
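The pointer-wrapping variant could look like the following sketch. Here d_y and n are assumptions standing in for a vector already resident on the device (e.g. the y = Ax result from cublas); in this sketch it is just zero-initialized for illustration:

```cuda
#include <thrust/device_ptr.h>
#include <thrust/for_each.h>
#include <cuda_runtime.h>

struct mytfunc {
  template <typename T>
  __host__ __device__ void operator()(T &d){ d = d + 5; }
};

int main(){
  const int n = 4;
  float *d_y;
  cudaMalloc(&d_y, n*sizeof(float));
  cudaMemset(d_y, 0, n*sizeof(float));        // stand-in for y = A*x already on the device
  thrust::device_ptr<float> t_y(d_y);         // wrap the raw device pointer for thrust
  thrust::for_each(t_y, t_y + n, mytfunc());  // apply h in place, no host round trip
  cudaFree(d_y);
}
```

Compile with nvcc as in the transcript above; the point is that no cudaMemcpy to the host is needed between the cublas call and the elementwise step.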


The assumption I've made here is that you want to apply an arbitrary function of one variable. This should handle pretty much any function defined in myfunc. However, for some categories of functions that you may be interested in, you may also be able to realize the result with one or more CUBLAS calls.
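As a hedged illustration of that last point: if h happened to be a simple linear scaling, h(y) = alpha*y, the elementwise step collapses into a single cublasSscal call on the device-resident vector (alpha, d_y, and n here are assumptions for illustration, not part of the original question):

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>

int main(){
  const int n = 4;
  const float alpha = 2.0f;           // h(y) = 2*y
  float *d_y;
  cudaMalloc(&d_y, n*sizeof(float));  // assume d_y already holds y = A*x
  cublasHandle_t handle;
  cublasCreate(&handle);
  cublasSscal(handle, n, &alpha, d_y, 1);  // y <- alpha * y, in place on the GPU
  cublasDestroy(handle);
  cudaFree(d_y);
}
```

For a general nonlinear h, though, the kernel or thrust approaches above remain the way to go.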
