How to pass multiple duplicated arguments to a CUDA kernel


Question

I'm looking for an elegant way to pass multiple duplicated arguments to a CUDA kernel.

As we all know, each kernel argument is located on the stack of each CUDA thread. The arguments the kernel passes to its threads may therefore be duplicated, with one copy in memory on every thread's stack.

I'd like an elegant way to minimize the number of duplicated arguments being passed.

To explain my concern, let's say my code looks like this:

   kernelFunction<<<gridSize,blockSize>>>(UINT imageWidth, UINT imageWidth, UINT imageStride, UINT numberOfElements, x, y, etc...)

The UINT imageWidth, UINT imageWidth, UINT imageStride, UINT numberOfElements arguments are located on each thread's stack.

I'm looking for a trick to send fewer arguments and access the data from another source.

I was thinking about using constant memory, but since constant memory is located in global memory, I dropped the idea. Needless to say, the memory location should be fast.

Recommended answer

Kernel arguments are passed in via constant memory (or shared memory on sm_1x), so there is no replication as you suggest.

Cf. the programming guide:


__global__ function parameters are passed to the device:


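  • via shared memory and are limited to 256 bytes on devices of
    compute capability 1.x,

  • via constant memory and are limited to 4 KB on devices of
    compute capability 2.x and higher.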
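As a minimal sketch of what the quoted passage implies, the scalar arguments from the question can be grouped into one struct and passed by value. The names `ImageParams`, `scaleKernel`, and `runScaleExample` are hypothetical and not from the question, and the example assumes a device of compute capability 2.x or higher, where the whole struct would be placed in constant memory once per launch rather than copied onto every thread's stack.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical parameter bundle; field names mirror the question's arguments.
struct ImageParams {
    unsigned int imageWidth;        // pixels per row
    unsigned int imageStride;       // elements between the starts of two rows
    unsigned int numberOfElements;  // total number of pixels to process
};

// The quoted guide limits kernel parameters to 4 KB on compute capability 2.x+.
static_assert(sizeof(ImageParams) <= 4096, "parameter struct exceeds 4 KB");

// Every thread reads the same by-value parameter struct.
__global__ void scaleKernel(ImageParams p, const float* in, float* out)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < p.numberOfElements) {
        unsigned int row = i / p.imageWidth;
        unsigned int col = i % p.imageWidth;
        unsigned int idx = row * p.imageStride + col;
        out[idx] = 2.0f * in[idx];
    }
}

// Host-side helper: returns true if the kernel produced the expected image.
bool runScaleExample()
{
    const ImageParams p = {4u, 8u, 8u};  // width 4, stride 8, 8 pixels (2 rows)
    const unsigned int rows = p.numberOfElements / p.imageWidth;
    const size_t count = static_cast<size_t>(p.imageStride) * rows;
    const size_t bytes = count * sizeof(float);

    std::vector<float> host(count, 1.0f);
    float* dIn = nullptr;
    float* dOut = nullptr;
    if (cudaMalloc(&dIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) { cudaFree(dIn); return false; }

    cudaMemcpy(dIn, host.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(dOut, 0, bytes);
    scaleKernel<<<1, 32>>>(p, dIn, dOut);
    // This blocking copy also reports errors from the launch above.
    cudaError_t err = cudaMemcpy(host.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    if (err != cudaSuccess) return false;

    // Pixels inside the width are doubled; row padding stays zero.
    bool ok = true;
    for (unsigned int r = 0; r < rows; ++r) {
        for (unsigned int c = 0; c < p.imageStride; ++c) {
            float expected = (c < p.imageWidth) ? 2.0f : 0.0f;
            ok = ok && (host[r * p.imageStride + c] == expected);
        }
    }
    return ok;
}
```

Calling `runScaleExample()` from `main` exercises the sketch. Grouping the values into a struct only tidies the call site; it does not change where the arguments live, since they already go through constant memory.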

Of course, if you subsequently modify one of the variables in your code, you're modifying a local copy (as per the C standard), and hence each thread will have its own copy, either in registers or, if needed, on the stack.
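The local-copy behaviour can be illustrated with a small sketch. The names `offsetKernel` and `runOffsetExample` are hypothetical; the kernel writes to its parameter, and each thread then sees only its own modified value.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Writing to a kernel parameter changes only the calling thread's private copy.
__global__ void offsetKernel(unsigned int base, unsigned int* out)
{
    base += threadIdx.x;      // modifies this thread's local copy of base
    out[threadIdx.x] = base;  // other threads still start from the original value
}

// Host-side helper: returns true if every thread saw its own copy of base.
bool runOffsetExample()
{
    const unsigned int n = 8;
    const unsigned int base = 100;
    unsigned int* dOut = nullptr;
    if (cudaMalloc(&dOut, n * sizeof(unsigned int)) != cudaSuccess) return false;

    offsetKernel<<<1, n>>>(base, dOut);

    std::vector<unsigned int> host(n, 0u);
    // This blocking copy also reports errors from the launch above.
    cudaError_t err = cudaMemcpy(host.data(), dOut, n * sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    if (err != cudaSuccess) return false;

    // If the threads shared one copy, the additions would interfere with
    // each other; with private copies thread i writes base + i.
    bool ok = true;
    for (unsigned int i = 0; i < n; ++i) {
        ok = ok && (host[i] == base + i);
    }
    return ok;
}
```

Calling `runOffsetExample()` from `main` runs the check. Whether the private copy ends up in a register or on the stack is the compiler's decision and is not visible in this sketch.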
