CUDA: How to pass multiple duplicated arguments to a CUDA kernel

Question

I'm looking for an elegant way to pass multiple duplicated arguments to a CUDA kernel.

As we all know, each kernel argument is located on the stack of each CUDA thread. The arguments the kernel passes to each thread may therefore be duplicated, with one copy sitting on every thread's stack.

I'd like to minimize the number of duplicated arguments being passed.

To explain my concern, let's say my code looks like this:

   kernelFunction<<<gridSize,blockSize>>>(UINT imageWidth, UINT imageWidth, UINT imageStride, UINT numberOfElements,x,y,etc...)

The UINT imageWidth, UINT imageWidth, UINT imageStride and UINT numberOfElements arguments are located on each thread's stack.

I'm looking for a trick to send fewer arguments and access the data from another source.

I was thinking about using constant memory, but since constant memory resides in global memory, I dropped the idea. Needless to say, the memory location should be fast.

Any help would be appreciated.

Recommended answer

Kernel arguments are passed in via constant memory (or shared memory in sm_1x), so there is no replication as you suggest.

cf. the programming guide:


__global__ function parameters are passed to the device:

  • via shared memory and are limited to 256 bytes on devices of compute capability 1.x,

  • via constant memory and are limited to 4 KB on devices of compute capability 2.x and higher.
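
As a rough illustration of the point above, the sketch below bundles the image parameters into one struct and passes it by value, so the kernel simply reads them from its parameter space. The struct `ImageParams`, the kernel `scaleImage`, the doubling operation and the test sizes are hypothetical and not taken from the question; the stride is assumed to be counted in elements, and error checking is kept minimal.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical parameter bundle; field names follow the question's arguments.
struct ImageParams {
    unsigned int imageWidth;
    unsigned int imageStride;       // assumed to be in elements, not bytes
    unsigned int numberOfElements;
};

// Kernel parameters are limited to 4 KB on compute capability 2.x and higher.
static_assert(sizeof(ImageParams) <= 4096, "kernel parameters exceed 4 KB");

__global__ void scaleImage(ImageParams p, const float* in, float* out)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < p.numberOfElements) {
        // p is only read here, so every thread uses the same launch-time values.
        unsigned int row = i / p.imageWidth;
        unsigned int col = i % p.imageWidth;
        out[i] = 2.0f * in[row * p.imageStride + col];
    }
}

int main()
{
    const ImageParams p = {4u, 8u, 8u};  // width, stride, element count
    const unsigned int rows = p.numberOfElements / p.imageWidth;
    const unsigned int inCount = rows * p.imageStride;  // 16 padded input elements

    float hIn[16];
    float hOut[8];
    for (unsigned int i = 0; i < inCount; ++i) hIn[i] = (float)i;

    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, inCount * sizeof(float));
    cudaMalloc(&dOut, p.numberOfElements * sizeof(float));
    cudaMemcpy(dIn, hIn, inCount * sizeof(float), cudaMemcpyHostToDevice);

    scaleImage<<<1, 32>>>(p, dIn, dOut);  // the struct is passed by value
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(hOut, dOut, p.numberOfElements * sizeof(float), cudaMemcpyDeviceToHost);

    for (unsigned int i = 0; i < p.numberOfElements; ++i) printf("%.1f ", hOut[i]);
    printf("\n");

    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}
```

Grouping the values in a struct is only a convenience for the call site; passing them as separate scalar arguments goes through the same parameter mechanism.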

Of course, if you subsequently modify one of the variables in your code, you're modifying a local copy (as per the C standard), and hence each thread will have its own copy, either in registers or, if needed, on the stack.
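
A minimal sketch of that behaviour, assuming a hypothetical kernel `offsetKernel` that is not part of the original answer: each thread adds its own index to the `base` parameter, and the write affects only that thread's private copy.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void offsetKernel(unsigned int base, unsigned int* out)
{
    // Writing to a parameter changes only this thread's local copy.
    base += threadIdx.x;
    out[threadIdx.x] = base;
}

int main()
{
    const unsigned int n = 4;
    unsigned int hOut[n];
    unsigned int* dOut = nullptr;

    cudaMalloc(&dOut, n * sizeof(unsigned int));
    offsetKernel<<<1, n>>>(100u, dOut);  // every thread starts from the same base
    cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);

    for (unsigned int i = 0; i < n; ++i) printf("%u ", hOut[i]);
    printf("\n");

    cudaFree(dOut);
    return 0;
}
```

If the launch succeeds, each thread writes the base plus its own index, so no thread's modification is visible to any other thread.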
