CUDA: Does passing arguments to a kernel slow the kernel launch much?

Question

CUDA beginner here.

In my code I am currently launching kernels many times in a loop in the host code, because I need synchronization between blocks. So I wondered whether I could optimize the kernel launch.

My kernel launches look something like this:

MyKernel<<<blocks,threadsperblock>>>(double_ptr, double_ptr, int N, double x);

So to launch a kernel, some signal obviously has to go from the CPU to the GPU, but I'm wondering whether passing the arguments makes this process noticeably slower.

The arguments to the kernel are the same every single time, so perhaps I could save time by copying them once and accessing them in the kernel through a name defined by

__device__ int N;
<and somehow (how?) copy the value to this name N on the GPU once>

and then simply launch the kernel with no arguments, like so:

MyKernel<<<blocks,threadsperblock>>>();

Will this make my program any faster? What is the best way of doing this? AFAIK the arguments are stored in some constant global memory. How can I make sure that the manually transferred values are stored in memory that is as fast or faster?

Thanks for any help.

Answer

I would expect the benefits of such an optimization to be rather small. On sane platforms (i.e. anything other than WDDM), kernel launch overhead is only on the order of 10-20 microseconds, so there probably isn't a lot of scope for improvement.
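
If you want a rough feel for the launch overhead on your own platform, one crude way is to time a large number of back-to-back launches of an empty kernel and average. The sketch below is illustrative only (it is not from the original answer) and measures launch throughput rather than the latency of a single isolated launch:

#include <cstdio>
#include <cuda_runtime.h>

__global__ void EmptyKernel() {}   // does nothing; any time measured is launch cost

int main()
{
    const int launches = 1000;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    EmptyKernel<<<1, 1>>>();       // warm-up launch
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    for (int i = 0; i < launches; ++i)
        EmptyKernel<<<1, 1>>>();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("average launch time: %.2f us\n", 1000.0f * ms / launches);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}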

Having said that, if you want to try, the logical way to affect this is using constant memory. Define each argument as a __constant__ symbol at translation unit scope, then use the cudaMemcpyToSymbol function to copy values from the host to device constant memory.
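
For illustration, here is a minimal sketch of that approach, assuming a kernel that scales an input array by a scalar; the symbol names (d_in, d_out, d_N, d_x), the kernel body, and the launch configuration are placeholders rather than code from the question:

#include <cuda_runtime.h>

// Each former kernel argument becomes a __constant__ symbol at file scope.
__constant__ double *d_in;    // device pointer, set once from the host
__constant__ double *d_out;
__constant__ int     d_N;
__constant__ double  d_x;

__global__ void MyKernel()    // no arguments: everything is read from constant memory
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < d_N)
        d_out[i] = d_x * d_in[i];   // placeholder computation
}

int main()
{
    const int N = 1 << 20;
    const double x = 2.0;
    double *in = 0, *out = 0;
    cudaMalloc(&in,  N * sizeof(double));
    cudaMalloc(&out, N * sizeof(double));

    // Copy each value to its __constant__ symbol once, before the launch loop.
    cudaMemcpyToSymbol(d_in,  &in,  sizeof(in));
    cudaMemcpyToSymbol(d_out, &out, sizeof(out));
    cudaMemcpyToSymbol(d_N,   &N,   sizeof(N));
    cudaMemcpyToSymbol(d_x,   &x,   sizeof(x));

    dim3 threadsperblock(256);
    dim3 blocks((N + threadsperblock.x - 1) / threadsperblock.x);

    for (int iter = 0; iter < 100; ++iter)
        MyKernel<<<blocks, threadsperblock>>>();   // launch with no arguments

    cudaDeviceSynchronize();
    cudaFree(in);
    cudaFree(out);
    return 0;
}

Note that, as the question itself observes, ordinary kernel arguments are already passed through constant memory, so reads of these symbols should be comparably fast; any saving comes from not re-marshalling the same argument values on every launch in the loop.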
