CUDA: Calling a __device__ function from a kernel


Question

I have a kernel that calls a device function inside an if statement. The code is as follows:

__device__ void SetValues(int *ptr,int id)
{
    if(ptr[threadIdx.x]==id) //question related to here
          ptr[threadIdx.x]++;
}

__global__ void Kernel(int *ptr)
{
    if(threadIdx.x<2)
         SetValues(ptr,threadIdx.x);
}

In the kernel threads 0-1 call SetValues concurrently. What happens after that? I mean there are now 2 concurrent calls to SetValues. Does every function call execute serially? So they behave like 2 kernel function calls?

Answer

CUDA actually inlines all functions by default (although Fermi does also support function pointers and real function calls). So your example code gets compiled to something like this:

__global__ void Kernel(int *ptr)
{
    if(threadIdx.x<2)
        if(ptr[threadIdx.x]==threadIdx.x)
            ptr[threadIdx.x]++;
}

Execution happens in parallel, just like normal code. If you engineer a memory race into a function, there is no serialization mechanism that can save you.
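To make that last point concrete, here is a hedged sketch (kernel names and launch configuration are illustrative, not from the original question): if many threads increment the *same* element without synchronization, the read-modify-write operations interleave and updates are lost; `atomicAdd` makes the hardware serialize them.

```cuda
#include <cstdio>

// Racy variant: many threads do an unsynchronized
// read-modify-write on the same element. Updates can be lost.
__device__ void SetValueRacy(int *ptr)
{
    ptr[0]++;
}

// Safe variant: atomicAdd serializes the increments in hardware.
__device__ void SetValueAtomic(int *ptr)
{
    atomicAdd(&ptr[0], 1);
}

__global__ void Kernel(int *ptr, bool useAtomic)
{
    if (useAtomic)
        SetValueAtomic(ptr);
    else
        SetValueRacy(ptr);
}

int main()
{
    int *d_ptr, h_val = 0;
    cudaMalloc(&d_ptr, sizeof(int));
    cudaMemset(d_ptr, 0, sizeof(int));

    // With 256 threads and the atomic path, the result is exactly 256.
    // The racy path may print anything from 1 to 256.
    Kernel<<<1, 256>>>(d_ptr, true);
    cudaMemcpy(&h_val, d_ptr, sizeof(int), cudaMemcpyDeviceToHost);
    printf("atomic result: %d\n", h_val);

    cudaFree(d_ptr);
    return 0;
}
```

Note that the original question's code has no race: each thread only touches `ptr[threadIdx.x]`, its own element, which is why it works without atomics.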
