CUDA: In warp reduction and volatile keyword

Question

After reading the question and its answer at the following link:
LINK

I still have a question. From my background in C/C++, I understand that using volatile has its demerits. The answers also point out that, in CUDA, the compiler's optimizations can replace the shared array with registers to hold the data if the volatile keyword is not used.

I want to know what performance issues can be encountered when calculating a (sum) reduction, e.g.

__device__ void sum(volatile int *s_data, int tid)
{
    if (tid < 16)
    {
        s_data[tid] += s_data[tid + 16];
        s_data[tid] += s_data[tid +  8];
        s_data[tid] += s_data[tid +  4];
        s_data[tid] += s_data[tid +  2];
        s_data[tid] += s_data[tid +  1];
    }
}

I am using this for in-warp reduction. Since all the threads within a warp are in sync, I believe there is no need to use the __syncthreads() construct.

Will removing the volatile keyword mess up my sum (due to CUDA optimizations)? Can I use a reduction such as this without the volatile keyword?

Since I use this reduction function multiple times, will the volatile keyword cause any performance degradation?

Recommended answer

Removing the volatile keyword from that code could break it on Fermi and Kepler GPUs. Those GPUs lack instructions that operate directly on shared memory. Instead, the compiler must emit a load/store pair to move each value to and from a register.

What the volatile keyword does in this context is make the compiler honour that load-operate-store cycle and not perform an optimisation that would keep the value of s_data[tid] in a register. Keeping the sum accumulating in a register would break the implicit memory synchronisation required to make that warp-level shared memory summation work correctly.
