When using inline PTX asm() instructions, what does 'volatile' do?

Question

When we write inline PTX assembly in our otherwise C/C++ CUDA code, e.g.:

__device__ __inline__ uint32_t bfind(uint32_t val)
{
    uint32_t ret;
    asm ("bfind.u32 %0, %1;" : "=r"(ret): "r"(val));
    return ret;
}

We can add the volatile keyword after asm, e.g.:

__device__ __inline__ uint32_t bfind(uint32_t val)
{
    uint32_t ret;
    asm volatile ("bfind.u32 %0, %1;" : "=r"(ret): "r"(val));
    return ret;
}

The CUDA documentation on inline PTX assembly says:

The compiler assumes that an asm() statement has no side effects except to change the output operands. To ensure that the asm is not deleted or moved during generation of PTX, you should use the volatile keyword.

I don't understand what that means. So,

  • Why would my asm() be deleted? Or rather, if the compiler notices it has no effect, why should I mind it being deleted?
  • Why is it a problem if my asm() is moved during the generation of PTX? That's part of the optimization process, isn't it?
  • How would one characterize the compiler's behavior more exactly when facing non-volatile and volatile asm() instructions respectively?

Answer

Why would my asm() be deleted? Or rather, if the compiler notices it has no effect, why should I mind it being deleted?

If the compiler determines that your inline PTX has no effect other than writing its output operands (that is, it changes nothing outside the thread's local state), and those outputs are never used, it feels free to delete it as an optimization. Generally speaking, that is exactly what you want to happen. But sometimes it isn't: if the instruction has an effect the compiler cannot see from the operand list, deleting it changes what your program does. Your intentions and the compiler's optimization strategy might not always intersect in ways you either want or expect. Caveat emptor and all that.
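A minimal sketch of a case where deletion would be wrong, using hypothetical names (`ptx_atomic_add`, `count_threads`, `count_increments`, `check`): an inline PTX atomic add whose returned old value the caller ignores. The compiler only sees the output operand `old`, so if this asm were not volatile and the result were unused, it would be allowed to delete the statement and the increments would silently be lost. The `"memory"` clobber is an additional precaution telling the compiler that memory is modified.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical helper: atomically adds v to *addr via inline PTX and returns
// the previous value. The real effect is the memory update, which is not an
// output operand, so the asm is volatile: a non-volatile asm whose result is
// ignored could legally be deleted.
__device__ __forceinline__ unsigned ptx_atomic_add(unsigned* addr, unsigned v) {
    unsigned old;
    asm volatile("atom.add.u32 %0, [%1], %2;"
                 : "=r"(old)
                 : "l"(addr), "r"(v)
                 : "memory");  // memory is modified behind the compiler's back
    return old;
}

__global__ void count_threads(unsigned* counter) {
    ptx_atomic_add(counter, 1u);  // return value deliberately ignored
}

// Hypothetical host wrapper: launches blocks * threads threads and returns
// the final counter value.
unsigned count_increments(int blocks, int threads) {
    unsigned* d_counter = nullptr;
    check(cudaMalloc(&d_counter, sizeof(unsigned)));
    check(cudaMemset(d_counter, 0, sizeof(unsigned)));
    count_threads<<<blocks, threads>>>(d_counter);
    check(cudaGetLastError());
    unsigned h_counter = 0;
    // cudaMemcpy waits for the kernel to finish before copying.
    check(cudaMemcpy(&h_counter, d_counter, sizeof(unsigned), cudaMemcpyDeviceToHost));
    check(cudaFree(d_counter));
    return h_counter;
}
```

With 4 blocks of 256 threads, `count_increments(4, 256)` should return 1024, one increment per thread.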

Why is it a problem if my asm() is moved during the generation of PTX? That's part of the optimization process, isn't it?

Normally it is not a problem; it is part of the optimization process. But sometimes you want to prevent it. Imagine you are crafting micro-benchmarks and the compiler decides to reorder the carefully designed sequence of instructions you coded in inline PTX (the classic case is an instruction being moved to the wrong place in the emitted code, so that a timed section or a carefully designed memory-transaction pattern is broken). The results would not be what you intended, which can be pretty frustrating.
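A minimal sketch of the micro-benchmark situation, with hypothetical names (`read_clock64`, `timed_sum`, `run_timed_sum`, `check`): both reads of the `%clock64` special register are volatile. Without volatile, the two reads have identical (empty) inputs, so the compiler could treat them as the same pure computation and merge them, or move them relative to the loop, and the measured cycle count would be meaningless. The `"memory"` clobber is an extra precaution that discourages moving the loop's loads across the clock reads. Since ptxas still schedules the final machine code, inspecting the result (e.g. with `cuobjdump -sass`) remains worthwhile.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical helper: reads the 64-bit cycle counter. volatile keeps the
// compiler from deleting, merging, or moving the read during PTX generation.
__device__ __forceinline__ long long read_clock64() {
    long long t;
    asm volatile("mov.u64 %0, %%clock64;" : "=l"(t) : : "memory");
    return t;
}

// Hypothetical micro-benchmark: times a serial sum over n elements.
__global__ void timed_sum(const unsigned* in, int n, unsigned* sum, long long* cycles) {
    long long start = read_clock64();
    unsigned acc = 0;
    for (int i = 0; i < n; ++i) acc += in[i];
    long long stop = read_clock64();
    *sum = acc;  // store the result so the loop itself is not dead code
    *cycles = stop - start;
}

// Hypothetical host wrapper: sums n ones in a single thread, returns the sum
// and writes the elapsed cycle count to *cycles_out.
unsigned run_timed_sum(int n, long long* cycles_out) {
    std::vector<unsigned> ones(static_cast<size_t>(n), 1u);
    unsigned* d_in = nullptr;
    unsigned* d_sum = nullptr;
    long long* d_cycles = nullptr;
    check(cudaMalloc(&d_in, n * sizeof(unsigned)));
    check(cudaMalloc(&d_sum, sizeof(unsigned)));
    check(cudaMalloc(&d_cycles, sizeof(long long)));
    check(cudaMemcpy(d_in, ones.data(), n * sizeof(unsigned), cudaMemcpyHostToDevice));
    timed_sum<<<1, 1>>>(d_in, n, d_sum, d_cycles);
    check(cudaGetLastError());
    unsigned h_sum = 0;
    check(cudaMemcpy(&h_sum, d_sum, sizeof(unsigned), cudaMemcpyDeviceToHost));
    check(cudaMemcpy(cycles_out, d_cycles, sizeof(long long), cudaMemcpyDeviceToHost));
    check(cudaFree(d_in));
    check(cudaFree(d_sum));
    check(cudaFree(d_cycles));
    return h_sum;
}
```

Calling `run_timed_sum(1000, &cycles)` should return 1000 and leave a non-negative cycle count in `cycles`.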

How would one characterize the compiler's behavior more exactly when facing non-volatile and volatile asm() instructions respectively?

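Much as volatile does for variables in standard CUDA kernel code, volatile on an asm() makes the compiler honor the given inline PTX operation as written: it is emitted in the generated PTX rather than being optimized away or moved as a result of code analysis. A non-volatile asm() is treated as having no side effects beyond its output operands, so the compiler may delete it when those outputs are unused and may move it like any other pure computation. Note that the documentation phrases this guarantee in terms of PTX generation; the resulting PTX is then optimized further by ptxas, which only sees ordinary PTX instructions.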
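To make the contrast concrete, here is a small sketch (the names `bfind_volatile`, `bfind_both`, `bfind_on_device` and `check` are hypothetical) that computes the question's bfind both ways. Both return the same value; volatile only changes what the optimizer is allowed to do. For a pure instruction like bfind, whose only effect is its output register, the non-volatile form is arguably the better choice because it leaves the compiler free to eliminate redundant work.

```cuda
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Pure computation: plain asm() lets the compiler delete it if ret is unused,
// merge two calls on the same val, or hoist it out of a loop.
__device__ __forceinline__ uint32_t bfind(uint32_t val) {
    uint32_t ret;
    asm("bfind.u32 %0, %1;" : "=r"(ret) : "r"(val));
    return ret;
}

// Same instruction marked volatile: identical result, but the compiler should
// neither delete nor move it during PTX generation.
__device__ __forceinline__ uint32_t bfind_volatile(uint32_t val) {
    uint32_t ret;
    asm volatile("bfind.u32 %0, %1;" : "=r"(ret) : "r"(val));
    return ret;
}

__global__ void bfind_both(uint32_t val, uint32_t* out) {
    out[0] = bfind(val);
    out[1] = bfind_volatile(val);
}

// Hypothetical host wrapper: runs both variants on val in a single thread.
void bfind_on_device(uint32_t val, uint32_t* plain, uint32_t* vol) {
    uint32_t* d_out = nullptr;
    check(cudaMalloc(&d_out, 2 * sizeof(uint32_t)));
    bfind_both<<<1, 1>>>(val, d_out);
    check(cudaGetLastError());
    uint32_t h_out[2] = {0, 0};
    check(cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost));
    check(cudaFree(d_out));
    *plain = h_out[0];
    *vol = h_out[1];
}
```

For example, both variants give 0 for an input of 1, 31 for 0x80000000, and 0xFFFFFFFF for 0 (the PTX result when no bit is set).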
