Syntax on inline PTX code for CUDA


Question

As described in Nvidia's Inline PTX Assembly document, the syntax for inline assembly is:
asm("temp_string" : "constraint"(output) : "constraint"(input));
Here are two examples:
asm("vadd.s32.s32.s32 %0, %1.h0, %2.h0;" : "=r"(v) : "r"(a), "r"(b));
asm("vadd.u32.u32.u32 %0.b0, %1, %2, %3;" : "=r"(v) : "r"(a), "r"(b), "r"(z));
In both examples, a suffix such as h0 or b0 follows a %n operand. I looked through the official CUDA documentation and found nothing about the meaning of h0 or b0. I have seen h0 and h1, as well as b0, b1, b2 and b3. My guess is that h0 and h1 denote 16-bit values, while bn denotes a byte value. Does anyone know the exact meaning of these?

Thanks to Roger Dahl for the help. I read the PTX ISA 3.0 document and found the answer.
"h" means half-word: h0 is the low half-word (bits 0-15) of a 32-bit word, and h1 is the high half-word (bits 16-31). "b" means byte: b0, b1, b2 and b3 are the first (lowest), second, third and highest 8 bits of a 32-bit word.

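As a rough illustration of the selectors described above, here is a host-side C++ sketch that models how a source operand such as %1.h0 might be read. It is not NVIDIA's implementation: the names PartSel, select_part and vadd_s32_h0_h0 are hypothetical, the sign- or zero-extension rule follows my reading of the PTX ISA video instruction pseudocode, and .sat is not modelled.

```cpp
#include <cstdint>

// Hypothetical names for the sub-word selectors that may follow an operand.
enum class PartSel { None, B0, B1, B2, B3, H0, H1 };

// Model of reading a source operand: pick a byte or half-word out of a
// 32-bit register, then sign- or zero-extend it. `is_signed` stands in
// for the operand type (.s32 versus .u32); this is an assumption of the sketch.
constexpr std::int64_t select_part(std::uint32_t reg, PartSel sel, bool is_signed) {
    int shift = 0;
    int bits = 32;
    switch (sel) {
        case PartSel::None: break;
        case PartSel::B0: shift = 0;  bits = 8;  break;
        case PartSel::B1: shift = 8;  bits = 8;  break;
        case PartSel::B2: shift = 16; bits = 8;  break;
        case PartSel::B3: shift = 24; bits = 8;  break;
        case PartSel::H0: shift = 0;  bits = 16; break;
        case PartSel::H1: shift = 16; bits = 16; break;
    }
    const std::uint32_t mask = (bits == 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    const std::uint32_t part = (reg >> shift) & mask;
    if (!is_signed) {
        return part;  // zero-extended
    }
    // Sign-extend from the top bit of the selected part.
    const std::uint32_t sign = 1u << (bits - 1);
    return static_cast<std::int64_t>(part ^ sign) - static_cast<std::int64_t>(sign);
}

// Model of "vadd.s32.s32.s32 d, a.h0, b.h0;" without .sat:
// add the sign-extended low half-words of a and b.
constexpr std::int32_t vadd_s32_h0_h0(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::int32_t>(select_part(a, PartSel::H0, true) +
                                     select_part(b, PartSel::H0, true));
}

// Compile-time checks of the selector positions.
static_assert(select_part(0x12345678u, PartSel::H0, false) == 0x5678, "h0 = low half-word");
static_assert(select_part(0x12345678u, PartSel::H1, false) == 0x1234, "h1 = high half-word");
static_assert(select_part(0x12345678u, PartSel::B0, false) == 0x78, "b0 = lowest byte");
static_assert(select_part(0x12345678u, PartSel::B3, false) == 0x12, "b3 = highest byte");
```

Under this model, the upper half-words of a and b play no part in the first example: only bits 0-15 of each input reach the adder.
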
Recommended answer

vadd is one of the video-specific instructions included with PTX. A description of the complete PTX ISA is included with the CUDA distribution. On my machine, it's in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\doc\ptx_isa_3.0.pdf. The h0, h1, b0, etc. designators are described in section 8.7.11, Video Instructions. They represent different implicit shift/mask operations (see the optMerge function).
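
To make the shift/mask idea concrete, here is a host-side C++ sketch of what a destination selector such as %0.b0 might do. It paraphrases the optMerge pseudocode from the PTX ISA document as I understand it. The names DestSel, opt_merge and vadd_u32_b0 are hypothetical, and saturation is not modelled.

```cpp
#include <cstdint>

// Hypothetical names for the selectors allowed on the destination operand.
enum class DestSel { B0, B1, B2, B3, H0, H1 };

// Sketch of the merge step: the low byte or half-word of the result `d`
// is shifted into the selected position, and every other bit is taken
// from the extra operand `c`.
constexpr std::uint32_t opt_merge(DestSel dsel, std::uint32_t d, std::uint32_t c) {
    switch (dsel) {
        case DestSel::H0: return (d & 0xFFFFu) | (c & 0xFFFF0000u);
        case DestSel::H1: return ((d & 0xFFFFu) << 16) | (c & 0x0000FFFFu);
        case DestSel::B0: return (d & 0xFFu) | (c & 0xFFFFFF00u);
        case DestSel::B1: return ((d & 0xFFu) << 8) | (c & 0xFFFF00FFu);
        case DestSel::B2: return ((d & 0xFFu) << 16) | (c & 0xFF00FFFFu);
        case DestSel::B3: return ((d & 0xFFu) << 24) | (c & 0x00FFFFFFu);
    }
    return d;  // not reached for valid selectors
}

// Model of "vadd.u32.u32.u32 d.b0, a, b, c;" without .sat:
// the low byte of a + b replaces byte 0 of c.
constexpr std::uint32_t vadd_u32_b0(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    return opt_merge(DestSel::B0, a + b, c);
}

// 0xF0 + 0x20 = 0x110; only the low byte 0x10 is merged into c.
static_assert(vadd_u32_b0(0x000000F0u, 0x00000020u, 0xAABBCCDDu) == 0xAABBCC10u,
              "b0 merge keeps the upper three bytes of c");
```

This would explain why the second example in the question takes a fourth operand (%3): with a destination selector, the bits of the result that are not written come from that operand.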
