Syntax on inline PTX code for CUDA
Problem description
As described in Nvidia's Inline PTX Assembly document, the syntax for inline assembly is:
asm("temp_string" : "constraint"(output) : "constraint"(input));
Here are two examples:
asm("vadd.s32.s32.s32 %0, %1.h0, %2.h0;" : "=r"(v) : "r"(a), "r"(b));
asm("vadd.u32.u32.u32 %0.b0, %1, %2, %3;" : "=r"(v) : "r"(a), "r"(b), "r"(z));
In both examples, some operands carry a suffix such as h0 or b0 after the %n placeholder. I looked through the official CUDA documentation and found nothing about the meaning of h0 or b0. I have seen h0 and h1, as well as b0, b1, b2 and b3. My guess is that h0 and h1 select a 16-bit value, while bn selects a byte. Does anyone know their exact meaning?
Thanks to Roger Dahl for the help. I read the PTX ISA 3.0 document and found the answer.
"h" means half-word: h0 is the low half-word (bits 0-15) of a 32-bit word, and h1 is the high half-word (bits 16-31). "b" means byte: b0, b1, b2 and b3 are the lowest, second, third and highest 8 bits of a 32-bit word.
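As a rough illustration of the selectors described above, here is a host-side C++ sketch (not PTX, and not code from the PTX ISA document) that models how a source selector could pick part of a 32-bit register. The names Sel, select_unsigned and select_signed are hypothetical. The sign-extension behaviour for signed source types follows my reading of the video-instruction semantics and should be checked against the PTX ISA.

```cpp
#include <cstdint>

// Hypothetical model of the PTX source selectors; all names are illustrative only.
enum class Sel { b0, b1, b2, b3, h0, h1 };

// Extract the selected byte or half-word and zero-extend it
// (modelling an unsigned source type such as .u32).
constexpr std::uint32_t select_unsigned(std::uint32_t reg, Sel sel) {
    switch (sel) {
        case Sel::b0: return reg & 0xFFu;            // bits 0-7
        case Sel::b1: return (reg >> 8) & 0xFFu;     // bits 8-15
        case Sel::b2: return (reg >> 16) & 0xFFu;    // bits 16-23
        case Sel::b3: return (reg >> 24) & 0xFFu;    // bits 24-31
        case Sel::h0: return reg & 0xFFFFu;          // low half-word
        case Sel::h1: return (reg >> 16) & 0xFFFFu;  // high half-word
    }
    return reg;
}

// Extract the selected part and sign-extend it
// (modelling a signed source type such as .s32; this is an assumption).
constexpr std::int32_t select_signed(std::uint32_t reg, Sel sel) {
    const std::uint32_t part = select_unsigned(reg, sel);
    const bool is_half = (sel == Sel::h0 || sel == Sel::h1);
    const std::uint32_t sign_bit = is_half ? 0x8000u : 0x80u;
    // XOR-then-subtract turns the narrow two's-complement value into a full int32.
    return static_cast<std::int32_t>(part ^ sign_bit) -
           static_cast<std::int32_t>(sign_bit);
}

// Compile-time checks of the model.
static_assert(select_unsigned(0x12345678u, Sel::h0) == 0x5678u, "low half-word");
static_assert(select_unsigned(0x12345678u, Sel::h1) == 0x1234u, "high half-word");
static_assert(select_unsigned(0x12345678u, Sel::b0) == 0x78u, "lowest byte");
static_assert(select_unsigned(0x12345678u, Sel::b3) == 0x12u, "highest byte");
static_assert(select_signed(0x0000FFFFu, Sel::h0) == -1, "sign-extended half-word");
```

Under this model, the first example in the question would add select_signed(a, Sel::h0) and select_signed(b, Sel::h0), that is, the sign-extended low half-words of a and b.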
Recommended answer
vadd is one of the video-specific instructions included with PTX. A description of the complete PTX ISA is included with the CUDA distribution. On my machine, it is in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\doc\ptx_isa_3.0.pdf. The h0, h1, b0, etc. designators are described in section 8.7.11, Video Instructions. They represent different implicit shift/mask operations (see the optMerge function).
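To make the shift/mask idea concrete for a destination selector such as %0.b0, here is a host-side C++ sketch of what a merge step like optMerge could look like. This is my own model rather than text copied from the PTX ISA: the names DSel and merge_into are hypothetical, and saturation (.sat) is deliberately left out.

```cpp
#include <cstdint>

// Hypothetical model of a destination selector; all names are illustrative only.
enum class DSel { b0, b1, b2, b3, h0, h1 };

// Place the low byte or half-word of tmp into the selected position of c,
// keeping every other bit of c unchanged. Saturation is not modelled.
constexpr std::uint32_t merge_into(DSel dsel, std::uint32_t tmp, std::uint32_t c) {
    switch (dsel) {
        case DSel::h0: return (tmp & 0xFFFFu) | (c & 0xFFFF0000u);
        case DSel::h1: return ((tmp & 0xFFFFu) << 16) | (c & 0x0000FFFFu);
        case DSel::b0: return (tmp & 0xFFu) | (c & 0xFFFFFF00u);
        case DSel::b1: return ((tmp & 0xFFu) << 8) | (c & 0xFFFF00FFu);
        case DSel::b2: return ((tmp & 0xFFu) << 16) | (c & 0xFF00FFFFu);
        case DSel::b3: return ((tmp & 0xFFu) << 24) | (c & 0x00FFFFFFu);
    }
    return tmp;
}

// Model of the second example in the question, vadd with a %0.b0 destination:
// the sum a + b is truncated to a byte and merged into byte 0 of z.
static_assert(merge_into(DSel::b0, 0x101u + 0x2u, 0xAABBCCDDu) == 0xAABBCC03u,
              "byte 0 replaced, upper 24 bits of z kept");
static_assert(merge_into(DSel::h1, 0x1234u, 0xAABBCCDDu) == 0x1234CCDDu,
              "high half-word replaced, low half-word of z kept");
```

In this model the fourth operand (z in the question's second example) supplies the bits that the destination selector does not overwrite.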