_addcarry_u64 and _addcarryx_u64 with MSVC and ICC
Problem description
MSVC and ICC both support the intrinsics _addcarry_u64 and _addcarryx_u64.
According to Intel's Intrinsic Guide and white paper, these should map to adcx and adox respectively. However, looking at the generated assembly, it is clear that they map to adc and adcx respectively, and there is no intrinsic which maps to adox.
Additionally, telling the compiler to enable AVX2 with /arch:AVX2 in MSVC or -march=core-avx2 with ICC on Linux makes no difference. I'm not sure how to enable ADX with MSVC and ICC.
The documentation for MSVC lists _addcarryx_u64 with the technology of ADX, whereas _addcarry_u64 has no listed technology. However, the link in MSVC's documentation for these intrinsics goes directly to the Intel Intrinsic guide, which contradicts MSVC's own documentation and the generated assembly.
From this I conclude that Intel's Intrinsic guide and white paper are wrong.
This makes some sense for MSVC: since it does not allow inline assembly, it should provide a way to use adc, which it does with _addcarry_u64.
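As an illustration of the adc use case, here is a minimal sketch of a multi-limb addition built on _addcarry_u64. The function name add_limbs and the little-endian limb layout are assumptions made for this example, not something from the documentation discussed above. It assumes an x86-64 target and a compiler that provides the intrinsic in immintrin.h; whether each call actually becomes a single adc is up to the compiler.

```c
#include <immintrin.h>  /* _addcarry_u64 (x86-64 only) */
#include <stddef.h>

/* Hypothetical helper: out = a + b over n 64-bit limbs, least significant
 * limb first. Returns the carry out of the most significant limb.
 * unsigned long long is used because it matches the pointer type the
 * intrinsic expects on GCC, Clang, ICC and MSVC. */
unsigned char add_limbs(const unsigned long long *a,
                        const unsigned long long *b,
                        unsigned long long *out,
                        size_t n)
{
    unsigned char carry = 0;  /* carry-in for the lowest limb is 0 */
    for (size_t i = 0; i < n; ++i) {
        /* Adds a[i] + b[i] + carry, stores the low 64 bits in out[i],
         * and returns the new carry (0 or 1). */
        carry = _addcarry_u64(carry, a[i], b[i], &out[i]);
    }
    return carry;
}
```

For example, adding {0xFFFFFFFFFFFFFFFF, 0} and {1, 0} as two-limb numbers yields {0, 1} with no carry out, because the carry from the low limb propagates into the high limb.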
One of the big advantages of adcx and adox is that they operate on different flags (carry CF and overflow OF), and this allows two independent parallel carry chains. However, since there is no intrinsic for adox, how is this possible? With ICC one can at least use inline assembly, but this is not possible with MSVC in 64-bit mode.
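To make the "two independent carry chains" idea concrete, here is a rough sketch of two interleaved multi-limb additions, each with its own carry variable. The names ADD_CARRY and add_two_chains are invented for this example. It uses _addcarryx_u64 only when the compiler defines __ADX__ and otherwise falls back to _addcarry_u64. Note that this only expresses the two chains at the source level: nothing here forces the compiler to keep one chain in CF and the other in OF, so the emitted instructions may well be plain adc or adcx only.

```c
#include <immintrin.h>  /* _addcarry_u64, _addcarryx_u64 (x86-64 only) */
#include <stddef.h>

/* Hypothetical wrapper: pick the ADX-flavoured intrinsic only when the
 * compiler advertises ADX support through the __ADX__ macro. */
#if defined(__ADX__)
#define ADD_CARRY(c, a, b, out) _addcarryx_u64((c), (a), (b), (out))
#else
#define ADD_CARRY(c, a, b, out) _addcarry_u64((c), (a), (b), (out))
#endif

/* Hypothetical helper: computes x += y and u += v over n limbs each
 * (least significant limb first), interleaving the two additions.
 * The two carries never depend on each other, which is the pattern
 * adcx/adox are designed for. */
void add_two_chains(unsigned long long *x, const unsigned long long *y,
                    unsigned long long *u, const unsigned long long *v,
                    size_t n,
                    unsigned char *carry_x, unsigned char *carry_u)
{
    unsigned char cx = 0;  /* carry of the first chain  */
    unsigned char cu = 0;  /* carry of the second chain */
    for (size_t i = 0; i < n; ++i) {
        cx = ADD_CARRY(cx, x[i], y[i], &x[i]);
        cu = ADD_CARRY(cu, u[i], v[i], &u[i]);
    }
    *carry_x = cx;
    *carry_u = cu;
}
```

Inspecting the generated assembly, as done in the question, is the only way to tell whether a given compiler and version keeps the two chains in separate flags.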
Microsoft's and Intel's documentation (both the white paper and the online intrinsic guide) now agree.
The documentation for the _addcarry_u64 intrinsic says it produces only adc. The _addcarryx_u64 intrinsic can produce either adcx or adox. With MSVC 2013 and 2015, however, _addcarryx_u64 only produces adcx. ICC produces both.
On a related note, GCC does not support ADOX and ADCX at the moment. "At the moment" includes GCC 6.4 (Fedora 25) and GCC 7.1 (Fedora 26). GCC effectively disabled the intrinsics, but it still advertises support by defining __ADX__ in the preprocessor. Also see Issue 67317, Silly code generation for _addcarry_u32/_addcarry_u64. Many thanks to Xi Ruoyao for finding the issue.
According to Uros Bizjak on the GCC Help mailing list, GCC may never support the intrinsics. Also see GCC does not generate ADCX or ADOX for _addcarryx_u64.
Clang has its own set of issues with respect to ADOX and ADCX. Clang 3.9 and 4.0 crash when attempting to use them. Also see Issue 34249, Panic when using _addcarryx_u64 with Clang 3.9. According to Craig Topper, it should be fixed in Clang 5.0.
My apologies for posting the information under an MSVC question. This is one of the few hits when searching for information about using the intrinsics.