What is the most efficient way to support CMGT with 64-bit signed comparisons on ARMv7a with NEON?
Question
This question was originally posed for SSE2 here. Since every algorithm proposed there overlapped with operations that ARMv7a+NEON also supports, that question was updated to include ARMv7a+NEON versions. At the request of a commenter, it is asked separately here, both to show that it is indeed a distinct topic and to invite alternative solutions that might be more practical for ARMv7a+NEON. The overall purpose of these questions is to find ideal implementations for consideration in WebAssembly SIMD.
Recommended answer
Signed 64-bit saturating subtract.
Assuming my tests using _mm_subs_epi16 are correct and translate 1:1 to NEON...
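#include <arm_neon.h>

// Per lane: all ones if a > b, else zero. The saturated b - a is negative exactly when a > b.
uint64x2_t pcmpgtq_armv7 (int64x2_t a, int64x2_t b) {
    return vreinterpretq_u64_s64(vshrq_n_s64(vqsubq_s64(b, a), 63));
}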
...would certainly seem to be the fastest achievable way to emulate pcmpgtq.
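To see why the sign of the saturated difference is enough, here is a minimal scalar sketch of a single lane. The helper names `sat_sub_s64` and `cmpgt_sat` are hypothetical and are not part of the answer; they only model what `vqsubq_s64` and the shift by 63 are expected to do per lane, under the assumption that the NEON instruction clamps to the `int64_t` range.

```c
#include <stdint.h>

/* Hypothetical scalar model of one lane of vqsubq_s64:
   x - y, clamped to [INT64_MIN, INT64_MAX] instead of wrapping. */
static int64_t sat_sub_s64(int64_t x, int64_t y) {
    /* Use unsigned arithmetic for the overflow check so wraparound is well defined. */
    uint64_t ux = (uint64_t)x, uy = (uint64_t)y;
    uint64_t ud = ux - uy;
    /* Overflow occurs iff x and y differ in sign and the wrapped result's sign differs from x's. */
    if (((ux ^ uy) & (ux ^ ud)) >> 63) {
        return (x < 0) ? INT64_MIN : INT64_MAX;
    }
    return x - y; /* safe: no overflow on this path */
}

/* Hypothetical scalar model of the per-lane result: all ones if a > b, else zero.
   Testing the sign here stands in for the arithmetic shift right by 63. */
static uint64_t cmpgt_sat(int64_t a, int64_t b) {
    return (sat_sub_s64(b, a) < 0) ? UINT64_MAX : 0;
}
```

Because the difference is clamped rather than wrapped, its sign stays correct even for extreme inputs such as `a = INT64_MAX`, `b = INT64_MIN`, which is where a plain wrapping subtract would give the wrong answer.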
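// return (a > b) ? -1LL : 0LL;
// Branch-free scalar form. Note: b - a may overflow, so this relies on
// two's-complement wraparound and an arithmetic right shift.
int64_t cmpgt(int64_t a, int64_t b) {
    return ((b & ~a) | ((b - a) & ~(b ^ a))) >> 63;
}
// Alternative formulation of the same function (use one or the other):
int64_t cmpgt(int64_t a, int64_t b) {
    return ((b - a) ^ ((b ^ a) & ((b - a) ^ b))) >> 63;
}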
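Since `b - a` on `int64_t` can overflow, which is undefined behavior in standard C, one way to experiment with the two scalar formulas safely is to evaluate them on unsigned values. The sketch below is an assumption-laden rewrite, not the answer's own code: the names `cmpgt_or` and `cmpgt_xor` are hypothetical, and the result is returned as a `uint64_t` mask rather than `int64_t`.

```c
#include <stdint.h>

/* Hypothetical name. First formula, computed on unsigned values so b - a wraps safely. */
static uint64_t cmpgt_or(int64_t a, int64_t b) {
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    /* Top bit is set if b is negative and a is not, or if the signs match and b - a is negative. */
    uint64_t sign = ((ub & ~ua) | ((ub - ua) & ~(ub ^ ua))) >> 63; /* 1 if a > b, else 0 */
    return (uint64_t)0 - sign; /* expand the single bit to an all-ones or all-zeros mask */
}

/* Hypothetical name. Second formula, same approach. */
static uint64_t cmpgt_xor(int64_t a, int64_t b) {
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t d = ub - ua; /* wrapped difference */
    /* Top bit is d's sign when a and b share a sign, and b's sign when they differ. */
    uint64_t sign = (d ^ ((ub ^ ua) & (d ^ ub))) >> 63;
    return (uint64_t)0 - sign;
}
```

Both variants use only AND, OR, XOR, NOT, subtract and shift, so they should map onto a target that lacks a 64-bit saturating subtract, at the cost of more operations than the `vqsubq_s64` version.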