What is the most efficient way to support CMGT with 64-bit signed comparisons on ARMv7a with NEON?


Problem description

This question was originally posed for SSE2 here. Since every algorithm overlapped with ARMv7a+NEON's support for the same operations, the question was updated to include the ARMv7a+NEON versions. At the request of a commenter, it is asked here separately to show that it is indeed a distinct topic and to invite alternative solutions that might be more practical for ARMv7a+NEON. The net purpose of these questions is to find ideal implementations for consideration in WebAssembly SIMD.

Recommended answer

Signed 64-bit saturating subtract.

Assuming my tests using _mm_subs_epi16 are correct and translate 1:1 to NEON...

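#include <arm_neon.h>

// Each lane: saturating b - a is negative exactly when a > b; the arithmetic shift spreads the sign bit.
uint64x2_t pcmpgtq_armv7 (int64x2_t a, int64x2_t b) {
    return vreinterpretq_u64_s64(vshrq_n_s64(vqsubq_s64(b, a), 63));
}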

Would certainly seem to be the fastest achievable way to emulate pcmpgtq.
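To see why the saturating subtract matters, here is a minimal scalar sketch of what one lane computes. The names qsub_s64 and cmpgt_model are hypothetical helpers made up for illustration, not NEON intrinsics, and this only models the behaviour; it says nothing about speed.

```c
#include <stdint.h>

/* Hypothetical scalar model of a signed 64-bit saturating subtract
   (the per-lane behaviour assumed for vqsubq_s64). */
static int64_t qsub_s64(int64_t x, int64_t y) {
    uint64_t ux = (uint64_t)x, uy = (uint64_t)y;
    uint64_t diff = ux - uy;  /* wrapping difference, well defined for unsigned */
    /* Overflow iff x and y differ in sign and the wrapped result's sign differs from x. */
    if (((ux ^ uy) & (ux ^ diff)) >> 63) {
        return (x < 0) ? INT64_MIN : INT64_MAX;
    }
    return x - y;  /* no overflow on this path */
}

/* Hypothetical scalar model of the comparison: -1 if a > b, else 0. */
static int64_t cmpgt_model(int64_t a, int64_t b) {
    /* Saturation keeps the sign of b - a correct even when the true difference overflows. */
    return (qsub_s64(b, a) < 0) ? -1 : 0;
}
```

With a plain wrapping subtract, a = INT64_MAX and b = INT64_MIN would give b - a = 1 and the wrong answer; with saturation the difference clamps to INT64_MIN and the sign is right.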

Hacker's Delight gives the following formulas:

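#include <stdint.h>

// return (a > b) ? -1LL : 0LL;
// Note: relies on b - a wrapping on overflow and on >> being an arithmetic shift for signed values.
int64_t cmpgt(int64_t a, int64_t b) {
    return ((b & ~a) | ((b - a) & ~(b ^ a))) >> 63;
}

// Equivalent alternative formulation (renamed so both definitions can coexist).
int64_t cmpgt_alt(int64_t a, int64_t b) {
    return ((b - a) ^ ((b ^ a) & ((b - a) ^ b))) >> 63;
}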
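In standard C, signed overflow in b - a is undefined and right-shifting a negative value is implementation-defined, so as a hedged sketch, here is the second formula redone in unsigned arithmetic. The name cmpgt_mask_u64 is a hypothetical helper for illustration; the formula itself is the one quoted above.

```c
#include <stdint.h>

/* Hypothetical helper: returns all-ones if a > b (signed), else 0. */
static uint64_t cmpgt_mask_u64(int64_t a, int64_t b) {
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t d = ub - ua;                     /* b - a, wrapping modulo 2^64 */
    uint64_t t = d ^ ((ub ^ ua) & (d ^ ub));  /* sign bit set iff b < a (signed) */
    return (uint64_t)0 - (t >> 63);           /* spread the sign bit: 0 or all-ones */
}
```

The wrap-around that the signed versions rely on is guaranteed for uint64_t, and negating the extracted sign bit stands in for the arithmetic shift by 63.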
