Comparing 2 vectors in AVX/AVX2 (C)

Question

I have two __m256i vectors (each containing chars), and I want to find out whether they are completely identical. All I need is true if all bits are equal, and false otherwise.

What's the most efficient way of doing that? Here's the code loading the arrays:

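// String literals are read-only and not guaranteed to be 32-byte aligned,
// so use const pointers and the unaligned load (_mm256_loadu_si256).
const char *a1 = "abcdefhgabcdefhgabcdefhgabcdefhg";
__m256i r1 = _mm256_loadu_si256((const __m256i *) a1);

const char *a2 = "abcdefhgabcdefhgabcdefhgabcdefhg";
__m256i r2 = _mm256_loadu_si256((const __m256i *) a2);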

Recommended answer

The most efficient way on current Intel and AMD CPUs is an element-wise comparison for equality, followed by a check that the comparison was true for all elements.

This compiles to multiple instructions, but they're all cheap, and if you branch on the result, the compare+branch even macro-fuses into a single uop.

#include <immintrin.h>
#include <stdbool.h>

bool vec_equal(__m256i a, __m256i b) {
    __m256i pcmp = _mm256_cmpeq_epi32(a, b);  // epi8 is fine too
    unsigned bitmask = _mm256_movemask_epi8(pcmp);
    return (bitmask == 0xffffffffU);
}

The resulting asm should be vpcmpeqd / vpmovmskb / cmp 0xffffffff / je, which is only 3 uops on Intel CPUs.
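To tie this back to the strings in the question, here is a minimal sketch that loads 32 bytes from each buffer and applies the same compare + movemask check. The function name equal32_avx2 is made up for illustration, and the target("avx2") attribute is a GCC/Clang extension assumed here so the file builds without -mavx2; it still needs an AVX2-capable CPU at run time.

```c
#include <immintrin.h>
#include <stdbool.h>

/* Hypothetical helper: true if the 32 bytes at p and q are identical.
 * target("avx2") is a GCC/Clang-specific attribute (an assumption here);
 * with other compilers, drop it and enable AVX2 via compiler flags. */
__attribute__((target("avx2")))
bool equal32_avx2(const void *p, const void *q) {
    /* Unaligned loads: the buffers need not be 32-byte aligned. */
    __m256i a = _mm256_loadu_si256((const __m256i *)p);
    __m256i b = _mm256_loadu_si256((const __m256i *)q);

    /* Each byte of pcmp is 0xFF where a and b match, 0x00 where they differ. */
    __m256i pcmp = _mm256_cmpeq_epi8(a, b);

    /* Collect the top bit of each of the 32 bytes into a 32-bit mask. */
    unsigned bitmask = (unsigned)_mm256_movemask_epi8(pcmp);

    /* All 32 bits set means every byte compared equal. */
    return bitmask == 0xffffffffU;
}
```

Calling equal32_avx2(a1, a2) on the two 32-character strings from the question should return true, since both hold the same bytes. Guard the call with a CPU feature check (for example __builtin_cpu_supports("avx2") on GCC/Clang) if the code may run on older hardware.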

vptest is 2 uops and doesn't macro-fuse with jcc, so it is equal to or worse than movmsk / cmp for testing the result of a packed compare. (See http://agner.org/optimize/)
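For comparison, this is roughly what the vptest-based alternative looks like in intrinsics: a sketch, not something the answer recommends. It XORs the two vectors (the result is all-zero only if they are identical) and tests that with _mm256_testz_si256, which maps to vptest. The name equal32_ptest and the target("avx2") attribute (GCC/Clang only) are assumptions for illustration.

```c
#include <immintrin.h>
#include <stdbool.h>

/* Hypothetical vptest variant: true if the 32 bytes at p and q are identical.
 * target("avx2") is a GCC/Clang-specific attribute (an assumption here). */
__attribute__((target("avx2")))
bool equal32_ptest(const void *p, const void *q) {
    __m256i a = _mm256_loadu_si256((const __m256i *)p);
    __m256i b = _mm256_loadu_si256((const __m256i *)q);

    /* diff has a set bit wherever a and b disagree. */
    __m256i diff = _mm256_xor_si256(a, b);

    /* testz returns 1 when (diff & diff) is all zero, i.e. a == b. */
    return _mm256_testz_si256(diff, diff) != 0;
}
```

Both versions give the same answer; the difference is only in the instruction sequence the compiler emits, so the uop counts quoted above are what decide between them.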
