Comparing 2 vectors in AVX/AVX2 (C)
Question
I have two __m256i vectors (each containing chars), and I want to find out whether they are completely identical. All I need is true if all bits are equal, and 0 otherwise.
What's the most efficient way of doing that? Here's the code loading the arrays:
const char *a1 = "abcdefhgabcdefhgabcdefhgabcdefhg";
// loadu: a string literal is not guaranteed to be 32-byte aligned
__m256i r1 = _mm256_loadu_si256((const __m256i *) a1);
const char *a2 = "abcdefhgabcdefhgabcdefhgabcdefhg";
__m256i r2 = _mm256_loadu_si256((const __m256i *) a2);
Recommended answer
The most efficient way on current Intel and AMD CPUs is an element-wise comparison for equality, followed by a check that the comparison was true for all elements.
This compiles to multiple instructions, but they're all cheap and (if you branch on the result) the compare+branch even macro-fuses into a single uop.
#include <immintrin.h>
#include <stdbool.h>

bool vec_equal(__m256i a, __m256i b) {
    __m256i pcmp = _mm256_cmpeq_epi32(a, b);  // epi8 is fine too
    unsigned bitmask = _mm256_movemask_epi8(pcmp);
    return (bitmask == 0xffffffffU);
}
The resulting asm should be vpcmpeqd / vpmovmskb / cmp 0xffffffff / je, which is only 3 uops on Intel CPUs.
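To tie this back to the question's 32-byte strings, here is a minimal sketch that wraps the same compare + movemask test around two memory buffers. The name bytes32_equal is hypothetical and not part of the original answer. The target attribute is a GCC/Clang extension, assumed here so the function builds without -mavx2 on the command line. The CPU is still assumed to support AVX2 at run time.

```c
#include <immintrin.h>
#include <stdbool.h>

/* Hypothetical helper (not from the original answer): returns true if the
   32 bytes at p and the 32 bytes at q are identical.
   The target attribute is a GCC/Clang extension; it is redundant when the
   file is already compiled with -mavx2. */
__attribute__((target("avx2")))
bool bytes32_equal(const void *p, const void *q) {
    /* Unaligned loads: string literals and plain char arrays carry no
       32-byte alignment guarantee. */
    __m256i a = _mm256_loadu_si256((const __m256i *)p);
    __m256i b = _mm256_loadu_si256((const __m256i *)q);

    /* Same technique as vec_equal: byte-wise compare, then collect the top
       bit of each of the 32 result bytes into a 32-bit mask. */
    __m256i pcmp = _mm256_cmpeq_epi8(a, b);
    unsigned bitmask = (unsigned)_mm256_movemask_epi8(pcmp);

    /* All 32 mask bits set means every byte compared equal. */
    return bitmask == 0xffffffffU;
}
```

Calling bytes32_equal(a1, a2) with the two strings from the question should report them as equal, since both literals hold the same 32 characters.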
vptest is 2 uops and doesn't macro-fuse with jcc, so it is equal or worse than movmsk / cmp for testing the result of a packed compare. (See http://agner.org/optimize/)
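For comparison, a vptest-based check might look like the sketch below. It is a different formulation from the one discussed above: instead of testing a packed-compare result, it XORs the two inputs and uses _mm256_testz_si256 (which compiles to vptest) to ask whether the difference is all zero. The function name is hypothetical, and the target attribute is again a GCC/Clang extension assumed for convenience. Given the uop counts quoted above, there is no reason to expect this to beat the movemask version when you branch on the result.

```c
#include <immintrin.h>
#include <stdbool.h>

/* Hypothetical vptest variant (not from the original answer): returns true
   if the 32 bytes at p and the 32 bytes at q are identical. */
__attribute__((target("avx2")))
bool bytes32_equal_ptest(const void *p, const void *q) {
    __m256i a = _mm256_loadu_si256((const __m256i *)p);
    __m256i b = _mm256_loadu_si256((const __m256i *)q);

    /* XOR is zero in every bit position where a and b agree. */
    __m256i diff = _mm256_xor_si256(a, b);

    /* _mm256_testz_si256(x, y) returns 1 when (x & y) is all zero,
       so testz(diff, diff) is 1 exactly when diff itself is zero. */
    return _mm256_testz_si256(diff, diff) != 0;
}
```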