SSE2 intrinsics - comparing unsigned integers
Problem description
I'm interested in identifying overflowing values when adding unsigned 8-bit integers, and saturating the result to 0xFF:
__m128i m1 = _mm_loadu_si128(/* 16 8-bit unsigned integers */);
__m128i m2 = _mm_loadu_si128(/* 16 8-bit unsigned integers */);
__m128i m3 = _mm_adds_epu8(m1, m2);
I would like to perform a less-than comparison on these unsigned integers, similar to what _mm_cmplt_epi8 does for signed values:
__m128i mask = _mm_cmplt_epi8 (m3, m1);
m1 = _mm_or_si128(m3, mask);
If an "epu8" equivalent were available, mask would hold 0xFF where m3[i] < m1[i] (overflow!) and 0x00 otherwise, and we could saturate m1 using the "or", so that m1 holds the addition result where it is valid and 0xFF where it overflowed.
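One detail worth checking against the snippet above: _mm_adds_epu8 is itself a saturating unsigned add, so its result is already clamped to 0xFF, and the m3[i] < m1[i] overflow test only fires for a wrapping add such as _mm_add_epi8. The following is a minimal sketch of that difference; the helper names lane0, add_saturating, add_wrapping and check_add_variants are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h> // SSE2

// Hypothetical helper: return byte 0 of a vector as an unsigned value.
static uint8_t lane0(__m128i v)
{
    return (uint8_t)(_mm_cvtsi128_si32(v) & 0xFF);
}

// Saturating unsigned add: sums above 255 are clamped to 0xFF.
static uint8_t add_saturating(uint8_t a, uint8_t b)
{
    return lane0(_mm_adds_epu8(_mm_set1_epi8((char)a), _mm_set1_epi8((char)b)));
}

// Wrapping add: sums above 255 wrap around modulo 256.
static uint8_t add_wrapping(uint8_t a, uint8_t b)
{
    return lane0(_mm_add_epi8(_mm_set1_epi8((char)a), _mm_set1_epi8((char)b)));
}

// Hypothetical self-check: 0xF0 + 0x20 = 0x110 does not fit in 8 bits.
static void check_add_variants(void)
{
    assert(add_saturating(0xF0, 0x20) == 0xFF); // clamped
    assert(add_wrapping(0xF0, 0x20) == 0x10);   // wrapped, and 0x10 < 0xF0
}
```

Calling check_add_variants() from a main function should pass silently; only the wrapped sum ends up below its first operand.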
The problem is that _mm_cmplt_epi8 performs a signed comparison, so, for instance, if m1[i] = 0x70 and m2[i] = 0x10, then m3[i] = 0x80 and mask[i] = 0xFF, which is obviously not what I require.
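The case described above can be reproduced in isolation. This is only a sketch: signed_lt_mask and check_signed_pitfall are hypothetical helper names, and the helper broadcasts one byte pair across the vector and reads back a single mask byte.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h> // SSE2

// Hypothetical helper: byte 0 of the signed less-than mask for (x < y).
static uint8_t signed_lt_mask(uint8_t x, uint8_t y)
{
    __m128i vx = _mm_set1_epi8((char)x);
    __m128i vy = _mm_set1_epi8((char)y);
    __m128i mask = _mm_cmplt_epi8(vx, vy); // signed compare per byte
    return (uint8_t)(_mm_cvtsi128_si32(mask) & 0xFF);
}

// Hypothetical self-check of the values from the question.
static void check_signed_pitfall(void)
{
    // 0x80 is -128 as a signed byte, so it compares below 0x70 (112),
    // although 128 > 112 when both are read as unsigned.
    assert(signed_lt_mask(0x80, 0x70) == 0xFF);
    // When both values are below 0x80 the signed and unsigned readings agree.
    assert(signed_lt_mask(0x10, 0x70) == 0xFF);
    assert(signed_lt_mask(0x70, 0x10) == 0x00);
}
```

The mismatch appears whenever exactly one of the two operands has its top bit set.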
Using VS2012.
I would appreciate another approach for performing this. Thanks!
One way of implementing compares for unsigned 8-bit vectors is to exploit _mm_max_epu8, which returns the element-wise maximum of unsigned 8-bit int elements. You can compare the (unsigned) maximum of two elements for equality with one of the source elements and then return the appropriate result. This translates to 2 instructions for >= or <=, and 3 instructions for > or <.
Example code:
#include <stdio.h>
#include <stdint.h>
#include <emmintrin.h> // SSE2

// a >= b (unsigned) holds exactly when max(a, b) == a
#define _mm_cmpge_epu8(a, b) \
    _mm_cmpeq_epi8(_mm_max_epu8(a, b), a)
// a <= b is b >= a
#define _mm_cmple_epu8(a, b) _mm_cmpge_epu8(b, a)
// a > b is NOT (a <= b); XOR with all-ones inverts the mask
#define _mm_cmpgt_epu8(a, b) \
    _mm_xor_si128(_mm_cmple_epu8(a, b), _mm_set1_epi8(-1))
// a < b is b > a
#define _mm_cmplt_epu8(a, b) _mm_cmpgt_epu8(b, a)

// Print the 16 bytes of v as unsigned decimal values.
static void print_epu8(const char *name, __m128i v)
{
    uint8_t b[16];
    int i;
    _mm_storeu_si128((__m128i *)b, v);
    printf("%s =", name);
    for (i = 0; i < 16; i++)
        printf(" %u", b[i]);
    printf("\n");
}

int main(void)
{
    __m128i va = _mm_setr_epi8(0, 0, 1, 1, 1, 127, 127, 127, 128, 128, 128, 254, 254, 254, 255, 255);
    __m128i vb = _mm_setr_epi8(0, 255, 0, 1, 255, 0, 127, 255, 0, 128, 255, 0, 254, 255, 0, 255);
    __m128i v_ge = _mm_cmpge_epu8(va, vb);
    __m128i v_le = _mm_cmple_epu8(va, vb);
    __m128i v_gt = _mm_cmpgt_epu8(va, vb);
    __m128i v_lt = _mm_cmplt_epu8(va, vb);
    print_epu8("va", va);
    print_epu8("vb", vb);
    print_epu8("v_ge", v_ge);
    print_epu8("v_le", v_le);
    print_epu8("v_gt", v_gt);
    print_epu8("v_lt", v_lt);
    return 0;
}
Compile and run:
$ gcc -Wall _mm_cmplt_epu8.c && ./a.out
va = 0 0 1 1 1 127 127 127 128 128 128 254 254 254 255 255
vb = 0 255 0 1 255 0 127 255 0 128 255 0 254 255 0 255
v_ge = 255 0 255 255 0 255 255 0 255 255 0 255 255 0 255 255
v_le = 255 255 0 255 255 0 255 255 0 255 255 0 255 255 0 255
v_gt = 0 0 255 0 0 255 0 0 255 0 0 255 0 0 255 0
v_lt = 0 255 0 0 255 0 0 255 0 0 255 0 0 255 0 0
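To connect the answer back to the question, here is a hedged sketch of the mask-and-or idea using the max-based unsigned compare. It assumes a wrapping add (_mm_add_epi8) for m3, since the overflow test m3[i] < m1[i] can only be true when the sum wraps. The function names cmpge_epu8, cmplt_epu8, add_saturate_via_mask and matches_adds_epu8 are hypothetical; they are written as functions rather than macros only to avoid evaluating arguments twice.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h> // SSE2

// a >= b (unsigned), as in the answer: max(a, b) == a.
static __m128i cmpge_epu8(__m128i a, __m128i b)
{
    return _mm_cmpeq_epi8(_mm_max_epu8(a, b), a);
}

// a < b (unsigned) is NOT (a >= b); XOR with all-ones inverts the mask.
static __m128i cmplt_epu8(__m128i a, __m128i b)
{
    return _mm_xor_si128(cmpge_epu8(a, b), _mm_set1_epi8(-1));
}

// Wrapping add, then force the lanes that overflowed to 0xFF.
static __m128i add_saturate_via_mask(__m128i m1, __m128i m2)
{
    __m128i m3 = _mm_add_epi8(m1, m2);  // wraps modulo 256
    __m128i mask = cmplt_epu8(m3, m1);  // 0xFF where the sum wrapped
    return _mm_or_si128(m3, mask);
}

// Returns 1 if the mask-based result equals _mm_adds_epu8
// for every pair of byte values, 0 otherwise.
static int matches_adds_epu8(void)
{
    int a, b;
    for (a = 0; a < 256; a++) {
        for (b = 0; b < 256; b++) {
            __m128i va = _mm_set1_epi8((char)a);
            __m128i vb = _mm_set1_epi8((char)b);
            __m128i eq = _mm_cmpeq_epi8(add_saturate_via_mask(va, vb),
                                        _mm_adds_epu8(va, vb));
            if (_mm_movemask_epi8(eq) != 0xFFFF)
                return 0;
        }
    }
    return 1;
}
```

Because the mask-based version should agree with _mm_adds_epu8 on every input, the single saturating-add intrinsic is likely sufficient when saturation alone is needed; the explicit mask is mainly useful when the overflow positions themselves are of interest.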