Can PTEST be used to test if two registers are both zero or some other condition?
Question
What can you do with SSE4.1 ptest other than testing if a single register is all-zero?
Can you use a combination of ZF and CF (the two flags PTEST actually sets) to test anything useful about two unknown input registers?
What is PTEST good for? You'd think it would be good for checking the result of a packed-compare (like PCMPEQD or CMPPS), but at least on Intel CPUs, it costs more uops to compare-and-branch using PTEST + JCC than with a movemask (PMOVMSKB / MOVMSKPS / MOVMSKPD) + macro-fused CMP+JCC.
Recommended answer
No, unless I'm missing something clever, ptest with two unknown registers is generally not useful for checking some property about both of them. (Other than obvious stuff you'd already want a bitwise-AND for, like intersection between two bitmaps.)
To test two registers for both being all-zero, OR them together and PTEST the result against itself.
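The OR-then-PTEST idea can be sketched with intrinsics as follows. This is only a minimal illustration: the helper name both_all_zero is made up for this example, and the target attribute assumes GCC or Clang (with -msse4.1 on the command line, or with MSVC, it would be dropped).

```c
#include <immintrin.h>  /* SSE2 and SSE4.1 intrinsics */
#include <stdbool.h>

/* Hypothetical helper: true only if every bit of both a and b is zero.
 * The target attribute (GCC/Clang) enables SSE4.1 for this function only. */
__attribute__((target("sse4.1")))
bool both_all_zero(__m128i a, __m128i b)
{
    /* POR: any set bit in either input survives in the combined vector. */
    __m128i combined = _mm_or_si128(a, b);
    /* PTEST x,x: ZF is set exactly when (x & x) == 0, i.e. x is all-zero. */
    return _mm_testz_si128(combined, combined) != 0;
}
```

Note that this costs one extra instruction (the OR) and needs a scratch register for the combined value; PTEST on the two original registers directly would not answer the same question.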
ptest xmm0, xmm1 produces two results:
- ZF = is xmm0 & xmm1 all-zero?
- CF = is (~xmm0) & xmm1 all-zero?
If the second vector is all-zero, the flags don't depend at all on the bits in the first vector.
It may be useful to think of the "is-all-zero" checks as a NOT(bitwise horizontal-OR()) of the AND and ANDNOT results. But probably not, because that's too many steps for my brain to think through easily. That sequence of vertical-AND and then horizontal-OR does maybe make it easier to understand why PTEST doesn't tell you much about a combination of two unknown registers, just like the integer TEST instruction.
Here's a truth table for a 2-bit ptest a,mask. Hopefully this helps in thinking about mixes of zeros and ones with 128b inputs.
Note that CF(a,mask) == ZF(~a,mask).
a mask ZF CF
00 00 1 1
01 00 1 1
10 00 1 1
11 00 1 1
00 01 1 0
01 01 0 1
10 01 1 0
11 01 0 1
00 10 1 0
01 10 1 0
10 10 0 1
11 10 0 1
00 11 1 0
01 11 0 0
10 11 0 0
11 11 0 1
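The table can be reproduced with a small scalar model of the two flag definitions. This is a sketch in portable C, not the real instruction: the names ptest_flags and ptest_model, and the width parameter used to restrict the operands to a few low bits, are assumptions introduced for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flags produced by the modelled PTEST (illustrative type, not a real API). */
typedef struct {
    bool zf;  /* set when (a & mask) has no bits set  */
    bool cf;  /* set when (~a & mask) has no bits set */
} ptest_flags;

/* Scalar model of "ptest a, mask" restricted to the low `width` bits.
 * width must be between 1 and 64. */
ptest_flags ptest_model(uint64_t a, uint64_t mask, unsigned width)
{
    /* Bits outside the modelled width are ignored. */
    uint64_t keep = (width >= 64) ? ~UINT64_C(0)
                                  : ((UINT64_C(1) << width) - 1);
    ptest_flags f;
    f.zf = ((a & mask) & keep) == 0;   /* vertical AND, then "is it all-zero?"    */
    f.cf = ((~a & mask) & keep) == 0;  /* vertical ANDNOT, then "is it all-zero?" */
    return f;
}
```

Looping a and mask over 0..3 with width 2 and printing zf and cf gives the 16 rows above. The same loop can also confirm that ptest_model(a, mask, 2).cf always equals ptest_model(~a, mask, 2).zf.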
Intel's intrinsics guide lists 2 interesting intrinsics for it. Note the naming of the args: a and mask are a clue that they tell you about the parts of a selected by a known AND-mask.
- _mm_test_mix_ones_zeros (__m128i a, __m128i mask): returns (ZF == 0 && CF == 0)
- _mm_test_all_zeros (__m128i a, __m128i mask): returns ZF
There are also the more simply-named versions:
- int _mm_testc_si128 (__m128i a, __m128i b): returns CF
- int _mm_testnzc_si128 (__m128i a, __m128i b): returns (ZF == 0 && CF == 0)
- int _mm_testz_si128 (__m128i a, __m128i b): returns ZF
There are __m256i versions of those intrinsics (256-bit VPTEST needs only AVX, not AVX2), but the guide only lists the all_zeros and mix_ones_zeros alternate-name versions for __m128i operands.
If you want to test some other condition from C or C++, you should use testc and testz with the same operands, and hope that your compiler realizes that it only needs to do one PTEST, and hopefully even uses a single JCC, SETCC, or CMOVCC to implement your logic. (I'd recommend checking the asm, at least for the compiler you care about most.)
Note that _mm_testz_si128(v, set1(0xff)) is always the same as _mm_testz_si128(v,v), because that's how AND works. But that's not true for the CF result.
You can check for a vector being all-ones using:
bool is_all_ones = _mm_testc_si128(v, _mm_set1_epi8(0xff));
This is probably no faster than a PCMPEQB against a vector of all-ones followed by the usual movemask + cmp, but it is smaller in code size. It doesn't avoid the need for a vector constant.
PTEST does have the advantage that it doesn't destroy either input operand, even without AVX.