_mm_testc_ps and _mm_testc_pd vs _mm_testc_si128

Views: 82. Published: 2020/9/15 5:38:01. Tags: c, x86, simd, avx, sse4

Question

As you know, the first two are AVX-specific intrinsics and the third is an SSE4.1 intrinsic. Both sets of intrinsics can be used to check for equality of 2 floating-point vectors. My specific use case is:

_mm_cmpeq_ps or _mm_cmpeq_pd, followed by
_mm_testc_ps or _mm_testc_pd on the result, with an appropriate mask

But AVX provides equivalents for "legacy" intrinsics, so I might be able to use _mm_testc_si128 after casting the result to __m128i. My questions are: which of the two use cases gives better performance, and where can I find out which legacy SSE instructions are provided by AVX?

Recommended answer

Oops, I didn't read the question carefully. You're talking about using these after a cmpeqps. They're always slower than movmskps / test if you already have a mask. cmpps / ptest / jcc is 4 uops. cmpps / movmskps eax, xmm0 / test eax,eax / jnz is 3 uops. (test/jnz fuse into a single uop). Also, none of the instructions are multi-uop, so no decode bottlenecks.
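The compare-then-movmskps pattern described above can be sketched in C as follows. This is a minimal sketch rather than code from the original answer: the helper names `all_lanes_equal` and `any_lane_equal` are hypothetical, and only baseline SSE intrinsics are used, so no AVX or SSE4.1 support is assumed.

```c
#include <stdbool.h>
#include <xmmintrin.h>  /* SSE: _mm_cmpeq_ps, _mm_movemask_ps */

/* Hypothetical helper: true if all four float lanes compare equal
 * (same semantics as C's == applied per lane). */
bool all_lanes_equal(__m128 a, __m128 b)
{
    __m128 eq = _mm_cmpeq_ps(a, b);     /* cmpeqps: all-ones in lanes where a == b */
    return _mm_movemask_ps(eq) == 0xF;  /* movmskps: one bit per lane (its sign bit) */
}

/* Hypothetical helper: true if at least one lane compares equal. */
bool any_lane_equal(__m128 a, __m128 b)
{
    return _mm_movemask_ps(_mm_cmpeq_ps(a, b)) != 0;
}
```

A compiler will typically turn the comparison of the movemask result into an integer test or compare followed by a branch, which is the cmpps / movmskps / test / jcc sequence the uop counts above refer to.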

Only use ptest / vtestps/pd when you can take full advantage of the AND or ANDN operation to avoid an earlier step. I've posted answers before where I compared ptest vs. an alternative. I think I did find one case once where ptest was a win, but it's hard to use. Yup, found it: someone wanted an FP compare that was true for NaN == NaN. It's one of the only times I've ever found a use for the carry flag result of ptest.

If the high element of a compare result is "garbage", then you can still ignore it cheaply with movmskps:

(_mm_movemask_ps(vec) & 0b0111) == 0  // tests for none of the first three being true
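As a small illustration of that masking idea, here is a sketch wrapped in a function. The name `none_of_low3_true` is hypothetical. Note that in C `==` binds more tightly than `&`, so the parentheses around the masking step are required; the mask is written in hex because `0b` binary literals are a compiler extension before C23.

```c
#include <stdbool.h>
#include <xmmintrin.h>  /* SSE: _mm_movemask_ps */

/* Hypothetical helper: vec is assumed to be a compare result
 * (each lane all-ones or all-zeros). Lane 3 is ignored. */
bool none_of_low3_true(__m128 vec)
{
    int mask = _mm_movemask_ps(vec);  /* bit i = sign bit of lane i */
    return (mask & 0x7) == 0;         /* keep lanes 0..2, drop the "garbage" lane 3 */
}
```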

This is totally free. The x86 test instruction works a lot like ptest: you can use it with an immediate mask instead of testing a register against itself. (It actually has a tiny cost: one extra byte of machine code, because test eax, 3 is one byte longer than test eax, eax, but they run identically.)

See the x86 wiki for links to guides (Agner Fog's guide is good for perf analysis at the instruction level). There's an AVX version of every legacy SSE instruction, but some are only 128 bits wide. They all get an extra operand (so the dest doesn't have to be one of the src regs), which saves on mov instructions to copy registers.

Answer to a question you didn't ask:

Neither _mm_testc_ps nor _mm_testc_si128 can be used to compare floats for equality. vtestps is like ptest, but only operates on the sign bits of each float element.

They all compute (~x) & y (on sign bits or on the full register), which doesn't tell you whether they're equal, or even whether the sign bits are equal.
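To make this concrete, here is a minimal sketch showing that `_mm_testc_si128(a, b)` returns 1 whenever `(~a) & b` is all zero, which can happen for unequal inputs. The wrapper name `testc_bits` is hypothetical, and the `target("sse4.1")` attribute is a GCC/Clang-specific assumption so the block builds without `-msse4.1`; the CPU must still support SSE4.1 at run time.

```c
#include <smmintrin.h>  /* SSE4.1: _mm_testc_si128 */

/* Hypothetical wrapper around ptest's carry-flag result.
 * Returns 1 if ((~a) & b) == 0 over all 128 bits, i.e. every bit
 * set in b is also set in a. This is a subset test, not equality. */
__attribute__((target("sse4.1")))
int testc_bits(__m128i a, __m128i b)
{
    return _mm_testc_si128(a, b);
}
```

For example, with `a` all-ones and `b` equal to 5 in every element, `testc_bits(a, b)` returns 1 even though the vectors differ. `_mm_testc_ps` (vtestps) applies the same (~a) & b check, but only to the sign bit of each float element.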

Note that even checking for bitwise equality of floats (with pcmpeqd) isn't the same as cmpeqps (which implements C's == operator), because -0.0 isn't bitwise equal to 0.0, and two bitwise-identical NaNs aren't equal to each other. The comparison is unordered (which means not equal) if either or both operands are NaN.
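The difference between the two kinds of equality can be sketched with SSE2 intrinsics only. The helper names `fp_equal_mask` and `bitwise_equal_mask` are hypothetical; each returns a 4-bit mask with bit i set when lane i is considered equal.

```c
#include <emmintrin.h>  /* SSE2: _mm_cmpeq_epi32, casts; also pulls in SSE */
#include <math.h>       /* NAN */

/* Hypothetical helper: per-lane floating-point equality (cmpeqps),
 * matching C's == operator. */
int fp_equal_mask(__m128 a, __m128 b)
{
    return _mm_movemask_ps(_mm_cmpeq_ps(a, b));
}

/* Hypothetical helper: per-lane bitwise equality (pcmpeqd)
 * on the raw 32-bit patterns. */
int bitwise_equal_mask(__m128 a, __m128 b)
{
    __m128i eq = _mm_cmpeq_epi32(_mm_castps_si128(a), _mm_castps_si128(b));
    return _mm_movemask_ps(_mm_castsi128_ps(eq));
}
```

With `a = _mm_setr_ps(0.0f, NAN, 1.0f, 2.0f)` and `b = _mm_setr_ps(-0.0f, NAN, 1.0f, 3.0f)`, the two helpers disagree on lanes 0 and 1: the floating-point compare treats 0.0 and -0.0 as equal and NaN as unequal to itself, while the bitwise compare does the opposite.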
