Checking if two SSE registers are not both zero without destroying them

Question

I want to test if two SSE registers are not both zero without destroying them.

This is the code I currently have:

uint8_t *src;  // Assume it is initialized and 16-byte aligned
__m128i xmm0, xmm1, xmm2;

xmm0 = _mm_load_si128((__m128i const*)&src[i]); // Need to preserve xmm0 & xmm1
xmm1 = _mm_load_si128((__m128i const*)&src[i+16]);
xmm2 = _mm_or_si128(xmm0, xmm1);
if (!_mm_testz_si128(xmm2, xmm2)) { // True when xmm0 and xmm1 are not both zero
}

Is this the best way (using up to SSE 4.2)?

Recommended answer

I learned something useful from this question. Let's first look at some scalar code

extern void foo2(int x, int y);
void foo(int x, int y) {
    if((x || y)!=0) foo2(x,y);
}

Compile this with GCC using gcc -O3 -S -masm=intel test.c, and the important assembly is:

 mov       eax, edi   ; edi = x, esi = y -> copy x into eax
 or        eax, esi   ; eax = x | y and set zero flag in FLAGS if zero
 jne       .L4        ; jump not zero

Now let's look at testing SIMD registers for zero. Unlike scalar code there is no SIMD FLAGS register. However, with SSE4.1 there are SIMD test instructions which can set the zero flag (and carry flag) in the scalar FLAGS register.

#include <smmintrin.h>  // SSE4.1 intrinsics (_mm_testz_si128)
extern void foo2(__m128i x, __m128i y);
void foo(__m128i x, __m128i y) {
    __m128i z = _mm_or_si128(x,y);
    if (!_mm_testz_si128(z,z)) foo2(x,y);
}

Compile this with c99 -msse4.1 -O3 -masm=intel -S test_SSE.c, and the important assembly is:

movdqa      xmm2, xmm0 ; xmm0 = x, xmm1 = y, copy x into xmm2
por         xmm2, xmm1 ; xmm2 = x | y
ptest       xmm2, xmm2 ; set zero flag if zero
jne         .L4        ; jump not zero 

Notice that this takes one more instruction because the packed bit-wise OR does not set the zero flag. Notice also that both the scalar version and the SIMD version need to use an additional register (eax in the scalar case and xmm2 in the SIMD case). So to answer your question your current solution is the best you can do.
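For anyone who wants to try the por + ptest check in isolation, here is a minimal sketch. The helper name either_nonzero_sse41 and the self-check are hypothetical, not part of the original answer. The target attribute is a GCC/Clang extension used here only so the file builds without -msse4.1. Running it still assumes a CPU with SSE4.1.

```c
#include <assert.h>
#include <smmintrin.h>  /* SSE4.1: _mm_testz_si128 */

/* Hypothetical helper: nonzero when x and y are not both zero.
   The target attribute (a GCC/Clang extension) enables SSE4.1 for this one
   function; compiling the whole file with -msse4.1 works as well. */
__attribute__((target("sse4.1")))
static int either_nonzero_sse41(__m128i x, __m128i y) {
    __m128i z = _mm_or_si128(x, y);   /* por into a third register; x and y untouched */
    return !_mm_testz_si128(z, z);    /* ptest sets ZF when (z & z) == 0 */
}

/* Small self-check with a value whose top bits are all clear. */
static void check_either_nonzero_sse41(void) {
    __m128i zero = _mm_setzero_si128();
    __m128i low  = _mm_set1_epi8(0x01);
    assert(!either_nonzero_sse41(zero, zero));
    assert(either_nonzero_sse41(zero, low));
    assert(either_nonzero_sse41(low, low));
}
```

Because ptest looks at all 128 bits, this check is exact for arbitrary data, not only for comparison masks.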

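However, if you do not have a processor with SSE4.1 or better, an alternative that needs only SSE2 is to use _mm_movemask_epi8. Keep in mind that pmovmskb collects only the most significant bit of each byte, so the test below is exact only when every nonzero byte has its top bit set (for example, when x and y are comparison masks); for arbitrary data, compare against zero with _mm_cmpeq_epi8 first and check that the resulting mask is not 0xFFFF.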

#include <emmintrin.h>  // SSE2 intrinsics
extern void foo2(__m128i x, __m128i y);
void foo(__m128i x, __m128i y) {
    if (_mm_movemask_epi8(_mm_or_si128(x,y))) foo2(x,y);   
}

The important assembly is:

movdqa      xmm2, xmm0
por         xmm2, xmm1
pmovmskb    eax, xmm2
test        eax, eax
jne         .L4

Notice that this takes one more instruction than the SSE4.1 version with ptest.
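Because pmovmskb reads only the top bit of each byte, a value such as 0x01 in every byte would slip through a bare movemask test. A minimal SSE2-only sketch that handles arbitrary data compares against zero first. The names either_nonzero_sse2 and check_either_nonzero_sse2 are hypothetical, not part of the original answer, and the extra pcmpeqb costs one more instruction and a zeroed register.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: nonzero when at least one of x and y has any bit set.
   Neither input is modified. */
static int either_nonzero_sse2(__m128i x, __m128i y) {
    __m128i z  = _mm_or_si128(x, y);                      /* z = x | y */
    __m128i eq = _mm_cmpeq_epi8(z, _mm_setzero_si128());  /* 0xFF where a byte of z is 0 */
    return _mm_movemask_epi8(eq) != 0xFFFF;               /* some byte was nonzero */
}

/* Small self-check showing why the comparison step matters. */
static void check_either_nonzero_sse2(void) {
    __m128i zero = _mm_setzero_si128();
    __m128i low  = _mm_set1_epi8(0x01);   /* nonzero, top bit of every byte clear */
    assert(!either_nonzero_sse2(zero, zero));
    assert(either_nonzero_sse2(low, zero));
    assert(_mm_movemask_epi8(low) == 0);  /* a bare movemask misses this value */
}
```

Whether this is faster or slower than the ptest version on a given CPU is something to measure rather than assume.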

Until now I have been using the pmovmskb instruction because its latency is better than that of ptest on pre-Sandy Bridge processors. However, I settled on that before Haswell: on Haswell the latency of pmovmskb is worse than the latency of ptest, and both have the same throughput. In this case, though, that is not really important. What is important (and what I did not realize before) is that pmovmskb does not set the FLAGS register, so it requires another instruction. So now I'll be using ptest in my critical loop. Thank you for your question.

Edit: as suggested by the OP, there is a way this can be done without using another SSE register.

extern void foo2(__m128i x, __m128i y);
void foo(__m128i x, __m128i y) {
    if (_mm_movemask_epi8(x) | _mm_movemask_epi8(y)) foo2(x,y);    
}

The relevant assembly from GCC is:

pmovmskb    eax, xmm0
pmovmskb    edx, xmm1
or          edx, eax
jne         .L4

Instead of using another xmm register this uses two scalar registers.

Note that fewer instructions do not necessarily mean better performance. Which of these solutions is best? You have to test each of them to find out.
