What's the difference between logical SSE intrinsics?


Question


Is there any difference between the logical SSE intrinsics for different types? Take the OR operation, for example. There are three intrinsics, _mm_or_ps, _mm_or_pd and _mm_or_si128, all of which do the same thing: compute the bitwise OR of their operands. My questions:

  1. Is there any difference between using one intrinsic or another (with the appropriate type casts)? Are there any hidden costs, such as slower execution in some specific situations?

  2. These intrinsics map to three different x86 instructions (por, orps, orpd). Does anyone have any idea why Intel is spending precious opcode space on several instructions that do the same thing?

Answer

I think all three are effectively the same, i.e. 128-bit bitwise operations. The different forms probably exist for historical reasons, but I'm not certain. It's possible that the floating-point versions have some additional behaviour, e.g. when NaNs are involved, but this is pure guesswork. For normal inputs the instructions seem to be interchangeable, e.g.

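#include <stdio.h>
#include <emmintrin.h>
#include <pmmintrin.h>
#include <xmmintrin.h>

/* %vld and %vf are non-standard printf extensions, so the lanes are
   stored to an array and printed one by one instead. */
static void print_i32(const char *name, __m128i v, const char *end)
{
    int lane[4];
    _mm_storeu_si128((__m128i *)lane, v);
    printf("%s = %d %d %d %d%s", name, lane[0], lane[1], lane[2], lane[3], end);
}

static void print_f32(const char *name, __m128 v, const char *end)
{
    float lane[4];
    _mm_storeu_ps(lane, v);
    printf("%s = %f %f %f %f%s", name, lane[0], lane[1], lane[2], lane[3], end);
}

int main(void)
{
    __m128i a = _mm_set1_epi32(1);
    __m128i b = _mm_set1_epi32(2);
    __m128i c = _mm_or_si128(a, b);

    __m128 x = _mm_set1_ps(1.25f);
    __m128 y = _mm_set1_ps(1.5f);
    __m128 z = _mm_or_ps(x, y);

    print_i32("a", a, ", "); print_i32("b", b, ", "); print_i32("c", c, "\n");
    print_f32("x", x, ", "); print_f32("y", y, ", "); print_f32("z", z, "\n");

    /* Swap the instructions: float OR on integer data, integer OR on float data.
       The _mm_cast* intrinsics only reinterpret the bits; they convert nothing. */
    c = _mm_castps_si128(_mm_or_ps(_mm_castsi128_ps(a), _mm_castsi128_ps(b)));
    z = _mm_castsi128_ps(_mm_or_si128(_mm_castps_si128(x), _mm_castps_si128(y)));

    print_i32("a", a, ", "); print_i32("b", b, ", "); print_i32("c", c, "\n");
    print_f32("x", x, ", "); print_f32("y", y, ", "); print_f32("z", z, "\n");

    return 0;
}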

$ gcc -Wall -msse3 por.c -o por

$ ./por

a = 1 1 1 1, b = 2 2 2 2, c = 3 3 3 3
x = 1.250000 1.250000 1.250000 1.250000, y = 1.500000 1.500000 1.500000 1.500000, z = 1.750000 1.750000 1.750000 1.750000
a = 1 1 1 1, b = 2 2 2 2, c = 3 3 3 3
x = 1.250000 1.250000 1.250000 1.250000, y = 1.500000 1.500000 1.500000 1.500000, z = 1.750000 1.750000 1.750000 1.750000
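
The NaN guess above can also be checked directly. What follows is only a minimal sketch, assuming an x86 target with SSE2; the helper names or_variants_agree and check_special_values are made up for illustration. It feeds the same bit patterns (a quiet NaN, infinity, a denormal and -0.0) through _mm_or_si128, _mm_or_ps and _mm_or_pd and compares the raw results. It only compares values, so it says nothing about whether one form is slower than another on a given CPU.

```c
#include <emmintrin.h> /* SSE2: __m128, __m128d, __m128i and the cast intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: OR two 128-bit patterns with all three intrinsics
   and return 1 if the three results are bit-for-bit identical, else 0. */
static int or_variants_agree(const uint32_t a[4], const uint32_t b[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* Integer form, then the float and double forms on the same bits. */
    __m128i r_si = _mm_or_si128(va, vb);
    __m128i r_ps = _mm_castps_si128(
        _mm_or_ps(_mm_castsi128_ps(va), _mm_castsi128_ps(vb)));
    __m128i r_pd = _mm_castpd_si128(
        _mm_or_pd(_mm_castsi128_pd(va), _mm_castsi128_pd(vb)));

    uint32_t o_si[4], o_ps[4], o_pd[4];
    _mm_storeu_si128((__m128i *)o_si, r_si);
    _mm_storeu_si128((__m128i *)o_ps, r_ps);
    _mm_storeu_si128((__m128i *)o_pd, r_pd);

    return memcmp(o_si, o_ps, sizeof o_si) == 0 &&
           memcmp(o_si, o_pd, sizeof o_si) == 0;
}

/* Hypothetical helper: run the comparison on patterns that are special
   when read as floats: quiet NaN, +infinity, smallest denormal, -0.0. */
static int check_special_values(void)
{
    static const uint32_t a[4] = { 0x7FC00000u, 0x7F800000u, 0x00000001u, 0x80000000u };
    static const uint32_t b[4] = { 0x00000001u, 0x00400000u, 0x00000002u, 0x3F800000u };
    return or_variants_agree(a, b);
}
```

Calling check_special_values() from main and printing its return value should be enough to try this out. A result of 1 would mean the three intrinsics agreed bit for bit on those inputs, which is what one would expect if the floating-point forms really are plain bitwise operations.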
