SSE half loads (_mm_loadh_pi / _mm_loadl_pi) issue warnings

Problem description

I have borrowed a matrix inversion algorithm from the Intel website: http://download.intel.com/design/PentiumIII/sml/24504301.pdf

It uses _mm_loadh_pi and _mm_loadl_pi to load the 4x4 matrix coefficients and perform a partial shuffle at the same time. The performance gain in my app is significant. If I instead do a classic load/shuffle of the matrix using _mm_load_ps, it is slightly slower.
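
To make the "partial shuffle" concrete, here is a minimal sketch of how pairs of half loads followed by _mm_shuffle_ps can turn a row-major 4x4 matrix into its transpose. It is an illustration written under that assumption and is not copied from the Intel paper. The helper name transpose4x4 is hypothetical. Each _mm_loadl_pi/_mm_loadh_pi pair gathers one 2x2 block, so the shuffles only have to pick the even or the odd lanes. The registers are seeded with _mm_setzero_ps() so that every input is defined.

```c
#include <xmmintrin.h>

/* Hypothetical helper: writes the transpose of the row-major 4x4 matrix
   src into dst. Each half load moves 64 bits (two floats) and has no
   alignment requirement; _mm_storeu_ps is likewise unaligned. */
void transpose4x4(const float *src, float *dst)
{
    /* Seed register, so that no uninitialized value is ever read. */
    const __m128 z = _mm_setzero_ps();

    /* a = {m00, m01, m10, m11}: top-left 2x2 block */
    __m128 a = _mm_loadh_pi(_mm_loadl_pi(z, (const __m64 *)(src + 0)), (const __m64 *)(src + 4));
    /* b = {m20, m21, m30, m31}: bottom-left 2x2 block */
    __m128 b = _mm_loadh_pi(_mm_loadl_pi(z, (const __m64 *)(src + 8)), (const __m64 *)(src + 12));
    /* c = {m02, m03, m12, m13}: top-right 2x2 block */
    __m128 c = _mm_loadh_pi(_mm_loadl_pi(z, (const __m64 *)(src + 2)), (const __m64 *)(src + 6));
    /* d = {m22, m23, m32, m33}: bottom-right 2x2 block */
    __m128 d = _mm_loadh_pi(_mm_loadl_pi(z, (const __m64 *)(src + 10)), (const __m64 *)(src + 14));

    /* _MM_SHUFFLE(2, 0, 2, 0) == 0x88 picks lanes 0 and 2 of each operand;
       _MM_SHUFFLE(3, 1, 3, 1) == 0xDD picks lanes 1 and 3. */
    _mm_storeu_ps(dst + 0,  _mm_shuffle_ps(a, b, _MM_SHUFFLE(2, 0, 2, 0))); /* {m00, m10, m20, m30} */
    _mm_storeu_ps(dst + 4,  _mm_shuffle_ps(a, b, _MM_SHUFFLE(3, 1, 3, 1))); /* {m01, m11, m21, m31} */
    _mm_storeu_ps(dst + 8,  _mm_shuffle_ps(c, d, _MM_SHUFFLE(2, 0, 2, 0))); /* {m02, m12, m22, m32} */
    _mm_storeu_ps(dst + 12, _mm_shuffle_ps(c, d, _MM_SHUFFLE(3, 1, 3, 1))); /* {m03, m13, m23, m33} */
}
```

For src = {0, 1, ..., 15}, the first row of dst is {0, 4, 8, 12}. Four 64-bit half loads per pair of output rows replace four full-row loads plus the extra shuffles a classic transpose would need.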

But this load approach triggers a compiler warning: "tmp1 is used uninitialized in this function"

__m128 tmp1;
tmp1 = _mm_loadh_pi(_mm_loadl_pi(tmp1, (__m64*)(src)), (__m64*)(src+ 4));

This makes sense in a way, since tmp1 is an input parameter of _mm_loadl_pi and affects its result.

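However, a closer look at what the code does shows that tmp1 needs no initialization. _mm_loadl_pi replaces only the lower two floats of tmp1 and keeps the upper two, and _mm_loadh_pi then overwrites exactly those upper two. Initializing tmp1 also slows the code down slightly, and the difference is measurable.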
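
This lane reasoning can be checked directly. Below is a minimal sketch in which a hypothetical helper, load_halves, takes the value tmp1 would have had as an explicit seed argument. _mm_loadh_pi overwrites the only two lanes that _mm_loadl_pi kept from the seed, so every seed gives the same result.

```c
#include <xmmintrin.h>

/* Hypothetical helper mirroring the question's load; 'seed' plays the
   role of the uninitialized tmp1. */
__m128 load_halves(__m128 seed, const float *src)
{
    /* lanes 0-1 <- src[0], src[1]; lanes 2-3 are still taken from seed */
    __m128 lo = _mm_loadl_pi(seed, (const __m64 *)src);
    /* lanes 2-3 <- src[4], src[5], discarding the last lanes from seed */
    return _mm_loadh_pi(lo, (const __m64 *)(src + 4));
}
```

With src = {1, 2, 3, 4, 5, 6, 7, 8}, both load_halves(_mm_setzero_ps(), src) and load_halves(_mm_set1_ps(-99.0f), src) produce {1, 2, 5, 6}.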

Do you have any idea how to remove the warning, preferably in a portable way, without having to initialize tmp1?

Recommended answer

I tried three compilers: MS Visual Studio 2012, gcc 4.8.1, and Intel icl 13.1. As you point out, they all warn. I found that both gcc and MS automatically generate initialization code for tmp1, even though they warn that it is uninitialized. The MS compiler generates an undesirable memory access: movaps xmm0,xmmword ptr [rsp]. Gcc generates the more efficient xorps xmm0,xmm0. So with gcc, adding tmp1=_mm_setzero_ps() eliminates the warning and produces exactly the same code as without it. With MS, adding tmp1=_mm_setzero_ps() makes the code shorter and probably faster. Only the Intel compiler is smart enough to avoid the unneeded initialization. Here is a possible workaround for the MS and gcc compilers:

    __m128 tmp1 = _mm_loadh_pi(_mm_load_ps (src), (__m64*)(src + 4));

The generated code is:

movaps      xmm0,xmmword ptr [rcx]
movhps      xmm0,qword ptr [rcx+10h]

It looks shorter, but should be benchmarked to make sure it is faster.
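
A minimal timing sketch for such a comparison follows. It assumes a buffer of 1024 row-major 4x4 matrices and uses clock() from the C standard library as a coarse timer. The names load_setzero, load_redundant, fill_buffer, time_setzero, time_redundant and run_benchmark are hypothetical; the first two wrap the func2 and func3 loads. The loaded vectors are summed so the compiler cannot discard the loads. The printed times depend on the compiler, its flags and the CPU, so several runs are advisable.

```c
#include <stdio.h>
#include <time.h>
#include <xmmintrin.h>

#define N_MATRICES 1024   /* hypothetical workload: 1024 row-major 4x4 matrices */
#define N_REPEATS  2000   /* passes over the buffer per measurement */

/* The __m128 member gives the float array the 16-byte alignment
   that _mm_load_ps requires. */
static union { __m128 v[N_MATRICES * 4]; float f[N_MATRICES * 16]; } buf;

/* Same load as func2: zeroed register, then two half loads. */
static __m128 load_setzero(const float *src)
{
    return _mm_loadh_pi(_mm_loadl_pi(_mm_setzero_ps(), (const __m64 *)src),
                        (const __m64 *)(src + 4));
}

/* Same load as func3: full aligned load, then one half load. */
static __m128 load_redundant(const float *src)
{
    return _mm_loadh_pi(_mm_load_ps(src), (const __m64 *)(src + 4));
}

void fill_buffer(void)
{
    for (int i = 0; i < N_MATRICES * 16; i++)
        buf.f[i] = (float)(i % 17);
}

/* Each timer adds every loaded vector into an accumulator (so the loads
   stay live), stores it to 'sum' and returns processor seconds used. */
double time_setzero(float sum[4])
{
    __m128 acc = _mm_setzero_ps();
    clock_t t0 = clock();
    for (int r = 0; r < N_REPEATS; r++)
        for (int i = 0; i < N_MATRICES; i++)
            acc = _mm_add_ps(acc, load_setzero(buf.f + 16 * i));
    _mm_storeu_ps(sum, acc);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

double time_redundant(float sum[4])
{
    __m128 acc = _mm_setzero_ps();
    clock_t t0 = clock();
    for (int r = 0; r < N_REPEATS; r++)
        for (int i = 0; i < N_MATRICES; i++)
            acc = _mm_add_ps(acc, load_redundant(buf.f + 16 * i));
    _mm_storeu_ps(sum, acc);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

void run_benchmark(void)
{
    float s1[4], s2[4];
    fill_buffer();
    double t1 = time_setzero(s1);
    double t2 = time_redundant(s2);
    printf("setzero + loadl + loadh: %.3f s (lane 0 sum %g)\n", t1, s1[0]);
    printf("load_ps + loadh:         %.3f s (lane 0 sum %g)\n", t2, s2[0]);
}
```

Calling run_benchmark() prints both timings. The two sums must match, because both variants load the same {src[0], src[1], src[4], src[5]} lanes in the same order.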

09/12/2013: test code for different warning suppression ideas:

#include <xmmintrin.h>
#include <stdint.h>
#include <stdio.h>

//---------------------------------------------------------------------------
// original code from http://download.intel.com/design/PentiumIII/sml/24504301.pdf
__m128 func1 (float *src)
    {
    __m128 tmp1;
    tmp1 = _mm_loadh_pi(_mm_loadl_pi(tmp1, (__m64*)(src)), (__m64*)(src+ 4));
    return tmp1;
    }

//---------------------------------------------------------------------------
// original code plus tmp1 initialization
__m128 func2 (float *src)
    {
    __m128 tmp1 = _mm_loadh_pi(_mm_loadl_pi (_mm_setzero_ps (), (__m64*)(src)), (__m64*)(src + 4));
    return tmp1;
    }

//---------------------------------------------------------------------------
// use a redundant full load to eliminate the warning (src must be 16-byte aligned for _mm_load_ps)
__m128 func3 (float *src)
    {
    __m128 tmp1 = _mm_loadh_pi(_mm_load_ps (src), (__m64*)(src + 4));
    return tmp1;
    }

//---------------------------------------------------------------------------

static void dump (void *data)
    {
    float *f16 = data;
    int index;

    for (index = 0; index < 4; index++)
        printf ("%g ", f16 [index]);
    printf ("\n");
    }

//---------------------------------------------------------------------------

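int main (void)
    {
    // func3 calls _mm_load_ps, which requires 16-byte alignment; a plain
    // float [8] is not guaranteed that, so the __m128 member forces it
    union { __m128 v [2]; float f [8]; } data;
    __m128 tmp;
    int index;

    for (index = 0; index < 8; index++)
        data.f [index] = (float) (index + 1);   // 1, 2, 3, 4, 5, 6, 7, 8

    tmp = func1 (data.f);
    dump (&tmp);
    tmp = func2 (data.f);
    dump (&tmp);
    tmp = func3 (data.f);
    dump (&tmp);
    return 0;
    }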

Build commands:

gcc  -O3 -Wall -Wfatal-errors sample.c -osample.exe
objdump -Mintel --disassemble sample.exe > disasm.txt

cl -Ox -Zi -W4 sample.c
dumpbin -disasm -symbols sample.exe > disasm.txt

icl -Ox -Zi sample.c                                           
dumpbin -disasm -symbols sample.exe > disasm.txt                  
