SSE vectorization of math 'pow' function gcc


Question

I was trying to vectorize a loop that uses the 'pow' function from the math library. I am aware that the Intel compiler supports 'pow' with SSE instructions, but I can't seem to get it to work with gcc (I think). This is the case I am working with:

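#include <math.h>   /* declares pow */

int main(){
        int i=0;
        float a[256],
        b[256];

        float x= 2.3;

        /* fill the input array */
        for (i=0; i<256; i++){
                a[i]=1.5;
        }

        /* loop that calls pow from the math library */
        for (i=0; i<256; i++){
                b[i]=pow(a[i],x);
        }

        /* plain multiply loop for comparison */
        for (i=0; i<256; i++){
                b[i]=a[i]*a[i];
        }
        return 0;
}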

I'm compiling with the following:

gcc -O3 -Wall -ftree-vectorize -msse2 -ftree-vectorizer-verbose=5 code.c -o runthis

This is on OS X 10.5.8 using gcc version 4.2 (I also tried 4.5 and couldn't tell whether it had vectorized anything, as it produced no output at all). It appears that none of the loops vectorize. Is there an alignment issue, or some other issue for which I need to use restrict? If I write one of the loops as a function, I get slightly more verbose output (code):

void pow2(float *a, float * b, int n) {
        int i;
        for (i=0; i<n; i++){
                b[i]=a[i]*a[i];
        }
}

Output (using level 7 verbose output):

note: not vectorized: can't determine dependence between *D.2878_13 and *D.2877_8
bad data dependence.

I looked at the gcc auto-vectorization page, but that didn't help much. If it is not possible to use pow with this gcc version, where could I find a resource for a pow-equivalent function (I'm mostly dealing with integer powers)?

Edit: I was just digging into some other source. How did it vectorize this?!

void array_op(double * d,int len,double value,void (*f)(double*,double*) ) { 
    for ( int i = 0; i < len; i++ ){
        f(&d[i],&value);
    }
}

The relevant gcc output:

note: Profitability threshold is 3 loop iterations.

note: LOOP VECTORIZED.

Well, now I'm at a loss: 'd' and 'value' are modified by a function that gcc doesn't know about. Strange? Maybe I need to test this portion a little more thoroughly to make sure the results of the vectorized portion are correct. I'm still looking for a vectorized math library. Why aren't there any open source ones?

Recommended answer

Using __restrict, or consuming the inputs (assigning them to local variables) before writing the outputs, should help.

As it is now, the compiler cannot vectorize because a might alias b, so doing 4 multiplies in parallel and writing back 4 values might not be correct.
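As a minimal sketch of the first suggestion, here is the pow2 loop from the question with both pointers qualified by __restrict. The function name square_restrict is made up for illustration, and the qualifier is only a promise from the caller that the two arrays do not overlap; the compiler does not check it.

/* Hypothetical variant of pow2 from the question. */
```c
#include <stddef.h>

/* Squares each of the n elements of a into b.
 * __restrict (GCC spelling) is the caller's promise that a and b
 * do not overlap, which removes the aliasing question the
 * vectorizer complained about. Passing overlapping arrays is
 * undefined behavior. */
void square_restrict(const float *__restrict a, float *__restrict b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        b[i] = a[i] * a[i];
    }
}
```

Compiling this with the same command line as in the question should show whether the "can't determine dependence" note goes away on a given gcc version.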

(Note that __restrict does not guarantee that the compiler vectorizes; what can be said is that, as the code is written now, it certainly cannot.)
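For the integer powers mentioned in the question, one way to avoid the library call entirely is to use multiplications only. The sketch below is an assumption about what would suit that use case, not something taken from the answer: ipow and pow_int_array are hypothetical names, and whether gcc vectorizes the outer loop will depend on the version and flags.

```c
#include <stddef.h>

/* Raises x to a non-negative integer power n by repeated squaring.
 * Uses only multiplications, so no call into the math library. */
static float ipow(float x, unsigned int n)
{
    float result = 1.0f;
    while (n != 0) {
        if (n & 1u) {
            result *= x;   /* this bit of the exponent is set */
        }
        x *= x;            /* square the base for the next bit */
        n >>= 1;
    }
    return result;
}

/* Applies ipow element-wise; a and b are promised not to overlap. */
void pow_int_array(const float *__restrict a, float *__restrict b,
                   size_t len, unsigned int n)
{
    size_t i;
    for (i = 0; i < len; i++) {
        b[i] = ipow(a[i], n);
    }
}
```

When the exponent is a small constant known at compile time, writing the product out directly (for example a[i]*a[i]*a[i]) keeps the loop body in the same simple form as pow2.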
