How much overhead can the -fPIC flag add?

Question

I am testing a simple program that calculates the Mandelbrot fractal. I have been checking its performance as a function of the number of iterations in the function that checks whether a point belongs to the Mandelbrot set. The surprising thing is that I get a big difference in times after adding the -fPIC flag. From what I have read, the overhead is usually negligible, and the highest overhead I came across was about 6%. Mine is around 30%. Any advice will be appreciated!

I use the -O3 flag, gcc 4.7.2, Ubuntu 12.04.2, x86_64. The results look as follows:


    #iter     C (fPIC)  C       C/C(fPIC)
    1         0.01      0.01    1.00 
    100       0.04      0.03    0.75 
    200       0.06      0.04    0.67 
    500       0.15      0.1     0.67 
    1000      0.28      0.19    0.68
    2000      0.56      0.37    0.66 
    4000      1.11      0.72    0.65 
    8000      2.21      1.47    0.67
   16000      4.42      2.88    0.65 
   32000      8.8       5.77    0.66 
   64000      17.6      11.53   0.66

Commands I used:

gcc -O3 -fPIC fractalMain.c fractal.c -o ffpic
gcc -O3 fractalMain.c fractal.c -o f

Code: fractalMain.c

#include <time.h>
#include <stdio.h>
#include <stdbool.h>
#include "fractal.h"

int main()
{
    int iterNumber[] = {1, 100, 200, 500, 1000, 2000, 4000, 8000, 16000, 32000, 64000};
    int it;
    for(it = 0; it < 11; ++it)
    {
        clock_t start = clock();
        fractal(iterNumber[it]);
        clock_t end = clock();
        /* Elapsed CPU time in seconds (the table above reports seconds). */
        double seconds = (double)(end - start) / CLOCKS_PER_SEC;
        printf("Iter: %d, time: %lf \n", iterNumber[it], seconds);
    }
    return 0;
}

Code: fractal.h

#ifndef FRACTAL_H
#define FRACTAL_H
    void fractal(int iter);
#endif

Code: fractal.c

#include <stdio.h>
#include <stdbool.h>
#include "fractal.h"

void multiplyComplex(double a_re, double a_im, double b_re, double b_im, double* res_re, double* res_im)
{
    *res_re = a_re*b_re - a_im*b_im;
    *res_im = a_re*b_im + a_im*b_re;
}

void sqComplex(double a_re, double a_im, double* res_re, double* res_im)
{
    multiplyComplex(a_re, a_im, a_re, a_im, res_re, res_im);
} 

bool isInSet(double P_re, double P_im, double C_re, double C_im, int iter)
{
    double zPrev_re = P_re;
    double zPrev_im = P_im;
    double zNext_re = 0;
    double zNext_im = 0;
    double* p_zNext_re = &zNext_re;
    double* p_zNext_im = &zNext_im;
    int i;  
    for(i = 1; i <= iter; ++i)
    {
        sqComplex(zPrev_re, zPrev_im, p_zNext_re, p_zNext_im);
        zNext_re = zNext_re + C_re;
        zNext_im = zNext_im + C_im;
        if(zNext_re*zNext_re+zNext_im*zNext_im > 4)
        {
            return false;
        }
        zPrev_re = zNext_re;
        zPrev_im = zNext_im;
    }
    return true;
}

bool isMandelbrot(double P_re, double P_im, int iter)
{
    return isInSet(0, 0, P_re, P_im, iter);
}
void fractal(int iter)
{
    int noIterations = iter;
    double xMin = -1.8;
    double xMax = 1.6;
    double yMin = -1.3;
    double yMax = 0.8;
    int xDim = 512;
    int yDim = 384;
    double P_re, P_im;
    int nop = 0; /* initialized so the printf below never reads an indeterminate value */
    int x, y;

    for(x = 0; x < xDim; ++x)
        for(y = 0; y < yDim; ++y)
        {
            P_re = (double)x*(xMax-xMin)/(double)xDim+xMin;
            P_im = (double)y*(yMax-yMin)/(double)yDim+yMin;
            if(isMandelbrot(P_re, P_im, noIterations))
                nop = x+y;
        }
    printf("%d", nop);
}

Story behind the comparison

It might look a bit artificial to add the -fPIC flag when building an executable (as per one of the comments), so a few words of explanation. At first I only compiled the program as an executable and wanted to compare it to my Lua code, which calls the isMandelbrot function from C. So I created a shared object to call it from Lua, and got big time differences, but I couldn't understand why they grew with the number of iterations. In the end I found out that it was because of -fPIC. When I create a little C program that calls my Lua script (so effectively I do the same thing, only without the .so), the times are very similar to C (without -fPIC). I have checked this in a few configurations over the last few days, and it consistently shows two sets of very similar results: faster without -fPIC and slower with it.

Recommended answer

It turns out that when you compile without the -fPIC option multiplyComplex, sqComplex, isInSet and isMandelbrot are inlined automatically by the compiler. If you define those functions as static you will likely get the same performance when compiling with -fPIC because the compiler will be free to perform inlining.
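A minimal sketch of that suggestion, assuming isMandelbrot must stay exported so it can still be called from Lua through the shared object: the three helpers get internal linkage via static, which means they cannot be interposed and the compiler is free to inline them into the hot loop even under -fPIC. Whether this recovers the full non-PIC timing on a given compiler version is not verified here.

```c
#include <stdbool.h>

/* static = internal linkage: the symbol is invisible to the dynamic
   linker, so it cannot be interposed and may be inlined under -fPIC. */
static void multiplyComplex(double a_re, double a_im, double b_re, double b_im,
                            double *res_re, double *res_im)
{
    *res_re = a_re * b_re - a_im * b_im;
    *res_im = a_re * b_im + a_im * b_re;
}

static void sqComplex(double a_re, double a_im, double *res_re, double *res_im)
{
    multiplyComplex(a_re, a_im, a_re, a_im, res_re, res_im);
}

static bool isInSet(double P_re, double P_im, double C_re, double C_im, int iter)
{
    double z_re = P_re, z_im = P_im;
    double next_re = 0, next_im = 0;
    int i;
    for (i = 1; i <= iter; ++i) {
        sqComplex(z_re, z_im, &next_re, &next_im);
        next_re += C_re;
        next_im += C_im;
        if (next_re * next_re + next_im * next_im > 4)
            return false;   /* the orbit escaped */
        z_re = next_re;
        z_im = next_im;
    }
    return true;
}

/* Assumption: this is the only entry point that needs to be exported. */
bool isMandelbrot(double P_re, double P_im, int iter)
{
    return isInSet(0, 0, P_re, P_im, iter);
}
```

The call to isMandelbrot itself may still go through the procedure linkage table, but that happens once per pixel rather than once per iteration.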

The reason the compiler is unable to inline the helper functions automatically has to do with symbol interposition. Position-independent code is required to access all global data indirectly, i.e. through the global offset table. The very same constraint applies to function calls, which have to go through the procedure linkage table. Since a symbol might get interposed by another one at runtime (see LD_PRELOAD, http://stackoverflow.com/questions/426230/what-is-the-ld-preload-trick), the compiler cannot simply assume that it is safe to inline a function with global visibility.
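As a hedged illustration of the "global visibility" point: GCC's visibility attribute offers another way to tell the compiler that a symbol cannot be interposed, and unlike static it still lets other source files within the same shared object call the function. The function names squaredNorm and hasEscaped below are hypothetical and not part of the original program.

```c
/* Hidden visibility: the symbol is not exported from the shared object,
   so no other module can interpose it and the compiler may inline calls
   to it even when compiling with -fPIC. (Hypothetical helper.) */
__attribute__((visibility("hidden")))
double squaredNorm(double re, double im)
{
    return re * re + im * im;
}

/* Default visibility: exported and callable from outside the library.
   (Hypothetical entry point.) */
int hasEscaped(double re, double im)
{
    return squaredNorm(re, im) > 4.0;
}
```

This is a sketch of an alternative, not something the original answer proposes; the answer's own suggestion is to make the helpers static.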

If you compile without -fPIC, however, that assumption can be made: the compiler can safely assume that a global symbol defined in the executable cannot be interposed, because the lookup scope begins with the executable itself, followed by all other libraries, including the preloaded ones.

For a more thorough understanding have a look at the following paper.
