Over optimization


Problem description

When benchmarking one of my own functions against the equivalent runtime
function, I found that with /Ox on, the compiler optimized away my own
function so that it didn't call it *at all* after the first loop.
<pseudocode>
#ifdef USE_MY_FUNC
#define testfunc(x) myfunc(x)
#else
#define testfunc(x) runtimefunc(x)
#endif
QPC(start);
for(long i = 1; i<BIGNUMBER; i++)
    dummyvar = testfunc(argv[1]);
QPC(mid);
for(long i = 1; i<BIGNUMBER; i++)
    dummyvar = runtimefunc(argv[1]);
QPC(end);
printf("operation took %I64u\n", mid - start);
printf("control took %I64u\n", end - mid);
</pseudocode>

I think what is happening is that with /Ox on, it has completely done away
with the first (BIGNUMBER - 1) loops, knowing that the only thing that
changes is 'dummy', etc....
My dilemma is this:
I don't want to force it to "use" the value on all the loops, say by
printing it to the screen, as this would heavily dilute the time of my
function, i.e. most of the time of each loop wouldn't be spent doing the
function being tested, it would be spent printing.
I don't want to put debug compilation arguments on, as this would make it
completely dumb and possibly create the illusion that my own function is
better than a standard implementation when perhaps it isn't once my program
is compiled in release mode.

So, my question is: which compiler settings are best to put on so that the
compiler isn't completely dumb in terms of optimization, but isn't an
absolute complete smartass either? (i.e. as close as possible to release
mode, to give a fair test, but knocking off the one (or more) settings that
allow it to completely eliminate calls)
i.e. make the calls that *I'm* telling it to do, but do them as fast as
possible. I'm telling it what I want it to actually *do*, not what I
want the end result to be.

Hope this makes sense

Thanks

Recommended answer

Bonj wrote:
When performance benchmark testing one of my own functions against the
equivalent runtime function, I found that with /Ox on it optimized away its
own function so that it didn't call it *at all* after the first loop.

Please be advised that you're *off topic* for both c.l.c and c.l.c++ (as
it has to do with an implementation rather than either language itself).
Find and read the appropriate FAQs before posting in either group again
(it's the Usenet way).

That said, why not the following:
<pseudocode>
#ifdef USE_MY_FUNC
#define testfunc(x) myfunc(x)
#else
#define testfunc(x) runtimefunc(x)
#endif
QPC(start);
for(long i = 1; i<BIGNUMBER; i++)
    dummyvar += testfunc(argv[1]);    /* was: dummyvar = testfunc(argv[1]); */
QPC(mid);
for(long i = 1; i<BIGNUMBER; i++)
    dummyvar += runtimefunc(argv[1]); /* was: dummyvar = runtimefunc(argv[1]); */
QPC(end);
printf("operation took %I64u\n", mid - start);
printf("control took %I64u\n", end - mid);
</pseudocode>

This would ensure that the function will be called each time through the
loop.





HTH,
--ag
--
Artie Gold -- Austin, Texas
http://it-matters.blogspot.com (new post 12/5)
http://www.cafepress.com/goldsays


Bonj wrote:
When performance benchmark testing one of my own functions against the
equivalent runtime function, I found that with /Ox on it optimized away its
own function so that it didn't call it *at all* after the first loop.
[...]





First, please see Artie Gold's comments on topicality
and his suggestion on how to reduce the optimization.

Second, another method that often defeats aggressive
optimizers is to compile your function separately from the
timing harness. When the compiler processes the timing
program it does not "see" your function and thus can't
detect that it has no side effects, can't (usually) in-line
it, and so on.

Third, still another technique is to have the timing
program use a function pointer to call the functions being
tested, and to set the pointer's value based on information
not available at compile time. For example,

    extern void func1(void);
    extern void func2(void);
    int main(int argc, char **argv) {
        void (*func)(void) = (argc == 1) ? func1 : func2;
        /* now run your loop, calling `func' each time */
        return 0;
    }

Finally, it is often a good idea to run the tested
function at least once before you start the timing loop.
That helps to ensure that its code pages and whatever data
pages it references are memory-resident before you begin;
any paging or address-mapping or MMU adjustments or other
miscellany that occur only on the very first call will be
less likely to pollute your timing results.

Obtaining a timing that is accurate and unbiased *and*
useful can be a surprisingly tricky business.

--
Er*********@sun.com



"Artie Gold" <ar*******@austin.rr.com> wrote in message
news:32*************@individual.net...
Bonj wrote:
When performance benchmark testing one of my own functions against
the equivalent runtime function, I found that with /Ox on it
optimized away its own function so that it didn't call it *at all*
after the first loop.



Please be advised that you're *off topic* for both c.l.c and c.l.c++
(as it has to do with an implementation rather than either language
itself). Find and read the appropriate FAQs before posting in either
group again (it's the Usenet way).

That said, why not the following:

<pseudocode>
#ifdef USE_MY_FUNC
#define testfunc(x) myfunc(x)
#else
#define testfunc(x) runtimefunc(x)
#endif
QPC(start);
for(long i = 1; i<BIGNUMBER; i++)
    dummyvar += testfunc(argv[1]);    /* was: dummyvar = testfunc(argv[1]); */
QPC(mid);
for(long i = 1; i<BIGNUMBER; i++)
    dummyvar += runtimefunc(argv[1]); /* was: dummyvar = runtimefunc(argv[1]); */

This would ensure that the function will be called each time through
the loop.




No, you can't be sure.

If the compiler is smart enough to realize that runtimefunc always
returns the same result, it might also be smart enough to multiply that
result by BIGNUMBER.

Bo Persson

