Superbasic efficiency question


Problem description


I'm working on a scientific computing application. I need a class
called Element which is no more than a collection of integers, or
"nodes", and has only one method, int getNode(int i).

I would like to implement it in the most efficient way possible. So I
summoned up my programming intellect and asked myself: Do I want to
have members such as int a, b, c, d, e or a single member such as int
a[5]? So I wrote the following snippet and compiled it with the -O3 flag:

int main(int argc, char *argv[]) {

    /*
    int a, b, c, d, e;
    for (int i = 0; i < 1000000000; i++) {
        a = 1;
        b = 2;
        c = 3;
        d = 4;
        e = 5;
    }

    */
    int a[5];
    for (int i = 0; i < 1000000000; i++) {
        a[0] = 1;
        a[1] = 2;
        a[2] = 3;
        a[3] = 4;
        a[4] = 5;
    }

    return 0;
}

The first (commented out) version ran twice as fast. (For doubles
instead of ints, it was a factor of 4). So the simpleton part of me
thinks that that answers my question. The remaining part tells me that
it is never that simple. Finally, the cynical part of me thinks that it
all doesn't matter and other parts of the program are bound to be far
more time consuming.

Please help me resolve my internal struggle.
Many thanks in advance!

Aaron Fude

Solution

Aaron,

Performance on the toy code snippet is not informative of anything more
than performance on the toy code snippet.

> Please help me resolve my internal struggle.



10 Write the code in the clearest, most maintainable way possible.
20 Run the code under a profiler.
30 Optimize the actual performance bottleneck (as opposed to, say,
imaginary performance bottlenecks).
40 GOTO 20

If you have any problems with 30, get back to the group. Until then,
both you and we are just shooting in the dark.

Joseph


"Joseph Turian" <tu****@gmail.com> wrote in message
news:11**********************@c13g2000cwb.googlegroups.com...

> 10 Write the code the most clear, maintable way possible.
> 20 Run the code under a profiler.
> 30 Optimize the actual performance bottleneck (as opposed to, say,
> imaginary performance bottlenecks).

Optimizing might require significant changes, even major design changes. If
you know from the outset that speed is important, then consider it from the
outset. Toy code snippets can help you determine early on how you'll need to
go about doing certain things.

> 40 GOTO 20



DW


aa*******@gmail.com wrote:


> The first (commented out) version ran twice as fast. (For doubles
> instead of ints, it was a factor of 4). So the simpleton part of me
> thinks that that answers my question. The remaining part tells me that
> it is never that simple. Finally, the cynical part of me thinks that it
> all doesn't matter and other parts of the program are bound to be far
> more time consuming.



It is more than likely that the compiler rearranged your code into
something like this:

    int a, b, c, d, e;
    a = 1;
    b = 2;
    c = 3;
    d = 4;
    e = 5;
    for (int i = 0; i < 1000000000; i++) {
    }

Or, perhaps the compiler placed the values in registers.
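
One way to keep an optimizer from hoisting or dropping the stores in a toy loop like this is to write through volatile objects, since every volatile access counts as observable behaviour. The sketch below is only an illustration: the function names fill_array and fill_scalars are made up here, and volatile also keeps the values out of registers, so it measures forced memory stores rather than what ordinary code would do.

```cpp
// Hypothetical helpers, not from the original thread.
// Every store to a volatile object must actually be performed,
// so the loop bodies cannot be moved out of the loop or deleted.

int fill_array(long iterations)
{
    volatile int a[5] = {0, 0, 0, 0, 0};
    for (long i = 0; i < iterations; ++i) {
        a[0] = 1;
        a[1] = 2;
        a[2] = 3;
        a[3] = 4;
        a[4] = 5;
    }
    // Reading the values back gives the caller something to check.
    return a[0] + a[1] + a[2] + a[3] + a[4];
}

int fill_scalars(long iterations)
{
    volatile int a = 0, b = 0, c = 0, d = 0, e = 0;
    for (long i = 0; i < iterations; ++i) {
        a = 1;
        b = 2;
        c = 3;
        d = 4;
        e = 5;
    }
    return a + b + c + d + e;
}
```

Timing fill_array(1000000000) against fill_scalars(1000000000) then at least compares loops that both execute all of their stores.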

There is a deeper design question for you.

Are these values really related? Do you do operations on them in
tandem? Would you ever think that it might be interesting to write a
template function with a "pointer to member" of one of these values?

I would go with the 5 separate values if they are truly separate. That
way it will be harder to run into other problems like going past array
bounds or issues with using the wrong index etc.
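
For readers who have not met the "pointer to member" idea mentioned above, here is a minimal sketch of what such a template function could look like with five separate members. The struct Nodes and the helpers set_member and get_member are hypothetical names chosen for this illustration, not something from the thread.

```cpp
// Hypothetical aggregate with five separately named values.
struct Nodes
{
    int a, b, c, d, e;
};

// The member to modify is a template argument, so it is fixed at
// compile time and a misspelled member name fails to compile.
template <int Nodes::*Member>
void set_member(Nodes& n, int value)
{
    n.*Member = value;
}

// The same idea with the member chosen at run time.
int get_member(const Nodes& n, int Nodes::*member)
{
    return n.*member;
}
```

A call such as set_member<&Nodes::c>(n, 30) touches only c, and get_member(n, &Nodes::e) reads e. Unlike an array index, a pointer to member cannot refer to a position past the last value.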

Anyhow, below is an example where the compiler can't (easily) make the
optimization above. The results are essentially identical with
gcc version 4.0.0 20050102 (experimental):

#include <ostream>
#include <iostream>

struct X
{
    virtual ~X() {}       // lets Av/Bv be deleted through an X*
    virtual void F() = 0; // hard for compiler to optimize this
};

struct A
{
    int a, b, c, d, e;
};

struct B
{
    int a[5];
};

struct Av
  : A, X
{
    Av()
      : A()
    {
    }

    virtual void F()
    {
        a = 1;
        b = 2;
        c = 3;
        d = 4;
        e = 5;
    }
};

struct Bv
  : B, X
{
    Bv()
      : B()
    {
    }

    virtual void F()
    {
        // a has five elements, so valid indices are 0..4
        a[0] = 1;
        a[1] = 2;
        a[2] = 3;
        a[3] = 4;
        a[4] = 5;
    }
};

int main( int argc, char ** argv )
{
    // Any command-line argument selects the separate-members version
    X * x;
    if ( argc >= 2 )
    {
        std::cout << "Making an A\n";
        x = new Av;
    }
    else
    {
        std::cout << "Making a B\n";
        x = new Bv;
    }

    for (int i = 0; i < 1000000000; i++)
    {
        x->F();
    }

    delete x;
}
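
Tying this back to the original question: since the only required method is int getNode(int i), the array layout makes the accessor a one-liner, whereas five named members would need a switch or similar dispatch. The following is a minimal sketch under stated assumptions. Only the names Element and getNode come from the question; the fixed count of five nodes, the constructor, and the bounds assert are guesses made for illustration.

```cpp
#include <cassert>

// Sketch of the Element class from the question, array-backed.
class Element
{
public:
    enum { kNodeCount = 5 }; // assumed from the a[5] example

    // Hypothetical constructor: copies five node ids from an array.
    explicit Element(const int (&nodes)[kNodeCount])
    {
        for (int i = 0; i < kNodeCount; ++i)
            nodes_[i] = nodes[i];
    }

    // Defined in the class body, so the compiler is free to inline it.
    int getNode(int i) const
    {
        assert(i >= 0 && i < kNodeCount); // catches a bad index in debug builds
        return nodes_[i];
    }

private:
    int nodes_[kNodeCount];
};
```

Whether this is measurably faster or slower than named members in the real application is exactly what a profiler run would have to show.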

