Faster for() loops?


Problem description


Hi Folks,

The page http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR states:

for( i=0; i<10; i++){ ... }

i loops through the values 0,1,2,3,4,5,6,7,8,9. If you don't care about the
order of the loop counter, you can do this instead:

for( i=10; i--; ) { ... }

Using this code, i loops through the values 9,8,7,6,5,4,3,2,1,0, and the
loop should be faster. This works because it is quicker to process "i--" as
the test condition, which says "Is i non-zero? If so, decrement it and
continue." For the original code, the processor has to calculate "Subtract
i from 10. Is the result non-zero? If so, increment i and continue." In
tight loops, this makes a considerable difference.

How far does this hold true in the light of modern optimizing compilers? And
will it make a significant difference on embedded systems?
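
To make the two loop forms from the question concrete, here is a minimal C sketch that records which counter values each form visits. The names visit_up, visit_down and record are hypothetical helpers introduced only for illustration; they do not come from the tutorial page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: stores one visited counter value in out[n]. */
static size_t record(int *out, size_t n, int value)
{
    out[n] = value;
    return n + 1;
}

/* Counting up: the body sees 0, 1, ..., count-1. Returns the iteration count. */
size_t visit_up(int count, int *out)
{
    size_t n = 0;
    int i;
    assert(out != NULL && count >= 0);
    for (i = 0; i < count; i++)
        n = record(out, n, i);
    return n;
}

/* Counting down with the test-then-decrement idiom: the condition "i--"
 * tests the old value of i and then decrements it, so the body sees
 * count-1, ..., 1, 0. The start value must not be negative. */
size_t visit_down(int count, int *out)
{
    size_t n = 0;
    int i;
    assert(out != NULL && count >= 0);
    for (i = count; i--; )
        n = record(out, n, i);
    return n;
}
```

With count = 10 both functions run the body ten times; only the order of the values differs, which is why the idiom is only suggested when the order does not matter.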

Thanks,
-Neo
"Do U really think, what U think real is really real?"

Solution

Hi,

Neo wrote:
[...] In tight loops, this makes a considerable difference.
How far does this hold true in the light of modern optimizing compilers? And
will it make a significant difference on embedded systems?



There is nothing like an experiment to test a theory. I just tried it with
AVRGCC:

void doSomething(void);   /* declared only; its definition is not shown here */

void countDown(void){
    int i;
    for(i=10; i!=0; i--) doSomething();
}
void countUp(void){
    int i;
    for(i=0;i<10;i++) doSomething();
}

The generated code is:

000000ce <countDown>:
}

void countDown(void){
  ce:   cf 93           push    r28
  d0:   df 93           push    r29
    int i;
    for(i=10; i!=0; i--) doSomething();
  d2:   ca e0           ldi     r28, 0x0A       ; 10
  d4:   d0 e0           ldi     r29, 0x00       ; 0
  d6:   0e 94 5d 00     call    0xba
  da:   21 97           sbiw    r28, 0x01       ; 1
  dc:   e1 f7           brne    .-8             ; 0xd6
  de:   df 91           pop     r29
  e0:   cf 91           pop     r28
  e2:   08 95           ret

000000e4 <countUp>:
}
void countUp(void){
  e4:   cf 93           push    r28
  e6:   df 93           push    r29
  e8:   c9 e0           ldi     r28, 0x09       ; 9
  ea:   d0 e0           ldi     r29, 0x00       ; 0
    int i;
    for(i=0;i<10;i++) doSomething();
  ec:   0e 94 5d 00     call    0xba
  f0:   21 97           sbiw    r28, 0x01       ; 1
  f2:   d7 ff           sbrs    r29, 7
  f4:   fb cf           rjmp    .-10            ; 0xec
  f6:   df 91           pop     r29
  f8:   cf 91           pop     r28
  fa:   08 95           ret

Counting down instead of up saves one whole instruction. It could make a
difference I suppose.

However, the compiler cannot optimise as well if anything in the loop
depends on the value of 'i':

void doSomething(int);   /* declared only; its definition is not shown here */

void countDown(void){
    int i;
    for(i=10; i!=0; i--) doSomething(i);
}
void countUp(void){
    int i;
    for(i=0;i<10;i++) doSomething(i);
}

This becomes:

void countDown(void){
  ce:   cf 93           push    r28
  d0:   df 93           push    r29
    int i;
    for(i=10; i!=0; i--) doSomething(i);
  d2:   ca e0           ldi     r28, 0x0A       ; 10
  d4:   d0 e0           ldi     r29, 0x00       ; 0
  d6:   ce 01           movw    r24, r28
  d8:   0e 94 5d 00     call    0xba
  dc:   21 97           sbiw    r28, 0x01       ; 1
  de:   d9 f7           brne    .-10            ; 0xd6
  e0:   df 91           pop     r29
  e2:   cf 91           pop     r28
  e4:   08 95           ret

000000e6 <countUp>:
}
void countUp(void){
  e6:   cf 93           push    r28
  e8:   df 93           push    r29
    int i;
    for(i=0;i<10;i++) doSomething(i);
  ea:   c0 e0           ldi     r28, 0x00       ; 0
  ec:   d0 e0           ldi     r29, 0x00       ; 0
  ee:   ce 01           movw    r24, r28
  f0:   0e 94 5d 00     call    0xba
  f4:   21 96           adiw    r28, 0x01       ; 1
  f6:   ca 30           cpi     r28, 0x0A       ; 10
  f8:   d1 05           cpc     r29, r1
  fa:   cc f3           brlt    .-14            ; 0xee
  fc:   df 91           pop     r29
  fe:   cf 91           pop     r28
 100:   08 95           ret

This time there are a whole 2 extra instructions. I don't think this is
such a big deal. Unrolling the loop would give a better result.

cheers,

Al
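
As an illustration of the loop unrolling Al mentions, here is a minimal C sketch that sums an array once with a plain loop and once unrolled by four. The function names and the unroll factor of four are assumptions chosen for the example, not something taken from the thread, and whether the unrolled form is actually faster depends on the compiler and target.

```c
#include <assert.h>
#include <stddef.h>

/* Plain loop: one counter update, one test and one branch per element. */
long sum_simple(const int *data, size_t len)
{
    long total = 0;
    size_t i;
    assert(data != NULL || len == 0);
    for (i = 0; i < len; i++)
        total += data[i];
    return total;
}

/* Unrolled by four: the counter update, test and branch are paid once per
 * four elements. A second loop handles the 0 to 3 leftover elements. */
long sum_unrolled4(const int *data, size_t len)
{
    long total = 0;
    size_t i = 0;
    assert(data != NULL || len == 0);
    for (; i + 4 <= len; i += 4) {
        total += data[i];
        total += data[i + 1];
        total += data[i + 2];
        total += data[i + 3];
    }
    for (; i < len; i++)
        total += data[i];
    return total;
}
```

Both functions return the same sum for any length; the unrolled one trades code size for fewer loop-control instructions, which matters on a small flash-limited micro.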


Neo wrote:
[...] In tight loops, this makes a considerable difference.
How far does this hold true in the light of modern optimizing compilers? And
will it make a significant difference on embedded systems?



Many micros have a decrement-and-jump-if-zero (or non-zero) machine
instruction, so a decent optimising compiler should know this and use it in
count-down-to-zero loops. Counting up often needs a compare followed by a
jump-if-zero (or non-zero), which will be a tad slower.

Ian


"Neo" <ti*********************@yahoo.com> wrote in message
news:43******@news.microsoft.com...

[...] In tight loops, this makes a considerable difference.
How far does this hold true in the light of modern optimizing compilers? And
will it make a significant difference on embedded systems?



The answer is "implementation-dependent".

A major advantage of writing in C is that you can, if you choose, write
understandable, maintainable code. This kind of hand-optimisation has the
opposite effect. If you really need to care about exactly how many
instruction cycles a loop takes, code it in assembly language. Otherwise, for
the sake of those that come after you, please write your C readably and
leave the compiler to do the optimisation. These days, most compilers can
optimise almost as well as you can, for most "normal" operations.

Regards,
--
Peter Bushell
http://www.software-integrity.com/
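
Since the answer is implementation-dependent, one way to decide whether the less readable form is worth it is to time both on the actual toolchain. The following is a rough sketch that assumes a hosted C environment where clock() from <time.h> is available; on a bare-metal target a hardware timer would normally take its place. The names loop_up, loop_down, time_calls and sink are hypothetical.

```c
#include <assert.h>
#include <time.h>

/* volatile so the compiler cannot delete the loop bodies as dead code */
volatile unsigned sink = 0;

/* Readable form: counts up from 0 to 999. */
void loop_up(void)
{
    unsigned i;
    for (i = 0; i < 1000u; i++)
        sink += i;
}

/* Hand-optimised form: counts down from 999 to 0. */
void loop_down(void)
{
    unsigned i;
    for (i = 1000u; i--; )
        sink += i;
}

/* Returns the CPU time in seconds used by reps calls of fn,
 * or -1.0 if clock() reports that no timing is available. */
double time_calls(void (*fn)(void), unsigned long reps)
{
    clock_t start, end;
    unsigned long r;
    assert(fn != NULL);
    start = clock();
    for (r = 0; r < reps; r++)
        fn();
    end = clock();
    if (start == (clock_t)-1 || end == (clock_t)-1)
        return -1.0;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Comparing time_calls(loop_up, reps) with time_calls(loop_down, reps) for a large reps gives a first impression; if the two are indistinguishable, the readable loop is the one to keep.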

