Faster for() loops?

Published: 2019/6/4 20:10:09


Problem description

Hi Folks,

The page http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR states:

    for( i=0; i<10; i++){ ... }

i loops through the values 0,1,2,3,4,5,6,7,8,9. If you don't care about the
order of the loop counter, you can do this instead:

    for( i=10; i--; ) { ... }

Using this code, i loops through the values 9,8,7,6,5,4,3,2,1,0, and the loop
should be faster. This works because it is quicker to process "i--" as the
test condition, which says "is i non-zero? If so, decrement it and continue.".
For the original code, the processor has to calculate "subtract i from 10. Is
the result non-zero? If so, increment i and continue.". In tight loops, this
makes a considerable difference.

How far does this hold true in the light of modern optimizing compilers? And
will it make a significant difference in the case of embedded systems?

Thanks,
-Neo
"Do U really think, what U think real is really real?"
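
As a quick check on the claim in the question, the following is a minimal sketch (not taken from the linked tutorial) that records the counter values seen by both loop forms and confirms they cover the same values in opposite order. The helper names `visit_up`, `visit_down` and `same_values_reversed` are made up for illustration, and the 64-element buffer is an arbitrary limit.

```c
#include <assert.h>
#include <stddef.h>

/* Record the counter values seen by the conventional count-up loop. */
static size_t visit_up(int out[], int n)
{
    size_t k = 0;
    int i;
    for (i = 0; i < n; i++)
        out[k++] = i;
    return k;
}

/* Record the values seen by the count-down form from the question.
   The test `i--` checks i for non-zero first and then decrements,
   so the body sees n-1, n-2, ..., 0. */
static size_t visit_down(int out[], int n)
{
    size_t k = 0;
    int i;
    for (i = n; i--; )
        out[k++] = i;
    return k;
}

/* Returns 1 if both loops ran the same number of times and the
   count-down loop saw the count-up values in reverse order. */
static int same_values_reversed(int n)
{
    int up[64], down[64];
    size_t nu, nd, j;

    assert(n >= 0 && n <= 64);   /* a negative n would never reach zero */
    nu = visit_up(up, n);
    nd = visit_down(down, n);
    if (nu != nd)
        return 0;
    for (j = 0; j < nu; j++)
        if (up[j] != down[nu - 1 - j])
            return 0;
    return 1;
}
```

Calling `same_values_reversed(10)` returns 1. This only shows that the two forms are interchangeable when the order does not matter; whether one is faster depends on the compiler and target, as the replies below discuss.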

Solution

Hi,

Neo wrote:
> [...] In tight loops, this make a
> considerable difference.
> How far it holds true.. in the light of modern optimizing compilers? and
> will it make a significant difference in case of embedded systems???

There is nothing like an experiment to test a theory. I just tried with
AVRGCC:

    void doSomething(void);   /* assumed to be defined elsewhere */

    void countDown(void){
        int i;
        for(i=10; i!=0; i--) doSomething();
    }
    void countUp(void){
        int i;
        for(i=0;i<10;i++) doSomething();
    }

The generated code is

    000000ce <countDown>:
    void countDown(void){
      ce:  cf 93        push  r28
      d0:  df 93        push  r29
        int i;
        for(i=10; i!=0; i--) doSomething();
      d2:  ca e0        ldi   r28, 0x0A  ; 10
      d4:  d0 e0        ldi   r29, 0x00  ; 0
      d6:  0e 94 5d 00  call  0xba
      da:  21 97        sbiw  r28, 0x01  ; 1
      dc:  e1 f7        brne  .-8        ; 0xd6
      de:  df 91        pop   r29
      e0:  cf 91        pop   r28
      e2:  08 95        ret

    000000e4 <countUp>:
    void countUp(void){
      e4:  cf 93        push  r28
      e6:  df 93        push  r29
      e8:  c9 e0        ldi   r28, 0x09  ; 9
      ea:  d0 e0        ldi   r29, 0x00  ; 0
        int i;
        for(i=0;i<10;i++) doSomething();
      ec:  0e 94 5d 00  call  0xba
      f0:  21 97        sbiw  r28, 0x01  ; 1
      f2:  d7 ff        sbrs  r29, 7
      f4:  fb cf        rjmp  .-10       ; 0xec
      f6:  df 91        pop   r29
      f8:  cf 91        pop   r28
      fa:  08 95        ret

Counting down instead of up saves one whole instruction. It could make a
difference, I suppose.

However, the compiler cannot optimise as well if anything in the loop
depends on the value of 'i':

    void doSomething(int);    /* assumed to be defined elsewhere */

    void countDown(void){
        int i;
        for(i=10; i!=0; i--) doSomething(i);
    }
    void countUp(void){
        int i;
        for(i=0;i<10;i++) doSomething(i);
    }

Becomes

    void countDown(void){
      ce:  cf 93        push  r28
      d0:  df 93        push  r29
        int i;
        for(i=10; i!=0; i--) doSomething(i);
      d2:  ca e0        ldi   r28, 0x0A  ; 10
      d4:  d0 e0        ldi   r29, 0x00  ; 0
      d6:  ce 01        movw  r24, r28
      d8:  0e 94 5d 00  call  0xba
      dc:  21 97        sbiw  r28, 0x01  ; 1
      de:  d9 f7        brne  .-10       ; 0xd6
      e0:  df 91        pop   r29
      e2:  cf 91        pop   r28
      e4:  08 95        ret

    000000e6 <countUp>:
    void countUp(void){
      e6:  cf 93        push  r28
      e8:  df 93        push  r29
        int i;
        for(i=0;i<10;i++) doSomething(i);
      ea:  c0 e0        ldi   r28, 0x00  ; 0
      ec:  d0 e0        ldi   r29, 0x00  ; 0
      ee:  ce 01        movw  r24, r28
      f0:  0e 94 5d 00  call  0xba
      f4:  21 96        adiw  r28, 0x01  ; 1
      f6:  ca 30        cpi   r28, 0x0A  ; 10
      f8:  d1 05        cpc   r29, r1
      fa:  cc f3        brlt  .-14       ; 0xee
      fc:  df 91        pop   r29
      fe:  cf 91        pop   r28
     100:  08 95        ret

This time there are a whole 2 extra instructions. I don't think this is
such a big deal. Unrolling the loop would give a better result.

cheers,

Al
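
Al's closing remark about unrolling can be illustrated with a small sketch. This is not code from the post: the body of `doSomething` here is a hypothetical stand-in that only counts calls so the result can be checked, and the unroll factor of 2 is an arbitrary choice that happens to divide the trip count of 10 evenly.

```c
/* Hypothetical stand-in for the real work; it only counts calls. */
static int calls;
static void doSomething(void) { calls++; }

/* Plain count-down loop: one decrement and one branch per call. */
static void countDown(void)
{
    int i;
    for (i = 10; i != 0; i--)
        doSomething();
}

/* Unrolled by a factor of 2: the same 10 calls, but the decrement
   and branch now execute 5 times instead of 10. This form assumes
   the trip count is a multiple of the unroll factor. */
static void countDownUnrolled(void)
{
    int i;
    for (i = 5; i != 0; i--) {
        doSomething();
        doSomething();
    }
}
```

Both functions call `doSomething` ten times. The trade-off is larger code, which can matter on a small embedded target, and many compilers can perform this transformation themselves when asked to optimise for speed.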

Neo wrote:

> Hi Folks,http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR Page
> states:for( i=0; i<10; i++){ ... }i loops through the values
> 0,1,2,3,4,5,6,7,8,9 If you don't care about the order of the loop counter,
> you can do this instead: for( i=10; i--; ) { ... }Using this code, i loops
> through the values 9,8,7,6,5,4,3,2,1,0, and the loop should be faster.
> This works because it is quicker to process "i--" as the test condition,
> which says "is i non-zero? If so, decrement it and continue.". For the
> original code, the processor has to calculate "subtract i from 10. Is the
> result non-zero? if so, increment i and continue.". In tight loops, this
> make a considerable difference.
> How far it holds true.. in the light of modern optimizing compilers? and
> will it make a significant difference in case of embedded systems???

Many micros have a "decrement and jump if zero" (or non-zero) machine
instruction, so a decent optimising compiler should know this and use it in
count-down-to-zero loops. Counting up often needs a compare followed by a
jump if zero (or non-zero), which will be a tad slower.

Ian
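
The loop shape Ian describes can be written in C as a `do ... while (--n != 0)` loop, whose test is literally "decrement, then branch if non-zero". The sketch below is illustrative only: the function names `sum_down` and `sum_up` are made up, and whether a given compiler actually emits a single decrement-and-branch instruction for the first form depends on the compiler and the target.

```c
#include <stddef.h>

/* Sums a[0..n-1], walking from the last element to the first. */
static long sum_down(const int *a, size_t n)
{
    long sum = 0;

    if (n == 0)
        return 0;             /* a do-while body always runs at least once */
    do {
        sum += a[n - 1];
    } while (--n != 0);       /* decrement, then branch if non-zero */
    return sum;
}

/* The same sum counting up, for comparison. */
static long sum_up(const int *a, size_t n)
{
    long sum = 0;
    size_t i;

    for (i = 0; i < n; i++)   /* compare i against n, then branch */
        sum += a[i];
    return sum;
}
```

For an array holding 1, 2, 3, 4 both functions return 10. Comparing the generated assembly for the two, as Al did above, is the reliable way to see what a particular toolchain does with them.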

"Neo" <ti*********************@yahoo.com> wrote in message
news:43******@news.microsoft.com...
> Hi Folks,http://www.abarnett.demon.co.uk/tutorial.html#FASTFOR Page
> states:for( i=0; i<10; i++){ ... }i loops through the values
> 0,1,2,3,4,5,6,7,8,9 If you don't care about the order of the loop counter,
> you can do this instead: for( i=10; i--; ) { ... }Using this code, i loops
> through the values 9,8,7,6,5,4,3,2,1,0, and the loop should be faster.
> This works because it is quicker to process "i--" as the test condition,
> which says "is i non-zero? If so, decrement it and continue.". For the
> original code, the processor has to calculate "subtract i from 10. Is the
> result non-zero? if so, increment i and continue.". In tight loops, this
> make a considerable difference.
> How far it holds true.. in the light of modern optimizing compilers? and
> will it make a significant difference in case of embedded systems???
>
> Thanks,
> -Neo
> "Do U really think, what U think real is really real?"

The answer is "implementation-dependent".

A major advantage of writing in C is that you can, if you choose, write
understandable, maintainable code. This kind of hand-optimisation has the
opposite effect. If you really need to care about exactly how many
instruction cycles a loop takes, code it in assembly language. Otherwise, for
the sake of those who come after you, please write your C readably and
leave the compiler to do the optimisation. These days, most compilers can
optimise almost as well as you can, for most "normal" operations.

Regards,
--
Peter Bushell
http://www.software-integrity.com/
