循环中的常量条件:编译器优化 [英] Constant condition in a loop: compiler optimization
问题描述
//预处理器
#include< iostream>
#include< vector>
$ b $ //内部分支
void f1(std :: vector< int>& v,const int x = 0)
{
for(unsigned int i = 1; i v [i] =(x!= 0)? (v [i-1] * x):(v [i-1] + v [i-1]);
$ b //外部分支
void f2(std :: vector< int>& v,const int x = 0)
($ unsigned int i = 1; i v [i] = v [i -1] * X; $ unsigned int i = 1; i v [i] = v [i- 1] + v [I-1];
$ b $主要
int main()
{
std :: vector< int> v(10,2);
f1(v);
f2(v);
返回0;
$ b $ p
$ b 它演示了产生相同结果的两个函数的行为:
-
f1
:条件在循环内进行测试
-
f2
:条件在循环之外被测试
分支基于 x
,它被声明为 const
。
我的问题是:编译器是否足够智能,能够转换 f2 $中的 f1
c $ c>当所有的优化级别都打开吗?
解决方案最好的方法来检查你的编译器是否提升了条件循环将在完全优化编译之后真正检查程序集。
在构建您的示例之后:
g ++ -O3 -c example.cpp -o example.o
objdump -d -M intel example.o> example.S
这是我得到的 f1
:
$ b $ pre $ code> 00000020< f1(std :: vector< int,std :: allocator< int>>& int)>:
; ...
23:8b 54 24 10 mov edx,DWORD PTR [esp + 0x10]
27:8b 7c 24 14 mov edi,DWORD PTR [esp + 0x14]
2b:8b 02 mov eax,DWORD PTR [edx]
2d:8b 4a 04 mov ecx,DWORD PTR [edx + 0x4]
30:29 c1 sub ecx,eax
32:c1 f9 02 sar ecx,0x2
35:83 f9 01 cmp ecx,0x1
38:76 d jbe 57 3a:31 db xor ebx,ebx
3c:85 ff test edi,edi
3e:ba 01 00 00 00 mov edx,0x1
43:75 b jne 60< ; f1(std :: vector< int,std :: allocator< int>>&,int)+ 0x40>
45:8b 34 18 mov esi,DWORD PTR [eax + ebx * 1]
48:83 c3 04 add ebx,0x4
4b:01 f6 add esi,esi
4d:89 34 90 mov DWORD PTR [eax + edx * 4],esi
50:83 c2 01 add edx,0x1
53:39 d1 cmp ecx,edx
55:75 ee jne 45 57:5b pop ebx
58:5e pop esi
59:5f pop edi
5a:c3 ret
5b:90 nop
5c:8d 74 26 00 lea esi,[esi + eiz * 1 + 0x0]
60:8b 34 18 mov esi,DWORD PTR [eax + ebx * 1]
63:83 c3 04 add ebx,0x4
66:0f af f7 imul esi,edi
69:89 34 90 mov DWORD PTR [eax + edx * 4],esi
6c:83 c2 01 add edx,0x1
6f:39ca cmp edx,ecx
71:75 ed jne 60 73:eb e2 jmp 57
在第3c行中,您发现要检查的条件:
;如果(x!= 0)
3c:85 ff test edi,edi
43:75 b jne 60; ...
在检查之后, x
不再被测试,循环只是为每个部分执行。从第45行到第55行的第一个循环在 x == 0
时完成。从第60行开始到第71行的第二个循环是在 x!= 0
时完成的。
在这种情况下,gcc能够在完全优化启用的情况下提取条件。
Consider the following code:
// Preprocessor
#include <iostream>
#include <vector>
// Internal branching
void f1(std::vector<int>& v, const int x = 0)
{
for (unsigned int i = 1; i < v.size(); ++i) {
v[i] = (x != 0) ? (v[i-1]*x) : (v[i-1]+v[i-1]);
}
}
// External branching
void f2(std::vector<int>& v, const int x = 0)
{
if (x != 0) {
for (unsigned int i = 1; i < v.size(); ++i) {
v[i] = v[i-1]*x;
}
} else {
for (unsigned int i = 1; i < v.size(); ++i) {
v[i] = v[i-1]+v[i-1];
}
}
}
// Main
int main()
{
std::vector<int> v(10, 2);
f1(v);
f2(v);
return 0;
}
It illustrates the behaviour of two functions that produce the same result:
f1
: the condition is tested inside of the loopf2
: the condition is tested outside of the loop
The branching is based on x
, which is declared as const
.
My question is: are compilers sufficiently intelligent to transform f1
in f2
when all optimization levels are turned on?
The best way to check if your compiler is hoisting the conditional out the loop is to really inspect the assembly after compiling it with full optimizations.
After building your example with:
g++ -O3 -c example.cpp -o example.o
objdump -d -M intel example.o > example.S
Here's what I got for f1
:
00000020 <f1(std::vector<int, std::allocator<int> >&, int)>:
; ...
23: 8b 54 24 10 mov edx,DWORD PTR [esp+0x10]
27: 8b 7c 24 14 mov edi,DWORD PTR [esp+0x14]
2b: 8b 02 mov eax,DWORD PTR [edx]
2d: 8b 4a 04 mov ecx,DWORD PTR [edx+0x4]
30: 29 c1 sub ecx,eax
32: c1 f9 02 sar ecx,0x2
35: 83 f9 01 cmp ecx,0x1
38: 76 d jbe 57 <f1(std::vector<int, std::allocator<int> >&, int)+0x37>
3a: 31 db xor ebx,ebx
3c: 85 ff test edi,edi
3e: ba 01 00 00 00 mov edx,0x1
43: 75 b jne 60 <f1(std::vector<int, std::allocator<int> >&, int)+0x40>
45: 8b 34 18 mov esi,DWORD PTR [eax+ebx*1]
48: 83 c3 04 add ebx,0x4
4b: 01 f6 add esi,esi
4d: 89 34 90 mov DWORD PTR [eax+edx*4],esi
50: 83 c2 01 add edx,0x1
53: 39 d1 cmp ecx,edx
55: 75 ee jne 45 <f1(std::vector<int, std::allocator<int> >&, int)+0x25>
57: 5b pop ebx
58: 5e pop esi
59: 5f pop edi
5a: c3 ret
5b: 90 nop
5c: 8d 74 26 00 lea esi,[esi+eiz*1+0x0]
60: 8b 34 18 mov esi,DWORD PTR [eax+ebx*1]
63: 83 c3 04 add ebx,0x4
66: 0f af f7 imul esi,edi
69: 89 34 90 mov DWORD PTR [eax+edx*4],esi
6c: 83 c2 01 add edx,0x1
6f: 39 ca cmp edx,ecx
71: 75 ed jne 60 <f1(std::vector<int, std::allocator<int> >&, int)+0x40>
73: eb e2 jmp 57 <f1(std::vector<int, std::allocator<int> >&, int)+0x37>
On line 3c, you find the conditional being checked:
; if(x != 0)
3c: 85 ff test edi,edi
43: 75 b jne 60 ; ...
From that point on after the check, x
is never tested again and the loops are just executed for each part. The first loop starting from line 45 to 55 is done when x == 0
. The second loop starting from line 60 to 71 is done when x != 0
.
So yes, at least in this case, gcc is able to hoist the conditional out of the loop with full optimizations enable.
这篇关于循环中的常量条件:编译器优化的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!