Why don't LLVM passes optimize floating point instructions?
See above. I wrote two sample functions:
source.ll:
define i32 @bleh(i32 %x) {
entry:
%addtmp = add i32 %x, %x
%addtmp1 = add i32 %addtmp, %x
%addtmp2 = add i32 %addtmp1, %x
%addtmp3 = add i32 %addtmp2, %x
%addtmp4 = add i32 %addtmp3, 1
%addtmp5 = add i32 %addtmp4, 2
%addtmp6 = add i32 %addtmp5, 3
%multmp = mul i32 %x, 3
%addtmp7 = add i32 %addtmp6, %multmp
ret i32 %addtmp7
}
source-fp.ll:
define double @bleh(double %x) {
entry:
%addtmp = fadd double %x, %x
%addtmp1 = fadd double %addtmp, %x
%addtmp2 = fadd double %addtmp1, %x
%addtmp3 = fadd double %addtmp2, %x
%addtmp4 = fadd double %addtmp3, 1.000000e+00
%addtmp5 = fadd double %addtmp4, 2.000000e+00
%addtmp6 = fadd double %addtmp5, 3.000000e+00
%multmp = fmul double %x, 3.000000e+00
%addtmp7 = fadd double %addtmp6, %multmp
ret double %addtmp7
}
Why is it that when I optimize both functions using
opt -O3 source[-fp].ll -o opt.source[-fp].ll -S
the i32 one gets optimized but the double one doesn't? I expected the fadd instructions to be combined into a single fmul. Instead, the output looks exactly the same as the input.
Is it due to the flags being set differently? I am aware that certain optimizations that are possible for i32 are not doable for double. But the absence of simple constant folding is beyond my understanding.
I am using LLVM 3.1.
It's not quite true to say that no optimization is possible. I'll go through the first few lines to show where transformations are and are not allowed:
%addtmp = fadd double %x, %x
This first line could safely be transformed to fmul double %x, 2.0e+0, but that's not actually an optimization on most architectures (fadd is generally as fast as or faster than fmul, and doesn't require producing the constant 2.0). Note that, barring overflow, this operation is exact (like all scaling by powers of two).
%addtmp1 = fadd double %addtmp, %x
This line could be transformed to fmul double %x, 3.0e+0. Why is this a legal transformation? Because the computation that produced %addtmp was exact, only a single rounding is incurred whether this is computed as x * 3 or x + x + x. Because these are IEEE-754 basic operations and therefore correctly rounded, the result is the same either way. What about overflow? Neither may overflow unless the other does as well.
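The argument above can be spot-checked numerically. The following is a minimal sketch, assuming Python floats are IEEE-754 doubles (true on common platforms); the function name and the sample values are made up for illustration, and a finite sample is of course not a proof.

```python
import random


def repeated_add_matches_multiply(x: float) -> bool:
    """Compare x + x with 2 * x and (x + x) + x with 3 * x."""
    doubled = x + x        # exact: scaling by a power of two
    tripled = doubled + x  # a single rounding of the exact value 3x
    return doubled == 2.0 * x and tripled == 3.0 * x


random.seed(0)  # fixed seed so the run is reproducible
samples = [random.uniform(-1e6, 1e6) for _ in range(10_000)]
samples += [0.1, 1e-300, 1e300]  # a few hand-picked finite values

print(all(repeated_add_matches_multiply(x) for x in samples))  # prints True
```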
%addtmp2 = fadd double %addtmp1, %x
This is the first line that cannot be legally transformed into constant * x. 4 * x would compute exactly, without any rounding, whereas x + x + x + x incurs two roundings: x + x + x is rounded once, then adding x may round a second time.
%addtmp3 = fadd double %addtmp2, %x
Ditto here; 5 * x would incur one rounding, whereas x + x + x + x + x incurs three.
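The rounding counts quoted above can be reproduced for a concrete input. This is a rough sketch, assuming Python floats are IEEE-754 doubles; count_inexact_additions is a hypothetical helper written for this illustration, and it uses exact rational arithmetic from the standard fractions module to detect when a floating-point addition had to round. It only counts rounding steps and says nothing about whether the final values differ.

```python
from fractions import Fraction


def count_inexact_additions(x: float, terms: int) -> int:
    """Count how many additions round when summing `terms` copies of x left to right."""
    total = x
    roundings = 0
    for _ in range(terms - 1):
        exact = Fraction(total) + Fraction(x)  # exact rational sum of the two operands
        total = total + x                      # correctly rounded floating-point sum
        if Fraction(total) != exact:
            roundings += 1
    return roundings


x = 0.1
print(count_inexact_additions(x, 2))  # 0: x + x is exact
print(count_inexact_additions(x, 3))  # 1
print(count_inexact_additions(x, 4))  # 2
print(count_inexact_additions(x, 5))  # 3

# The single multiplications, for comparison:
print(Fraction(4.0 * x) == 4 * Fraction(x))  # True: 4 * x is exact
print(Fraction(5.0 * x) == 5 * Fraction(x))  # False: 5 * x rounds once
```

The value 0.1 is just one convenient input; other inputs may round in fewer steps.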
The only line that might be beneficially transformed is the second one, by replacing x + x + x with 3 * x. However, the subexpression x + x is already present elsewhere, so an optimizer could easily choose not to employ this transform (since it can take advantage of the existing partial result if it does not).
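The same rounding concern applies to the constants in the question: adding 1.0, 2.0 and 3.0 one after another is not always the same as adding 6.0 once, because floating-point addition is not associative. A small sketch, again assuming Python floats are IEEE-754 doubles with the default round-to-nearest-even mode; the function names are invented for this illustration.

```python
def add_constants_in_order(x: float) -> float:
    """Mirror the order of the fadd instructions: ((x + 1.0) + 2.0) + 3.0."""
    return ((x + 1.0) + 2.0) + 3.0


def add_folded_constant(x: float) -> float:
    """What folding the three constants into 6.0 would compute."""
    return x + 6.0


small = 1.0
large = 2.0 ** 53  # above this magnitude, neighbouring doubles are 2 apart

print(add_constants_in_order(small) == add_folded_constant(small))  # True
print(add_constants_in_order(large))  # 9007199254740996.0
print(add_folded_constant(large))     # 9007199254740998.0
```

Since the two forms can produce different values, folding the constants would change the result for some inputs.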