Why don't LLVM passes optimize floating point instructions?

Question

See above. I wrote two sample functions:

source.ll:

define i32 @bleh(i32 %x) {
entry:
  %addtmp = add i32 %x, %x
  %addtmp1 = add i32 %addtmp, %x
  %addtmp2 = add i32 %addtmp1, %x
  %addtmp3 = add i32 %addtmp2, %x
  %addtmp4 = add i32 %addtmp3, 1
  %addtmp5 = add i32 %addtmp4, 2
  %addtmp6 = add i32 %addtmp5, 3
  %multmp = mul i32 %x, 3
  %addtmp7 = add i32 %addtmp6, %multmp
  ret i32 %addtmp7
}
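The i32 version folds because LLVM's integer `add` and `mul` wrap modulo 2^32, and wrapping arithmetic is associative and distributive, so the whole body equals 8 * x + 6 (mod 2^32). As a sketch (a Python model of the IR, not LLVM's own code; `bleh_i32`, `folded_i32` and `wrap32` are hypothetical names), evaluating the instructions one by one and comparing against that folded form:

```python
MASK = 0xFFFFFFFF  # i32 add/mul (without nsw/nuw) wrap modulo 2**32


def wrap32(v: int) -> int:
    """Reduce v to its 32-bit two's-complement bit pattern, read as unsigned."""
    return v & MASK


def bleh_i32(x: int) -> int:
    """Hypothetical model of @bleh from source.ll, one instruction at a time."""
    addtmp = wrap32(x + x)
    addtmp1 = wrap32(addtmp + x)
    addtmp2 = wrap32(addtmp1 + x)
    addtmp3 = wrap32(addtmp2 + x)
    addtmp4 = wrap32(addtmp3 + 1)
    addtmp5 = wrap32(addtmp4 + 2)
    addtmp6 = wrap32(addtmp5 + 3)
    multmp = wrap32(x * 3)
    return wrap32(addtmp6 + multmp)


def folded_i32(x: int) -> int:
    """The algebraically simplified body: 8 * x + 6, wrapped to 32 bits."""
    return wrap32(8 * x + 6)


if __name__ == "__main__":
    # Includes values whose intermediate results wrap around.
    for x in (0, 1, -1, 12345, 2**31 - 1, -(2**31)):
        assert bleh_i32(x) == folded_i32(x)
    print("i32 body matches 8*x + 6 (mod 2**32)")
```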

source-fp.ll:

define double @bleh(double %x) {
entry:
  %addtmp = fadd double %x, %x
  %addtmp1 = fadd double %addtmp, %x
  %addtmp2 = fadd double %addtmp1, %x
  %addtmp3 = fadd double %addtmp2, %x
  %addtmp4 = fadd double %addtmp3, 1.000000e+00
  %addtmp5 = fadd double %addtmp4, 2.000000e+00
  %addtmp6 = fadd double %addtmp5, 3.000000e+00
  %multmp = fmul double %x, 3.000000e+00
  %addtmp7 = fadd double %addtmp6, %multmp
  ret double %addtmp7
}
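For contrast, here is a similar sketch of the double version, assuming IEEE-754 doubles with round-to-nearest-even (which is what Python floats are on common platforms); `bleh_double` and `folded_double` are hypothetical names. Each `+` and `*` below is one correctly rounded operation, like the corresponding `fadd`/`fmul`, so the model reproduces what the IR computes, and it shows that folding the body the way the i32 version is folded would change the result for some inputs.

```python
def bleh_double(x: float) -> float:
    """Hypothetical model of @bleh from source-fp.ll, one fadd/fmul at a time."""
    addtmp = x + x
    addtmp1 = addtmp + x
    addtmp2 = addtmp1 + x
    addtmp3 = addtmp2 + x
    addtmp4 = addtmp3 + 1.0
    addtmp5 = addtmp4 + 2.0
    addtmp6 = addtmp5 + 3.0
    multmp = x * 3.0
    return addtmp6 + multmp


def folded_double(x: float) -> float:
    """What folding the body like the i32 version would compute: 8*x + 6."""
    return 8.0 * x + 6.0


x = 2.0 ** 51  # near here adjacent doubles are 2 or 4 apart, so the constants round
print(bleh_double(x), folded_double(x))  # the two results differ
```

Here `bleh_double(2.0 ** 51)` returns 2^54 + 4, while `8.0 * x + 6.0` gives 2^54 + 8.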

Why is it that when I optimize both functions using

opt -O3 source[-fp].ll -o opt.source[-fp].ll -S

the i32 one gets optimized but the double one doesn't? I expected the fadds to be combined into a single fmul. Instead, the output looks exactly the same as the input.

Is it due to the flags being set differently? I am aware that some optimizations that are valid for i32 are not valid for double, but the absence of even simple constant folding is beyond my understanding.

I am using LLVM 3.1.
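Even the constant additions cannot be combined, because floating-point addition is not associative. A minimal Python sketch, assuming IEEE-754 doubles with round-to-nearest-even: near 2^53 adjacent doubles are 2 apart, so adding 1.0 and then 2.0 rounds twice, while adding 3.0 rounds once, and the two results differ.

```python
x = 2.0 ** 53  # doubles at this magnitude are spaced 2 apart

step_by_step = (x + 1.0) + 2.0  # two roundings, like %addtmp4 then %addtmp5
folded = x + 3.0                # one rounding, what constant folding would emit

print(step_by_step == folded)   # False
print(step_by_step - x, folded - x)
```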

Answer

It's not quite true to say that no optimization is possible. I'll go through the first few lines to show where transformations are and are not allowed:

  %addtmp = fadd double %x, %x

This first line could safely be transformed to fmul double %x, 2.0e+0, but that's not actually an optimization on most architectures: fadd is generally as fast as or faster than fmul, and it doesn't require producing the constant 2.0. Note that, barring overflow, this operation is exact (as is any scaling by a power of two that neither overflows nor underflows).
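A quick empirical check of that claim, as a sketch assuming IEEE-754 doubles (Python floats on common platforms); `doubles_agree` is a hypothetical helper, and random sampling illustrates the claim rather than proving it. `fractions.Fraction` converts a double to its exact rational value, which makes it possible to confirm that no rounding happened:

```python
import math
import random
import sys
from fractions import Fraction


def doubles_agree(x: float) -> bool:
    """True if x + x and x * 2.0 produce the same double (inf == inf counts)."""
    return x + x == x * 2.0


random.seed(0)
samples = [random.uniform(-1e300, 1e300) for _ in range(10_000)]
# Also include subnormal inputs and the largest finite double.
samples += [5e-324, sys.float_info.min / 3, sys.float_info.max]

for x in samples:
    assert doubles_agree(x)
    if math.isfinite(x + x):
        # No rounding happened: the double equals the exact rational 2*x.
        assert Fraction(x + x) == 2 * Fraction(x)

# At the top of the range both forms overflow to infinity together.
assert math.isinf(sys.float_info.max + sys.float_info.max)
assert math.isinf(sys.float_info.max * 2.0)
print("x + x == 2.0 * x, exactly, for every finite sample")
```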

  %addtmp1 = fadd double %addtmp, %x

This line could be transformed to fmul double %x, 3.0e+0. Why is this a legal transformation? Because the computation that produced %addtmp was exact, only a single rounding is incurred whether this is computed as x * 3 or as x + x + x. Since these are IEEE-754 basic operations, and therefore correctly rounded, the result is the same either way. What about overflow? Neither can overflow unless the other does as well.
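The single-rounding argument can be checked the same way. In this sketch (same IEEE-754 assumption; `correctly_rounded_triple` is a hypothetical helper), `float(3 * Fraction(x))` rounds the exact product 3 * x once, because Python's conversion of an exact rational to float is correctly rounded, and both x + x + x and x * 3.0 match it:

```python
import random
from fractions import Fraction


def correctly_rounded_triple(x: float) -> float:
    """Round the exact value 3*x to the nearest double (ties to even)."""
    return float(3 * Fraction(x))


random.seed(1)
for _ in range(10_000):
    x = random.uniform(-1e300, 1e300)
    exact = correctly_rounded_triple(x)
    assert x + x + x == exact  # (x + x) is exact, so only the last add rounds
    assert x * 3.0 == exact    # a single correctly rounded multiply
print("x + x + x == 3.0 * x for every sample")
```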

  %addtmp2 = fadd double %addtmp1, %x

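This is the first line where that single-rounding argument no longer applies. 4 * x would be computed exactly, without any rounding (barring overflow), whereas x + x + x + x may incur two roundings: x + x + x is rounded once, then adding x may round a second time. (As it happens, in the default round-to-nearest-even mode the second rounding always lands back on 4 * x, but showing that takes a separate case analysis, and it does not hold under directed rounding modes such as round-toward-positive.)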
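To make the rounding count concrete, a sketch under the same IEEE-754 round-to-nearest-even assumption, using an assumed example input x = 1 + 2^-52 (the double just above 1.0) and a hypothetical helper `was_rounded` that compares a double against the exact rational result:

```python
from fractions import Fraction


def was_rounded(result: float, exact: Fraction) -> bool:
    """True if the double `result` is not exactly the mathematical value `exact`."""
    return Fraction(result) != exact


x = 1.0 + 2.0 ** -52  # assumed example input: the double just above 1.0

# Left-to-right evaluation mirrors %addtmp, %addtmp1, %addtmp2.
two = x + x
three = two + x
four_by_adds = three + x
four_by_mul = 4.0 * x

print(was_rounded(two, 2 * Fraction(x)))                         # False: doubling is exact
print(was_rounded(three, Fraction(two) + Fraction(x)))           # True: first rounding
print(was_rounded(four_by_adds, Fraction(three) + Fraction(x)))  # True: second rounding
print(was_rounded(four_by_mul, 4 * Fraction(x)))                 # False: 4 * x is exact
```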

  %addtmp3 = fadd double %addtmp2, %x

Ditto here: 5 * x would incur at most one rounding, whereas x + x + x + x + x may incur three.
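The extra roundings can change the result. A sketch under the same assumptions (the input is again an assumed example, x = 1 + 2^-52) sums x six times and compares the result with 6.0 * x:

```python
x = 1.0 + 2.0 ** -52  # assumed example input: the double just above 1.0

total = x
for _ in range(5):
    total += x  # ((((x + x) + x) + x) + x) + x, each add separately rounded

product = 6.0 * x  # a single rounding of the exact value 6 * x

print(total == product)                # False
print(product - total == 2.0 ** -50)   # True: they differ by one unit in the last place
```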

The only line that might be beneficially transformed, then, would be replacing x + x + x with 3 * x. However, the subexpression x + x is already computed (as %addtmp), so an optimizer could easily choose not to apply this transform: by leaving the code alone, it can reuse the existing partial result.
