具体来说,编译器做什么来积极优化生成的字节码? [英] Specifically what does a compiler do to aggressively optimize generated bytecode?
问题描述
我一直在阅读各种编译器的功能,我遇到了许多编译器报告执行的术语积极优化。 LLVM例如引用以下编译时优化特征:
I have been reading up on the functionality of various compilers and I've come across the term "aggressive optimization" that many compilers are reported to perform. LLVM, for example cites the following compile-time optimization features:
- 特定于内存/指针
- 数据流
- 算术
- 删除死码
- 内联
- Memory/pointer specific
- Loop transforms
- Data flow
- Arithmetic
- Dead code elimination
- Inlining
这是什么意思?假设你有以下代码片段,你如何优化生成的字节代码来运行任何比编译器生成的更快?我特别感兴趣的是优化JIT驱动的运行时的字节码,例如C#,Java和Flash。这很棘手,因为JIT只支持处理器通常执行的操作码子集,这限制了您可以执行的优化数量。仍然,我有兴趣了解什么是可能的,什么转换可以推动虚拟机的限制。
What does this mean specifically? Say you had the following code snippet, how could you optimize the generated byte code to run any faster than what the compiler generated? I'm specifically interested in optimizing the bytecode of JIT-powered runtimes such as C#, Java and Flash. This is tricky because the JIT only supports a subset of the opcodes that the processor usually does, which limits the amount of optimization you can do. Still, I'm interested to see whats possible and exactly what transformations could push the limits of the VM.
虚构的代码块:
for (i = 0; i < 100; i++){
in = dataIn[i];
if ((in % 5) == 0){
out = ((in / 2) >> 16) - 10;
}else{
out = ((in << 5) / 2) * 50 + 10;
}
dataOut[i] = out;
}
编译器生成的近似伪代码,用于基于堆栈的JIT VM作为Flash Player:(原谅我的任何错误,这是完全手写的!)
Approximate pseudo code generated by the compiler, for a stack-based JIT VM such as Flash Player: (forgive me for any mistakes, this is entirely handwritten!)
// i = 0
label: "forInit"
push 0
writeTo "i"
// while i < 100
label: "forStart"
push "i"
push 100
jumpIfMoreThan "forEnd"
// in = dataIn[i];
push "i"
push "dataIn"
readProp
saveTo "in"
// if ((in % 5) == 0)
push "in"
push 5
mod
push 0
jumpIfNotEquals "ifPart2"
label: ifPart1
// out = ((in / 2) >> 16) - 10;
push "in"
push 2
divide
push 16
rightshift
push 10
minus
writeTo "out"
goto "ifEnd"
// else
label: ifPart2
// out = ((in << 5) / 2) * 50 + 10;
push "in"
push 5
leftshift
push 2
divide
push 50
multiply
push 10
add
writeTo "out"
// dataOut[i] = out;
label: ifEnd
push "out"
push "i"
push "dataOut"
writeProp
// i++
push "i"
increment
writeTo "i"
// while i < 100
goto "forStart"
label: "forEnd"
推荐答案
编译器可以做两个简单的优化:
Here are two simple optimizations a compiler could make:
out = ((i / 2) >> 16) - 10;
可以减少为
out = (i >> 17) - 10;
和
out = ((i << 5) / 2) * 50 + 10;
可以减少为
out = (i << 4) * 50 + 10;
要回答您的问题如何优化生成的字节代码运行速度比编译器生成?这是另一个版本的字节码,它有一些优化。
To answer your question "how could you optimize the generated byte code to run any faster than what the compiler generated?" Here is another version of the bytecode that has some optimizations.
// i = 0
label: "forInit"
push 0
writeTo "i"
// while i < 100
label: "forStart"
push "i"
push 100
jumpIfMoreThan "forEnd"
// in = dataIn[i];
push "i"
push "dataIn"
readProp
saveTo "in"
// if ((in % 5) == 0)
push "in"
push 5
mod
push 0
jumpIfNotEquals "ifPart2"
label: ifPart1
// optimization: remove unnecessary /2
// out = ((in / 2) >> 16) - 10;
push "in"
push 17
rightshift
push 10
minus
// optimization: don't need out var since value on stack
// dataOut[i] = out;
push "i"
push "dataOut"
writeProp
// optimization: avoid branch to common loop end
// i++
push "i"
increment
writeTo "i"
goto "forStart"
// else
label: ifPart2
// optimization: remove unnecessary /2
// out = ((in << 5) / 2) * 50 + 10;
push "in"
push 4
leftshift
push 50
multiply
push 10
add
// optimization: don't need out var since value on stack
// dataOut[i] = out;
push "i"
push "dataOut"
writeProp
// optimization: avoid branch to common loop end
// i++
push "i"
increment
writeTo "i"
goto "forStart"
label: "forEnd"
这篇关于具体来说,编译器做什么来积极优化生成的字节码?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!