What are good heuristics for inlining functions?
Question
Considering that you're trying solely to optimize for speed, what are good heuristics for deciding whether to inline a function or not? Obviously code size should be important, but are there any other factors typically used when (say) gcc or icc is determining whether to inline a function call? Has there been any significant academic work in the area?
Answer
Wikipedia's article on inline expansion has a few paragraphs about this, with some links at the bottom:
- In addition to memory size and cache issues, another consideration is register pressure. From the compiler's point of view "the added variables from the inlined procedure may consume additional registers, and in an area where register pressure is already high this may force spilling, which causes additional RAM accesses."
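To make the register-pressure trade-off concrete, here is a deliberately simplified toy cost model. It is not how gcc or icc actually decide: the register count, the per-call overhead, the per-spill cost and the `CallSite` fields are all hypothetical values chosen for illustration. The model credits inlining with the call overhead it removes and charges it for spills whenever the caller's live values plus the callee's peak live values exceed the available registers.

```python
from dataclasses import dataclass

# Hypothetical machine and cost parameters (arbitrary "cycle" units).
AVAILABLE_REGISTERS = 16  # allocatable general-purpose registers
CALL_OVERHEAD = 5         # saved per execution when the call/return disappears
SPILL_COST = 3            # store + reload per spilled value per execution


@dataclass
class CallSite:
    caller_live_values: int  # values live across this call in the caller
    callee_max_live: int     # peak simultaneously live values inside the callee
    call_frequency: int      # executions of this call site (e.g. from profiling)


def estimated_spills(site: CallSite, regs: int = AVAILABLE_REGISTERS) -> int:
    """Values that no longer fit in registers once the callee body is merged in."""
    pressure = site.caller_live_values + site.callee_max_live
    return max(0, pressure - regs)


def inlining_gain(site: CallSite) -> int:
    """Predicted cycles saved by inlining; negative means predicted slower."""
    per_call = CALL_OVERHEAD - SPILL_COST * estimated_spills(site)
    return per_call * site.call_frequency


def should_inline(site: CallSite) -> bool:
    return inlining_gain(site) > 0
```

With these made-up numbers, a site where 4 caller values and 6 callee values are live fits in 16 registers and is inlined, whereas 12 + 8 live values force 4 spills that cost more than the call they remove. A real compiler would also account for the caller-saved registers that the call itself clobbers, which this sketch ignores.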
Languages with JIT compilers and runtime class loading have other trade-offs: the targets of virtual method calls aren't known statically, but the JIT can collect runtime profiling information, such as method call frequency:
- Design, Implementation, and Evaluation of Optimizations in a Just-in-Time Compiler (for Java) discusses inlining of static methods and of methods in dynamically loaded classes, and the resulting performance improvements.
- Practicing JUDO: Java Under Dynamic Optimizations states that their "inlining policy is based on the code size and profiling information. If the execution frequency of a method entry is below a certain threshold, the method is then not inlined because it is regarded as a cold method. To avoid code explosion, we do not inline a method with a bytecode size of more than 25 bytes. . . . To avoid inlining along a deep call chain, inlining stops when the accumulated inlined bytecode size along the call chain exceeds 40 bytes." Although they have runtime profiling information (method call frequency), they are still careful to avoid inlining large functions or long chains of functions, to prevent code bloat.
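The JUDO policy quoted above can be sketched as a recursive walk over a profiled call tree. The 25-byte and 40-byte limits come from the quote; the hot/cold threshold value is hypothetical because the quote does not give one, and the `Method` class and `inline_decisions` function are illustrative names, not JUDO's code. The sketch reads "inlining stops when the accumulated size exceeds 40 bytes" as "do not inline a callee whose size would push the chain total past 40 bytes", which is one plausible interpretation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

MAX_CALLEE_BYTES = 25  # from the quote: never inline a callee larger than this
MAX_CHAIN_BYTES = 40   # from the quote: budget for inlined bytecode along one chain
HOT_THRESHOLD = 1000   # hypothetical: the quote does not state the real threshold


@dataclass
class Method:
    name: str
    bytecode_size: int
    entry_count: int  # profiled number of times the method was entered
    callees: List["Method"] = field(default_factory=list)


def inline_decisions(
    method: Method,
    accumulated: int = 0,
    decisions: Optional[List[Tuple[str, str]]] = None,
) -> List[Tuple[str, str]]:
    """Return the (caller, callee) pairs this policy would inline below `method`."""
    if decisions is None:
        decisions = []
    for callee in method.callees:
        if callee.entry_count < HOT_THRESHOLD:
            continue  # cold method: not inlined
        if callee.bytecode_size > MAX_CALLEE_BYTES:
            continue  # too large: avoid code explosion
        if accumulated + callee.bytecode_size > MAX_CHAIN_BYTES:
            continue  # would exceed the chain budget: stop along this chain
        decisions.append((method.name, callee.name))
        # Continue into the inlined body, charging the callee's size to this chain.
        inline_decisions(callee, accumulated + callee.bytecode_size, decisions)
    return decisions
```

For example, with `root → mid (20 bytes) → leaf (10 bytes)` both calls are inlined (chain total 30 bytes), a 30-byte callee is rejected outright, and a chain of three 15-byte methods stops after the second, because the third would bring the total to 45 bytes. Because every inlined callee has a positive size, the chain budget also bounds how deep the walk can go.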
A search on Google Scholar reveals a number of papers, such as:
- The effect of code expanding optimizations on instruction cache design
- Function Inlining under Code Size Constraints for Embedded Processors
A search on Google Books reveals quite a number of books with papers or chapters about function inlining in various contexts.
The Compiler Design Handbook: Optimizations and Machine Code Generation has a chapter on Statistical and Machine Learning Techniques in Compiler Design, which covers using heuristics to set various parameters and profiling the results. This chapter references the paper by Vaswani et al., Microarchitecture Sensitive Empirical Models for Compiler Optimizations, in which they propose "the use of empirical modeling techniques for building microarchitecture sensitive models for compiler optimizations".
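Vaswani et al. build statistical models; a much simpler empirical approach, shown here only to illustrate setting an inlining parameter from measurements, is a plain grid search. This is not the method from the paper. `measure` is a hypothetical callback that would, for example, rebuild a benchmark with GCC's `-finline-limit=N` and return its runtime; taking the median of several runs damps timing noise.

```python
from typing import Callable, Iterable, Optional


def tune_threshold(
    candidates: Iterable[int],
    measure: Callable[[int], float],
    repeats: int = 3,
) -> Optional[int]:
    """Return the candidate with the lowest median runtime (None if no candidates)."""
    best: Optional[int] = None
    best_time = float("inf")
    for threshold in candidates:
        # Repeat each measurement and take the median to damp timing noise
        # (the upper median when `repeats` is even).
        times = sorted(measure(threshold) for _ in range(repeats))
        median = times[len(times) // 2]
        if median < best_time:
            best, best_time = threshold, median
    return best
```

With a synthetic runtime curve such as `lambda t: (t - 40) ** 2 + 100` over `range(0, 101, 10)`, the search returns 40. A real setup would need many more benchmarks and repetitions, which is exactly the cost that model-based approaches like the one in the paper try to reduce.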
(Some other books discuss inlining from the programmer's point of view, such as C++ for Game Programmers, which covers the dangers of inlining functions too often and the differences between inlining and macros. Compilers often ignore the programmer's inline requests if they can determine that inlining would do more harm than good; as a last resort, this can be overridden with macros.)