When can Hotspot allocate objects on the stack?

Question

Since somewhere around Java 6, the Hotspot JVM can do escape analysis and allocate non-escaping objects on the stack instead of on the garbage collected heap. This results in a speedup of the generated code and reduces pressure on the garbage collector.

What are the rules for when Hotspot is able to stack allocate objects? In other words, when can I rely on it to do stack allocation?

edit: This question is a duplicate, but (IMO) the answer below is a better answer than what is available at the original question.

Recommended answer

I have done some experimentation in order to see when Hotspot is able to stack allocate. It turns out that its stack allocation is quite a bit more limited than what you might expect based on the available documentation. The referenced paper by Choi "Escape Analysis for Java" suggests that an object that is only ever assigned to local variables can always be stack allocated. But that is not true.

All of these are implementation details of the current Hotspot implementation, so they could change in future versions. My observations refer to my OpenJDK install, which is version 1.8.0_121 for x86-64.

The short summary, based on quite a bit of experimentation, seems to be:

An object can be stack allocated if

  • all its uses are inlined,
  • it is never assigned to any static or object fields, only to local variables,
  • at each point in the program, which local variables contain references to the object is determinable at JIT time and does not depend on any unpredictable conditional control flow, and
  • if the object is an array, its size is known at JIT time and indexing into it uses JIT-time constants.

To know when these conditions hold, you need to know quite a bit about how Hotspot works. Relying on Hotspot to definitely do stack allocation in a certain situation can be risky, as a lot of non-local factors are involved. In particular, whether everything gets inlined can be difficult to predict.

Practically speaking, simple iterators will usually be stack allocatable if you just use them to iterate. For composite objects only the outer object can ever be stack allocated, so lists and other collections always cause heap allocation.

If you have a HashMap<Integer,Something> and you use it in myHashMap.get(42), the 42 may stack allocate in a test program, but it will not in a full application because you can be sure that there will be more than two types of key objects in HashMaps in the entire program, and therefore the hashCode and equals methods on the key won't inline.

Beyond that I don't see any generally applicable rules, and it will depend on the specifics of the code.

The first important thing to know is that escape analysis is performed after inlining. This means that Hotspot's escape analysis is in this respect more powerful than the description in the Choi paper, since an object that is returned from a method but stays local to the calling method can still be stack allocated. Because of this, iterators can nearly always be stack allocated if you write e.g. for(Foo item : myList) {...} (and the implementation of myList.iterator() is simple enough, which it usually is).
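As a minimal sketch of the iterator case (the class and method names are made up for illustration), the enhanced for loop below calls list.iterator() behind the scenes. If Hotspot inlines iterator(), hasNext() and next() into sum, the iterator never leaves the method and becomes a candidate for elimination. Whether that actually happens depends on the collected profile and the JVM version.

```java
import java.util.ArrayList;
import java.util.List;

class IteratorSketch {
    // The for-each loop obtains an Iterator from list.iterator().
    // The iterator is only ever held in a hidden local variable here.
    static long sum(List<Integer> list) {
        long total = 0;
        for (int item : list) {
            total += item;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            list.add(i);
        }
        long result = 0;
        // Call sum() often enough for the JIT to consider it hot.
        for (int i = 0; i < 20_000; i++) {
            result += sum(list);
        }
        System.out.println("Result: " + result);
    }
}
```

Running it with -verbose:gc, as done for the test program later in this answer, gives a rough indication of whether the iterators are being allocated on the heap.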

Hotspot only compiles optimized versions of methods once it determines the method is 'hot', so code that is not run a lot of times does not get optimized at all, in which case there is no stack allocation or inlining whatsoever. But for those methods you usually don't care.

Inlining decisions are based on profiling data that Hotspot collects first. The declared types do not matter so much: even if a method is virtual, Hotspot can inline it based on the types of the objects it sees during profiling. Something similar holds for branches (i.e. if-statements and other control flow constructs): if during profiling Hotspot never sees a certain branch being taken, it will compile and optimize the code based on the assumption that the branch is never taken. In both cases, if Hotspot cannot prove that its assumptions will always be true, it will insert checks in the compiled code known as 'uncommon traps', and if such a trap is hit Hotspot will de-optimize and possibly re-optimize, taking the new information into account.

Hotspot will profile which object types occur as receivers at which call sites. If Hotspot only sees a single type or only two distinct types occurring at a call site, it is able to inline the called method. If there are only one or two very common types and other types occur much less often, Hotspot should still be able to inline the methods of the common types, including a check for which code path it needs to take. (I'm not entirely sure about this last case with one or two common types and more uncommon types, though.) If there are more than two common types, Hotspot will not inline the call at all but will instead generate machine code for an indirect call.
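To make the notion of a call site concrete, here is a small sketch with made-up types (Shape, Square, Circle and Rect are illustrative, not from the original experiments). The call shape.area() is one bytecode-level call site, and its profile records every receiver class that reaches it.

```java
class CallSiteSketch {
    interface Shape { double area(); }

    static final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    static final class Circle implements Shape {
        final double radius;
        Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
    }

    static final class Rect implements Shape {
        final double w, h;
        Rect(double w, double h) { this.w = w; this.h = h; }
        public double area() { return w * h; }
    }

    // One call site: shape.area(). All callers of totalArea share its profile.
    static double totalArea(Shape[] shapes) {
        double total = 0;
        for (Shape shape : shapes) {
            total += shape.area();
        }
        return total;
    }

    public static void main(String[] args) {
        // Taken alone, this array only sends Square to the call site.
        Shape[] one = { new Square(1), new Square(2) };
        // This array sends three receiver types to the same call site.
        Shape[] three = { new Square(1), new Circle(1), new Rect(2, 3) };
        System.out.println(totalArea(one));
        System.out.println(totalArea(three));
    }
}
```

Because both arrays pass through the same loop, a program that uses both ends up with three receiver types in the profile of shape.area(), which by the rules described above would rule out inlining at that site.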

'Type' here refers to the exact type of an object. Implemented interfaces or shared superclasses are not taken into account. Even if different receiver types occur at a call site but they all inherit the same implementation of a method (e.g. multiple classes that all inherit hashCode from Object), Hotspot will still generate an indirect call and not inline. (So in my opinion Hotspot is quite stupid in such cases. I hope future versions improve this.)

Hotspot will also only inline methods that are not too big. 'Not too big' is determined by the -XX:MaxInlineSize=n and -XX:FreqInlineSize=n options. Inlinable methods with a JVM bytecode size below MaxInlineSize are always inlined, methods with a JVM bytecode size below FreqInlineSize are inlined if the call is 'hot'. Larger methods are never inlined. By default MaxInlineSize is 35 and FreqInlineSize is platform dependent but for me it is 325. So make sure your methods are not too big if you want them inlined. It can sometimes help to split out the common path from a large method, so that it can be inlined into its callers.
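Splitting out the common path could look like the following sketch. The class, the cache and both method names are hypothetical; the point is only that the frequently executed method stays small in bytecode terms while the rarely executed work lives in a separate method.

```java
class SplitPathSketch {
    private static final int[] CACHE = new int[64];

    // Small hot path: few bytecodes, so a better inlining candidate.
    static int squareCached(int n) {
        if (n >= 0 && n < CACHE.length && CACHE[n] != 0) {
            return CACHE[n];
        }
        return squareSlow(n);
    }

    // Rare path kept in its own method so it does not bloat the hot one.
    private static int squareSlow(int n) {
        int value = n * n;
        if (n >= 0 && n < CACHE.length) {
            CACHE[n] = value;
        }
        return value;
    }
}
```

The cache here is deliberately simplistic and not thread-safe; it only stands in for whatever rarely executed logic a real method would contain.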

One important thing to know about profiling is that profiling sites are based on the JVM bytecode, which itself is not inlined in any way. So if you have e.g. a static method

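// Function is assumed to be a SAM interface with a call method
// (java.util.function.Function names its method apply instead).
static <T,U> List<U> map(List<T> list, Function<T,U> func) {
    List<U> result = new ArrayList<>();
    for(T item : list) { result.add(func.call(item)); }
    return result;
}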

that maps a SAM Function callable over a list and returns the transformed list, Hotspot will treat the call to func.call as a single program-wide call site. You might call this map function at several spots in your program, passing a different func in at each call site (but always the same one at any given call site). In that case you might expect that Hotspot is able to inline map, and then also the call to func.call, since at every use of map there is only a single func type. If this were so, Hotspot would be able to optimize the loop down very tightly. Unfortunately Hotspot is not smart enough for that. It only keeps a single profile for the func.call call site, lumping together all the func types that you pass to map. You will probably use more than two different implementations of func, so Hotspot will not be able to inline the call to func.call. More details are in an Azul Systems blog post, available as an archived copy since the original appears to be gone: http://web.archive.org/web/20161225133928/http://www.azulsystems.com/blog/cliff/2011-04-04-fixing-the-inlining-problem
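The situation can be reproduced with a sketch like the one below. It uses java.util.function.Function and its apply method in place of the func.call of the snippet above, and the class name is made up. Each lambda compiles to its own class, yet all three reach the same func.apply call site inside map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class SharedProfileSketch {
    // The single bytecode-level call site func.apply(item) lives here.
    static <T, U> List<U> map(List<T> list, Function<T, U> func) {
        List<U> result = new ArrayList<>();
        for (T item : list) {
            result.add(func.apply(item));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3);
        // Three source-level uses, each with its own lambda class ...
        List<Integer> doubled = map(numbers, n -> n * 2);
        List<String> labels = map(numbers, n -> "#" + n);
        List<Boolean> odd = map(numbers, n -> n % 2 == 1);
        // ... but all three receiver types land in one shared profile.
        System.out.println(doubled + " " + labels + " " + odd);
    }
}
```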

(As an aside, in Kotlin the equivalent loop can be fully inlined as the Kotlin compiler can do inlining of calls at the bytecode level. So for some uses it could be significantly faster than Java.)

Another important thing to know is that Hotspot does not actually implement stack allocation of objects. Instead it implements scalar replacement, which means that an object is deconstructed into its constituent fields and those fields are stack allocated like normal local variables. This means that there is no object left at all. Scalar replacement only works if there is never a need to create a pointer to the stack-allocated object. Some forms of stack allocation in e.g. C++ or Go would be able to allocate full objects on the stack and then pass references or pointers to them to called functions, but in Hotspot this does not work. Therefore if there is ever a need to pass an object reference to a non-inlined method, even if the reference would not escape the called method, Hotspot will always heap-allocate such an object.
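Conceptually, scalar replacement turns the first method below into something like the second. This is a hand-written illustration with made-up names, not actual compiler output: the JIT performs the transformation on its internal representation, not on source code.

```java
class ScalarReplacementSketch {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        int lengthSquared() { return x * x + y * y; }
    }

    // What the source says: allocate a Point, use it, drop it.
    static int withObject(int a, int b) {
        Point p = new Point(a, b);
        return p.lengthSquared();
    }

    // Roughly what remains after inlining and scalar replacement:
    // the fields have become plain locals and no Point exists.
    static int afterScalarReplacement(int a, int b) {
        int px = a;
        int py = b;
        return px * px + py * py;
    }
}
```

Since no Point exists in the second form, there is nothing whose address could be handed to a non-inlined method, which is why such a call forces the object back onto the heap.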

In principle Hotspot could be smarter about this, but right now it is not.

I used the following program and variations to see when Hotspot will do scalar replacement.

// Minimal example for which the JVM does not scalarize the allocation.
// If field is final, or the second allocation is unconditional, it will.
class Scalarization {

    int field = 0xbd;
    long foo(long i) { return i * field; }

    public static void main(String[] args) {
        long result = 0;
        for(long i=0; i<100; i++) {
            result += test();
        }
        System.out.println("Result: "+result);
    }

    static long test() {
        long ctr = 0x5;
        for(long i=0; i<0x10000; i++) {
            Scalarization s = new Scalarization();
            ctr = s.foo(ctr);
            if(i == 0) s = new Scalarization();
            ctr = s.foo(ctr);
        }
        return ctr;
    }
}

If you compile and run this program with javac Scalarization.java; java -verbose:gc Scalarization, you can tell whether scalar replacement worked from the number of garbage collections. When scalar replacement works, no garbage collection happens on my system; when it does not, I see a few garbage collections.

Variants that Hotspot is able to scalarize run significantly faster than variants that it cannot. I inspected the generated machine code to make sure Hotspot was not doing any unexpected optimizations. If Hotspot is able to scalar replace the allocations, it can also apply additional optimizations to the loop, unrolling it a few iterations and then combining those iterations. So in the scalarized versions the effective loop count is lower, with each iteration doing the work of multiple source-level iterations. The speed difference is therefore not only due to allocation and garbage collection overhead.
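Counting garbage collections is a coarse signal. As an alternative sketch, not part of the original experiments, the per-thread allocation counter exposed by com.sun.management.ThreadMXBean (a JDK-specific extension of the standard management API) can be read before and after a batch of work. AllocationProbe, Box and work are made-up names for illustration.

```java
import java.lang.management.ManagementFactory;

class AllocationProbe {
    static final class Box {
        final long value;
        Box(long value) { this.value = value; }
        long times(long i) { return i * value; }
    }

    static long work() {
        long ctr = 5;
        for (long i = 0; i < 0x10000; i++) {
            Box b = new Box(0xbd);
            ctr = b.times(ctr);
        }
        return ctr;
    }

    // Bytes allocated by the current thread so far, or -1 if this JVM
    // does not offer the measurement or has it disabled.
    static long allocatedBytes() {
        java.lang.management.ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (bean instanceof com.sun.management.ThreadMXBean) {
            com.sun.management.ThreadMXBean hotspotBean =
                    (com.sun.management.ThreadMXBean) bean;
            if (hotspotBean.isThreadAllocatedMemorySupported()
                    && hotspotBean.isThreadAllocatedMemoryEnabled()) {
                return hotspotBean.getThreadAllocatedBytes(Thread.currentThread().getId());
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long result = 0;
        for (int round = 0; round < 10; round++) {
            long before = allocatedBytes();
            for (int i = 0; i < 100; i++) {
                result += work();
            }
            long after = allocatedBytes();
            System.out.println("round " + round + ": " + (after - before) + " bytes");
        }
        System.out.println("Result: " + result);
    }
}
```

Early rounds run before the optimizing compiler has kicked in, so only the later rounds say anything about scalar replacement. Running once normally and once with -XX:-DoEscapeAnalysis gives a baseline to compare against.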

I tried a number of variations on the above program. One condition for scalar replacement is that the object must never be assigned to an object (or static) field, and presumably also not into an array. So in code like

Foo f = new Foo();
bar.field = f;

the Foo object cannot be scalar replaced. This holds even if bar itself is scalar replaced, and also if you never again use bar.field. So an object can only ever be assigned to local variables.
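A self-contained version of this contrast might look as follows. Foo, Bar and the two methods are filled in purely for illustration, and the comments restate the behaviour reported for OpenJDK 1.8.0_121 rather than anything guaranteed for other versions.

```java
class FieldEscapeSketch {
    static final class Foo {
        final int value;
        Foo(int value) { this.value = value; }
    }

    static final class Bar {
        Foo field;
    }

    // Foo is only ever held in a local variable.
    static int localOnly(int v) {
        Foo f = new Foo(v);
        return f.value + 1;
    }

    // Foo is stored into a field of another object. In the experiments
    // described here this prevented scalar replacement of Foo, even
    // though bar never leaves the method and bar.field is never read.
    static int storedInField(int v) {
        Bar bar = new Bar();
        Foo f = new Foo(v);
        bar.field = f;
        return f.value + 1;
    }
}
```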

That alone is not enough: Hotspot must also be able to determine statically at JIT time which object instance will be the target of a call. For example, using the following implementations of foo and test, and removing field, causes heap allocation:

long foo(long i) { return i * 0xbb; }

static long test() {
    long ctr = 0x5;
    for(long i=0; i<0x10000; i++) {
        Scalarization s = new Scalarization();
        ctr = s.foo(ctr);
        if(i == 50) s = new Scalarization();
        ctr = s.foo(ctr);
    }
    return ctr;
}

If you then remove the conditional around the second assignment, no more heap allocation occurs:

static long test() {
    long ctr = 0x5;
    for(long i=0; i<0x10000; i++) {
        Scalarization s = new Scalarization();
        ctr = s.foo(ctr);
        s = new Scalarization();
        ctr = s.foo(ctr);
    }
    return ctr;
}

In this case Hotspot can determine statically which instance is the target for each call to s.foo.

On the other hand, even if the second assignment to s creates an instance of a subclass of Scalarization with a completely different implementation, Hotspot will still scalarize the allocations as long as the assignment is unconditional.

Hotspot does not appear to be able to move an object to the heap that was previously scalar replaced (at least not without deoptimizing). Scalar replacement is an all-or-nothing affair. So in the original test method both allocations of Scalarization always happen on the heap.

One important detail is that Hotspot will predict conditionals based on its profiling data. If a conditional assignment is never executed, Hotspot will compile code under that assumption, and then might be able to do scalar replacement. If at a later point in time the condition does get taken, Hotspot will need to recompile the code with this new assumption. The new code will not do scalar replacement, since Hotspot can no longer determine the receiver instance of the following calls statically.

For example, in this variant of test

static long limit = 0;

static long test() {
    long ctr = 0x5;
    long i = limit;
    limit += 0x10000;
    // In this form, whether scalarization happens is nondeterministic: if the
    // condition is hit before profiling starts, scalarization happens; otherwise it does not.
    for(; i<limit; i++) {
        Scalarization s = new Scalarization();
        ctr = s.foo(ctr);
        if(i == 0xf9a0) s = new Scalarization();
        ctr = s.foo(ctr);
    }
    return ctr;
}

the conditional assignment is only executed once during the lifetime of the program. If this assignment occurs early enough, before Hotspot starts full profiling of the test method, Hotspot never notices the conditional being taken and compiles code that does scalar replacement. If profiling has already started when the conditional is taken, Hotspot will not do scalar replacement. With the test value of 0xf9a0, whether scalar replacement happens is nondeterministic on my computer, since exactly when profiling starts can vary (e.g. because profiling and optimized code is compiled on background threads). So if I run the above variant, it sometimes does a few garbage collections and sometimes does not.

Hotspot's static code analysis is much more limited than what C/C++ and other static compilers can do, so Hotspot is not as smart in following the control flow in a method through several conditionals and other control structures to determine the instance that a variable refers to, even if it would be statically determinable for the programmer or a smarter compiler. In many cases the profiling information will make up for that, but it is something to be aware of.

Arrays can be stack allocated if their size is known at JIT time. However indexing into an array is not supported unless Hotspot can also statically determine the index value at JIT-time. So stack allocated arrays are pretty useless. Since most programs don't use arrays directly but use the standard collections this is not very relevant, as embedded objects such as the array containing the data within an ArrayList already need to be heap-allocated due to their embedded-ness. I suppose the reasoning for this restriction is that there exists no indexing operation on local variables so this would require additional code generation functionality for a pretty rare use case.
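The two array shapes being contrasted can be sketched like this (class and method names are made up). In both methods the array has a fixed size known at JIT time. They differ only in whether the index is a constant.

```java
class ArraySketch {
    // Fixed size and constant indices: the shape described above
    // as eligible for elimination.
    static int constantIndices(int a, int b) {
        int[] pair = new int[2];
        pair[0] = a;
        pair[1] = b;
        return pair[0] + pair[1];
    }

    // Fixed size, but the index is only known at run time, which
    // according to the observations above rules out elimination.
    static int variableIndex(int a, int b, int which) {
        int[] pair = new int[2];
        pair[0] = a;
        pair[1] = b;
        return pair[which & 1];
    }
}
```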
