Java 8流对象占用大量内存 [英] Java 8 stream objects significant memory usage

查看:247
本文介绍了Java 8流对象占用大量内存的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在查看某些性能分析结果时,我注意到在紧密循环(而不是另一个嵌套循环)中使用流会导致类型为java.util.stream.ReferencePipelinejava.util.ArrayList$ArrayListSpliterator的对象的大量内存开销.我将有问题的流转换为foreach循环,并且显着减少了内存消耗.

In looking at some profiling results, I noticed that using streams within a tight loop (used instead of another nested loop) incurred a significant memory overhead of objects of types java.util.stream.ReferencePipeline and java.util.ArrayList$ArrayListSpliterator. I converted the offending streams to foreach loops, and the memory consumption decreased significantly.

我知道流不能保证比普通循环做得更好,但是我的印象是这种差别可以忽略不计.在这种情况下,好像增加了40%.

I know that streams make no promises about performing any better than ordinary loops, but I was under the impression that the difference would be negligible. In this case it seemed like it was a 40% increase.

这是我为隔离问题而编写的测试类.我用JFR监视了内存消耗和对象分配:

Here is the test class I wrote to isolate the problem. I monitored memory consumption and object allocation with JFR:

import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.function.Predicate;

public class StreamMemoryTest {

    private static boolean blackHole = false;

    public static List<Integer> getRandListOfSize(int size) {
        ArrayList<Integer> randList = new ArrayList<>(size);
        Random rnGen = new Random();
        for (int i = 0; i < size; i++) {
            randList.add(rnGen.nextInt(100));
        }
        return randList;
    }

    public static boolean getIndexOfNothingManualImpl(List<Integer> nums, Predicate<Integer> predicate) {

        for (Integer num : nums) {
            // Impossible condition
            if (predicate.test(num)) {
                return true;
            }
        }
        return false;
    }

    public static boolean getIndexOfNothingStreamImpl(List<Integer> nums, Predicate<Integer> predicate) {
        Optional<Integer> first = nums.stream().filter(predicate).findFirst();
        return first.isPresent();
    }

    public static void consume(boolean value) {
        blackHole = blackHole && value;
    }

    public static boolean result() {
        return blackHole;
    }

    public static void main(String[] args) {
        // 100 million trials
        int numTrials = 100000000;
        System.out.println("Beginning test");
        for (int i = 0; i < numTrials; i++) {
            List<Integer> randomNums = StreamMemoryTest.getRandListOfSize(100);
            consume(StreamMemoryTest.getIndexOfNothingStreamImpl(randomNums, x -> x < 0));
            // or ...
            // consume(StreamMemoryTest.getIndexOfNothingManualImpl(randomNums, x -> x < 0));
            if (randomNums == null) {
                break;
            }
        }
        System.out.print(StreamMemoryTest.result());
    }
}

流实施:

Memory Allocated for TLABs 64.62 GB

Class   Average Object Size(bytes)  Total Object Size(bytes)    TLABs   Average TLAB Size(bytes)    Total TLAB Size(bytes)  Pressure(%)
java.lang.Object[]                          415.974 6,226,712   14,969  2,999,696.432   44,902,455,888  64.711
java.util.stream.ReferencePipeline$2        64      131,264     2,051   2,902,510.795   5,953,049,640   8.579
java.util.stream.ReferencePipeline$Head     56      72,744      1,299   3,070,768.043   3,988,927,688   5.749
java.util.stream.ReferencePipeline$2$1      24      25,128      1,047   3,195,726.449   3,345,925,592   4.822
java.util.Random                            32      30,976      968     3,041,212.372   2,943,893,576   4.243
java.util.ArrayList                         24      24,576      1,024   2,720,615.594   2,785,910,368   4.015
java.util.stream.FindOps$FindSink$OfRef     24      18,864      786     3,369,412.295   2,648,358,064   3.817
java.util.ArrayList$ArrayListSpliterator    32      14,720      460     3,080,696.209   1,417,120,256   2.042

手动实施:

Memory Allocated for TLABs 46.06 GB

Class   Average Object Size(bytes)  Total Object Size(bytes)    TLABs   Average TLAB Size(bytes)    Total TLAB Size(bytes)  Pressure(%)
java.lang.Object[]      415.961     4,190,392       10,074      4,042,267.769       40,721,805,504  82.33
java.util.Random        32          32,064          1,002       4,367,131.521       4,375,865,784   8.847
java.util.ArrayList     24          14,976          624         3,530,601.038       2,203,095,048   4.454

是否还有其他人遇到流对象本身消耗内存的问题? /这是一个已知问题吗?

Has anyone else encountered issues with the stream objects themselves consuming memory? / Is this a known issue?

推荐答案

尽管实验设置有些问题,但使用Stream API确实可以分配更多的内存.我从未使用过JFR,但是我在JOL上的发现与您的发现非常相似.

Using Stream API you indeed allocate more memory, though your experimental setup is somewhat questionable. I've never used JFR, but my findings using JOL are quite similar to yours.

请注意,您不仅要测量在ArrayList查询期间分配的堆,而且还要在其创建和填充期间进行测量.单个ArrayList的分配和填充期间的分配应如下所示(64位,压缩的OOP,通过

Note that you measure not only the heap allocated during the ArrayList querying, but also during its creation and population. The allocation during the allocation and population of single ArrayList should look like this (64bits, compressed OOPs, via JOL):

 COUNT       AVG       SUM   DESCRIPTION
     1       416       416   [Ljava.lang.Object;
     1        24        24   java.util.ArrayList
     1        32        32   java.util.Random
     1        24        24   java.util.concurrent.atomic.AtomicLong
     4                 496   (total)

因此分配的最大内存是ArrayList内部用于存储数据的Object[]数组. AtomicLong是Random类实现的一部分.如果执行此100_000_000次,则在两个测试中至少应分配496*10^8/2^30 = 46.2 Gb.不过,可以跳过此部分,因为这两个测试应该相同.

So the most memory allocated is the Object[] array used inside ArrayList to store the data. AtomicLong is a part of Random class implementation. If you perform this 100_000_000 times, then you should have at least 496*10^8/2^30 = 46.2 Gb allocated in both tests. Nevertheless this part could be skipped as it should be identical for both tests.

另一个有趣的事情是内联. JIT很聪明,可以内联整个getIndexOfNothingManualImpl(通过java -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining StreamMemoryTest):

Another interesting thing here is inlining. JIT is smart enough to inline the whole getIndexOfNothingManualImpl (via java -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining StreamMemoryTest):

  StreamMemoryTest::main @ 13 (59 bytes)
     ...
     @ 30   StreamMemoryTest::getIndexOfNothingManualImpl (43 bytes)   inline (hot)
       @ 1   java.util.ArrayList::iterator (10 bytes)   inline (hot)
        \-> TypeProfile (2132/2132 counts) = java/util/ArrayList
         @ 6   java.util.ArrayList$Itr::<init> (6 bytes)   inline (hot)
           @ 2   java.util.ArrayList$Itr::<init> (26 bytes)   inline (hot)
             @ 6   java.lang.Object::<init> (1 bytes)   inline (hot)
       @ 8   java.util.ArrayList$Itr::hasNext (20 bytes)   inline (hot)
        \-> TypeProfile (215332/215332 counts) = java/util/ArrayList$Itr
         @ 8   java.util.ArrayList::access$100 (5 bytes)   accessor
       @ 17   java.util.ArrayList$Itr::next (66 bytes)   inline (hot)
         @ 1   java.util.ArrayList$Itr::checkForComodification (23 bytes)   inline (hot)
         @ 14   java.util.ArrayList::access$100 (5 bytes)   accessor
       @ 28   StreamMemoryTest$$Lambda$1/791452441::test (8 bytes)   inline (hot)
        \-> TypeProfile (213200/213200 counts) = StreamMemoryTest$$Lambda$1
         @ 4   StreamMemoryTest::lambda$main$0 (13 bytes)   inline (hot)
           @ 1   java.lang.Integer::intValue (5 bytes)   accessor
       @ 8   java.util.ArrayList$Itr::hasNext (20 bytes)   inline (hot)
         @ 8   java.util.ArrayList::access$100 (5 bytes)   accessor
     @ 33   StreamMemoryTest::consume (19 bytes)   inline (hot)

反汇编实际上表明在预热之后没有执行迭代器分配.由于转义分析成功地告诉JIT迭代器对象不会转义,因此仅对其进行了标量.如果Iterator实际分配了它,则将另外占用32个字节:

Disassembly actually shows that no allocation of iterator is performed after warm-up. Because escape analysis successfully tells JIT that iterator object does not escape, it's simply scalarized. Were the Iterator actually allocated it would take additionally 32 bytes:

 COUNT       AVG       SUM   DESCRIPTION
     1        32        32   java.util.ArrayList$Itr
     1                  32   (total)

请注意,JIT也可以完全消除迭代.默认情况下,您的blackhole是false,因此无论value做什么,blackhole = blackhole && value都不会更改它,并且value计算可以完全排除在外,因为它没有任何副作用.我不确定它是否真的做到了(对我而言,阅读反汇编非常困难),但这是可能的.

Note that JIT could also remove iteration at all. Your blackhole is false by default, so doing blackhole = blackhole && value does not change it regardless of the value, and value calculation could be excluded at all, as it does not have any side effects. I'm not sure whether it actually did this (reading disassembly is quite hard for me), but it's possible.

尽管getIndexOfNothingStreamImpl似乎也可以内联所有内容,但是由于流API中有太多相互依赖的对象,因此转义分析失败,因此会发生实际分配.因此,它确实增加了五个其他对象(该表由JOL输出手动组成):

However while getIndexOfNothingStreamImpl also seems to inline everything inside, escape analysis fails as there are too many interdependent objects inside the stream API, so actual allocations occur. Thus it really adds five additional objects (the table is composed manually from JOL outputs):

 COUNT       AVG       SUM   DESCRIPTION
     1        32        32   java.util.ArrayList$ArrayListSpliterator
     1        24        24   java.util.stream.FindOps$FindSink$OfRef
     1        64        64   java.util.stream.ReferencePipeline$2
     1        24        24   java.util.stream.ReferencePipeline$2$1
     1        56        56   java.util.stream.ReferencePipeline$Head
     5                 200   (total)

因此,对此特定流的每次调用实际上都会分配200个额外的字节.在执行100_000_000次迭代时,总Stream版本应比手动版本分配10 ^ 8 * 200/2 ^ 30 = 18.62Gb,这与您的结果很接近.我认为Random中的AtomicLong也已标量,但是在预热迭代期间同时存在IteratorAtomicLong(直到JIT实际上创建了最优化的版本).这将解释数字中的细微差异.

So every invocation of this particular stream actually allocates 200 additional bytes. As you perform 100_000_000 iterations, in total Stream version should allocate 10^8*200/2^30 = 18.62Gb more than manual version which is close to your result. I think, AtomicLong inside Random is scalarized as well, but both Iterator and AtomicLong are present during the warmup iterations (until JIT actually creates the most optimized version). This would explain the minor discrepancies in the numbers.

此额外的200字节分配不取决于流大小,而是取决于中间流操作的数量(尤其是,每个附加过滤步骤将增加64 + 24 = 88字节).但是请注意,这些对象通常是短暂的,可以快速分配,并且可以由次要GC收集.在大多数实际应用中,您可能不必担心这一点.

This additional 200 bytes allocation does not depend on the stream size, but depends on the number of intermediate stream operations (in particular, every additional filter step would add 64+24=88 bytes more). However note that these objects are usually short-lived, allocated quickly and can be collected by minor GC. In most of real-life applications you probably should not have to worry about this.

这篇关于Java 8流对象占用大量内存的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆