Java 8 stream objects significant memory usage
Problem description
In looking at some profiling results, I noticed that using streams within a tight loop (used instead of another nested loop) incurred a significant memory overhead from objects of types java.util.stream.ReferencePipeline and java.util.ArrayList$ArrayListSpliterator. I converted the offending streams to foreach loops, and the memory consumption decreased significantly.
I know that streams make no promises about performing any better than ordinary loops, but I was under the impression that the difference would be negligible. In this case it seemed like a 40% increase.
Here is the test class I wrote to isolate the problem. I monitored memory consumption and object allocation with JFR:
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.function.Predicate;

public class StreamMemoryTest {

    private static boolean blackHole = false;

    public static List<Integer> getRandListOfSize(int size) {
        ArrayList<Integer> randList = new ArrayList<>(size);
        Random rnGen = new Random();
        for (int i = 0; i < size; i++) {
            randList.add(rnGen.nextInt(100));
        }
        return randList;
    }

    public static boolean getIndexOfNothingManualImpl(List<Integer> nums, Predicate<Integer> predicate) {
        for (Integer num : nums) {
            // Impossible condition
            if (predicate.test(num)) {
                return true;
            }
        }
        return false;
    }

    public static boolean getIndexOfNothingStreamImpl(List<Integer> nums, Predicate<Integer> predicate) {
        Optional<Integer> first = nums.stream().filter(predicate).findFirst();
        return first.isPresent();
    }

    public static void consume(boolean value) {
        blackHole = blackHole && value;
    }

    public static boolean result() {
        return blackHole;
    }

    public static void main(String[] args) {
        // 100 million trials
        int numTrials = 100000000;
        System.out.println("Beginning test");
        for (int i = 0; i < numTrials; i++) {
            List<Integer> randomNums = StreamMemoryTest.getRandListOfSize(100);
            consume(StreamMemoryTest.getIndexOfNothingStreamImpl(randomNums, x -> x < 0));
            // or ...
            // consume(StreamMemoryTest.getIndexOfNothingManualImpl(randomNums, x -> x < 0));
            if (randomNums == null) {
                break;
            }
        }
        System.out.print(StreamMemoryTest.result());
    }
}
Stream implementation:
Memory Allocated for TLABs 64.62 GB
Class Average Object Size(bytes) Total Object Size(bytes) TLABs Average TLAB Size(bytes) Total TLAB Size(bytes) Pressure(%)
java.lang.Object[] 415.974 6,226,712 14,969 2,999,696.432 44,902,455,888 64.711
java.util.stream.ReferencePipeline$2 64 131,264 2,051 2,902,510.795 5,953,049,640 8.579
java.util.stream.ReferencePipeline$Head 56 72,744 1,299 3,070,768.043 3,988,927,688 5.749
java.util.stream.ReferencePipeline$2$1 24 25,128 1,047 3,195,726.449 3,345,925,592 4.822
java.util.Random 32 30,976 968 3,041,212.372 2,943,893,576 4.243
java.util.ArrayList 24 24,576 1,024 2,720,615.594 2,785,910,368 4.015
java.util.stream.FindOps$FindSink$OfRef 24 18,864 786 3,369,412.295 2,648,358,064 3.817
java.util.ArrayList$ArrayListSpliterator 32 14,720 460 3,080,696.209 1,417,120,256 2.042
Manual implementation:
Memory Allocated for TLABs 46.06 GB
Class Average Object Size(bytes) Total Object Size(bytes) TLABs Average TLAB Size(bytes) Total TLAB Size(bytes) Pressure(%)
java.lang.Object[] 415.961 4,190,392 10,074 4,042,267.769 40,721,805,504 82.33
java.util.Random 32 32,064 1,002 4,367,131.521 4,375,865,784 8.847
java.util.ArrayList 24 14,976 624 3,530,601.038 2,203,095,048 4.454
Has anyone else encountered issues with the stream objects themselves consuming memory? / Is this a known issue?
Accepted answer
Using the Stream API you do indeed allocate more memory, though your experimental setup is somewhat questionable. I've never used JFR, but my findings using JOL are quite similar to yours.
Note that you measure not only the heap allocated during the ArrayList querying, but also during its creation and population. The allocations during the creation and population of a single ArrayList should look like this (64-bit, compressed OOPs, via JOL):
COUNT AVG SUM DESCRIPTION
1 416 416 [Ljava.lang.Object;
1 24 24 java.util.ArrayList
1 32 32 java.util.Random
1 24 24 java.util.concurrent.atomic.AtomicLong
4 496 (total)
So most of the allocated memory is the Object[] array used inside ArrayList to store the data. The AtomicLong is part of the Random class implementation. If you perform this 100_000_000 times, then you should have at least 496*10^8/2^30 = 46.2 GB allocated in both tests. Nevertheless, this part can be set aside, as it should be identical for both tests.
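As a sanity check on that arithmetic, the 496-byte per-trial baseline scales to the observed tens of gigabytes (a standalone sketch of my own; the class name is not from the original post):

```java
public class BaselineAllocationMath {
    // Bytes allocated per trial for the list itself, from the JOL output:
    // Object[] 416 + ArrayList 24 + Random 32 + AtomicLong 24 = 496
    static final long BYTES_PER_TRIAL = 416 + 24 + 32 + 24;

    // Total bytes over all trials, converted to GB (2^30 bytes per GB)
    static double totalGb(long trials) {
        return BYTES_PER_TRIAL * (double) trials / (1L << 30);
    }

    public static void main(String[] args) {
        System.out.printf("%.1f GB%n", totalGb(100_000_000L)); // prints 46.2 GB
    }
}
```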
Another interesting thing here is inlining. The JIT is smart enough to inline the whole getIndexOfNothingManualImpl (observable via java -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining StreamMemoryTest):
StreamMemoryTest::main @ 13 (59 bytes)
...
@ 30 StreamMemoryTest::getIndexOfNothingManualImpl (43 bytes) inline (hot)
@ 1 java.util.ArrayList::iterator (10 bytes) inline (hot)
\-> TypeProfile (2132/2132 counts) = java/util/ArrayList
@ 6 java.util.ArrayList$Itr::<init> (6 bytes) inline (hot)
@ 2 java.util.ArrayList$Itr::<init> (26 bytes) inline (hot)
@ 6 java.lang.Object::<init> (1 bytes) inline (hot)
@ 8 java.util.ArrayList$Itr::hasNext (20 bytes) inline (hot)
\-> TypeProfile (215332/215332 counts) = java/util/ArrayList$Itr
@ 8 java.util.ArrayList::access$100 (5 bytes) accessor
@ 17 java.util.ArrayList$Itr::next (66 bytes) inline (hot)
@ 1 java.util.ArrayList$Itr::checkForComodification (23 bytes) inline (hot)
@ 14 java.util.ArrayList::access$100 (5 bytes) accessor
@ 28 StreamMemoryTest$$Lambda$1/791452441::test (8 bytes) inline (hot)
\-> TypeProfile (213200/213200 counts) = StreamMemoryTest$$Lambda$1
@ 4 StreamMemoryTest::lambda$main$0 (13 bytes) inline (hot)
@ 1 java.lang.Integer::intValue (5 bytes) accessor
@ 8 java.util.ArrayList$Itr::hasNext (20 bytes) inline (hot)
@ 8 java.util.ArrayList::access$100 (5 bytes) accessor
@ 33 StreamMemoryTest::consume (19 bytes) inline (hot)
The disassembly actually shows that no iterator allocation is performed after warm-up. Because escape analysis successfully tells the JIT that the iterator object does not escape, it is simply scalarized. Were the Iterator actually allocated, it would take an additional 32 bytes:
COUNT AVG SUM DESCRIPTION
1 32 32 java.util.ArrayList$Itr
1 32 (total)
Note that the JIT could also remove the iteration entirely. Your blackHole is false by default, so blackHole = blackHole && value does not change it regardless of value, and the value computation could be eliminated altogether, as it has no side effects. I'm not sure whether it actually did this (reading the disassembly is quite hard for me), but it's possible.
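One hypothetical way to rule out that dead-code elimination (not part of the original post) is to make the consumer's state depend on every value, e.g. with an XOR accumulator, so the JIT cannot prove any call is side-effect free:

```java
public class XorBlackhole {
    private static int sink = 0;

    // Unlike `blackHole && value` (which stays false forever once false),
    // XOR folds every observed value into the state, so no call is dead code.
    public static void consume(boolean value) {
        sink ^= value ? 1 : 0;
    }

    public static int result() {
        return sink;
    }

    public static void main(String[] args) {
        consume(true);
        consume(false);
        consume(true);
        // result() is the parity of the `true` values seen so far
        System.out.println(result());
    }
}
```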
However, while getIndexOfNothingStreamImpl also seems to inline everything inside, escape analysis fails because there are too many interdependent objects inside the stream API, so actual allocations occur. Thus it really adds five additional objects (the table is composed manually from JOL outputs):
COUNT AVG SUM DESCRIPTION
1 32 32 java.util.ArrayList$ArrayListSpliterator
1 24 24 java.util.stream.FindOps$FindSink$OfRef
1 64 64 java.util.stream.ReferencePipeline$2
1 24 24 java.util.stream.ReferencePipeline$2$1
1 56 56 java.util.stream.ReferencePipeline$Head
5 200 (total)
So every invocation of this particular stream actually allocates 200 additional bytes. As you perform 100_000_000 iterations, in total the Stream version should allocate 10^8*200/2^30 = 18.62 GB more than the manual version, which is close to your result. I think the AtomicLong inside Random is scalarized as well, but both the Iterator and the AtomicLong are present during the warm-up iterations (until the JIT actually creates the most optimized version). This would explain the minor discrepancies in the numbers.
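The 200-byte figure and the resulting total can likewise be checked numerically (again a sketch of my own, mirroring the table above):

```java
public class StreamOverheadMath {
    // Per-call stream machinery from the JOL table:
    // ArrayListSpliterator 32 + FindSink$OfRef 24 + ReferencePipeline$2 64
    // + ReferencePipeline$2$1 24 + ReferencePipeline$Head 56 = 200
    static final long BYTES_PER_CALL = 32 + 24 + 64 + 24 + 56;

    // Extra GB allocated by the stream version over the manual version
    static double extraGb(long trials) {
        return BYTES_PER_CALL * (double) trials / (1L << 30);
    }

    public static void main(String[] args) {
        System.out.println(BYTES_PER_CALL);                  // prints 200
        System.out.printf("%.1f GB%n", extraGb(100_000_000L)); // prints 18.6 GB
    }
}
```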
This additional 200-byte allocation does not depend on the stream size, but it does depend on the number of intermediate stream operations (in particular, every additional filter step adds 64+24 = 88 bytes). Note, however, that these objects are usually short-lived, are allocated quickly, and can be collected by a minor GC. In most real-life applications you probably don't need to worry about this.
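Extrapolating from those two numbers, the per-call allocation of a findFirst pipeline with k filter stages can be estimated with a back-of-the-envelope helper (my own extrapolation from the tables above, not measured data):

```java
public class PipelineCostEstimate {
    // A one-filter findFirst pipeline allocates 200 bytes per call; each
    // extra .filter() adds a ReferencePipeline$2 (64) plus its sink (24).
    static long estimatedBytesPerCall(int filterStages) {
        if (filterStages < 1) {
            throw new IllegalArgumentException("need at least one filter stage");
        }
        return 200 + 88L * (filterStages - 1);
    }

    public static void main(String[] args) {
        System.out.println(estimatedBytesPerCall(1)); // prints 200
        System.out.println(estimatedBytesPerCall(3)); // prints 376 (200 + 2*88)
    }
}
```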