Stream#filter Runs out of Memory for 1,000,000 items
Problem Description
Let's say I have a Stream of length 1,000,000 with all 1's.
scala> val million = Stream.fill(100000000)(1)
million: scala.collection.immutable.Stream[Int] = Stream(1, ?)
scala> million filter (x => x % 2 == 0)
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
I get an OutOfMemoryError.
Then, I tried the same filter call with List.
scala> val y = List.fill(1000000)(1)
y: List[Int] = List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ...
scala> y.filter(x => x % 2 == 0)
res2: List[Int] = List()
But it succeeds.
Why does Stream#filter run out of memory here, while List#filter completes just fine?
Lastly, with a large stream, will filter result in the non-lazy evaluation of the entire stream?
Recommended Answer
Overhead of List: a single object (an instance of ::) with 2 fields (2 pointers) per element.
Overhead of Stream: an instance of Cons (with 3 pointers) plus an instance of Function (tl: => Stream[A]) for the lazy evaluation of Stream#tail, per element.
So you'll spend roughly twice as much memory on Stream.
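To make the per-element difference concrete, here is a minimal sketch of a Stream-like cell. LazyCell and LazyCellDemo are hypothetical names, and this is a simplified model, not the actual source of Stream.Cons: where a List cell holds only a head and a tail, this cell holds a head, a memoized tail and a pending tail thunk (three references), and every element also allocates its own Function0 object for that thunk.

```scala
// Hypothetical, simplified model of a Stream cell (NOT the library's Stream.Cons).
// Fields per cell: head, memo, pending = three references, versus two for a
// List cell (head, tail). The thunk held in `pending` is a separate object.
// Not thread-safe; it only illustrates the memory layout and memoization.
final class LazyCell[A](val head: A, tailThunk: () => LazyCell[A]) {
  private var memo: LazyCell[A] = null                  // the tail, once computed
  private var pending: () => LazyCell[A] = tailThunk    // cleared after first use

  // Computes the tail at most once, caches it, then drops the thunk.
  def tail: LazyCell[A] = {
    if (pending != null) {
      memo = pending()
      pending = null
    }
    memo
  }
}

object LazyCellDemo {
  var thunksRun = 0 // how many tail thunks have actually been evaluated

  // Builds a chain of n cells holding `value`; null marks the end of the chain.
  def fill[A](n: Int)(value: A): LazyCell[A] =
    if (n <= 0) null
    else new LazyCell(value, () => { thunksRun += 1; fill(n - 1)(value) })
}
```

Only the first cell exists up front; each first call to tail runs one thunk, caches the resulting cell and releases the thunk, so later calls return the same object.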
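You have defined your Stream as a val, so million keeps a reference to the head of the stream, and every cell that filter forces is memoized and stays reachable. Alternatively, you could define million as a def; in this case, after filter returns, nothing references the created elements any more, so the GC can reclaim them and you'll get your memory back.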
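The val/def difference can be sketched on a tiny stream. ValVsDef, ones and built are hypothetical names; built is only a counter that makes the effect observable without allocating millions of elements. Stream here is the Scala 2.x class from the question (deprecated since 2.13 in favor of LazyList).

```scala
// Sketch: why a `val` stream retains its forced cells and a `def` stream does not.
// All names here are hypothetical, chosen for illustration.
object ValVsDef {
  var built = 0 // number of elements computed so far, across all streams

  // A stream of n ones that counts each element as it is computed.
  def ones(n: Int): Stream[Int] = Stream.range(0, n).map { _ => built += 1; 1 }

  // val: a single memoized stream; its cells stay reachable as long as `kept` is.
  val kept: Stream[Int] = ones(5)

  // def: a fresh stream on every call; once filter returns and its result is
  // dropped, nothing refers to those cells and the GC may reclaim them.
  def fresh: Stream[Int] = ones(5)
}
```

Running filter twice over kept computes each element only once, because its cells are memoized and held by the val; fresh recomputes every element on each call instead of keeping them around.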
Note that only tail in Stream is lazy; head is strict. So filter evaluates strictly until it gets the first element that satisfies the given predicate, and since there is no such element in your Stream, filter iterates over your entire million stream and puts all the elements in memory.
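A small sketch of that behaviour, using a hypothetical counter (FilterDemo.evaluated) to record how many elements map actually computes: on an infinite stream, filter stops at the first match, while on a stream with no matching element it has to walk every element.

```scala
// Sketch: Stream#filter forces elements only until the first match.
// FilterDemo, counted and evaluated are hypothetical names for illustration.
object FilterDemo {
  var evaluated = 0 // how many elements have been computed by `counted`

  // Wraps a stream so that every computed element increments the counter.
  def counted(s: Stream[Int]): Stream[Int] = s.map { x => evaluated += 1; x }

  // Infinite stream: filter stops as soon as it reaches 5.
  def firstMultipleOf5: Int = counted(Stream.from(1)).filter(_ % 5 == 0).head

  // Finite stream with no matching element: filter must walk all 10 elements.
  def noMatch: Stream[Int] = counted(Stream.range(1, 11)).filter(_ > 100)
}
```

With the counter reset before each call, the first filter computes 5 elements and the second computes all 10. On million the same exhaustive walk happens over every element, and because the val still holds the head, all of them stay on the heap.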