Scala Infinite Iterator OutOfMemory
Question
I'm playing around with Scala's lazy iterators, and I've run into an issue. What I'm trying to do is read in a large file, do a transformation, and then write out the result:
object FileProcessor {
  def main(args: Array[String]) {
    val inSource = Source.fromFile("in.txt")
    val outSource = new PrintWriter("out.txt")
    try {
      // this "basic" lazy iterator works fine
      // val iterator = inSource.getLines
      // ...but this one, which incorporates my process method,
      // throws OutOfMemoryExceptions
      val iterator = process(inSource.getLines.toSeq).iterator
      while(iterator.hasNext) outSource.println(iterator.next)
    } finally {
      inSource.close()
      outSource.close()
    }
  }

  // processing in this case just means upper-cases every line
  private def process(contents: Seq[String]) = contents.map(_.toUpperCase)
}
So I'm getting an OutOfMemoryException on large files. I know you can run afoul of Scala's lazy Streams if you keep around references to the head of the Stream. So in this case I'm careful to convert the result of process() to an iterator and throw-away the Seq it initially returns.
Does anyone know why this still causes O(n) memory consumption? Thanks!
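For context, the Stream-head pitfall mentioned above can be reproduced in miniature. The following is an illustrative sketch (not code from the question) using the memoizing pre-2.13 `Stream`:

```scala
object StreamHeadDemo extends App {
  // Stream (pre-Scala-2.13) memoizes every element it forces. While a
  // reference to the head (`s`) is live, all forced cells stay reachable,
  // which is exactly the O(n) retention described above.
  val s = Stream.from(1)
  val forced = s.take(3).toList // forces and memoizes 1, 2, 3
  println(forced)               // prints List(1, 2, 3)
  // Re-traversing via `s` is cheap (memoized), but nothing was freed.
}
```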
Update
In response to fge and huynhjl, it seems like the Seq might be the culprit, but I don't know why. As an example, the following code works fine (and I'm using Seq all over the place). This code does not produce an OutOfMemoryException:
object FileReader {
  def main(args: Array[String]) {
    val inSource = Source.fromFile("in.txt")
    val outSource = new PrintWriter("out.txt")
    try {
      writeToFile(outSource, process(inSource.getLines.toSeq))
    } finally {
      inSource.close()
      outSource.close()
    }
  }

  @scala.annotation.tailrec
  private def writeToFile(outSource: PrintWriter, contents: Seq[String]) {
    if (!contents.isEmpty) {
      outSource.println(contents.head)
      writeToFile(outSource, contents.tail)
    }
  }

  private def process(contents: Seq[String]) = contents.map(_.toUpperCase)
}
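A plausible reading of why this version survives (my interpretation, not stated in the question): because writeToFile is @tailrec, the compiler turns the recursion into a loop that reassigns contents to contents.tail on each pass, so no live reference to the head of the memoized sequence remains and consumed elements can be garbage collected. The pattern, isolated into a self-contained sketch with a StringWriter standing in for the output file:

```scala
import java.io.{PrintWriter, StringWriter}
import scala.annotation.tailrec

object TailRecWriteDemo {
  // Mirrors writeToFile above: the tail call compiles to a loop, so the
  // local `contents` reference moves forward and earlier cells become
  // unreachable as soon as they have been written.
  @tailrec
  def writeAll(out: PrintWriter, contents: Seq[String]): Unit =
    if (contents.nonEmpty) {
      out.println(contents.head)
      writeAll(out, contents.tail)
    }

  // Convenience wrapper for demonstration: render the lines to a String.
  def render(lines: Seq[String]): String = {
    val sw = new StringWriter
    val pw = new PrintWriter(sw)
    writeAll(pw, lines)
    pw.flush()
    sw.toString
  }

  def main(args: Array[String]): Unit =
    print(render(Seq("first", "second")))
}
```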
As hinted by fge, modify process to take an iterator and remove the .toSeq. inSource.getLines is already an iterator.

Converting to a Seq will cause the items to be remembered. I think it will convert the iterator into a Stream and cause all items to be remembered.
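Putting that suggestion together, a sketch of the fixed program (my reconstruction of the advice, under a hypothetical name, not code from the answer):

```scala
import java.io.PrintWriter
import scala.io.Source

object FileProcessorFixed {
  def main(args: Array[String]): Unit = {
    val inSource = Source.fromFile("in.txt")
    val outSource = new PrintWriter("out.txt")
    try {
      // No .toSeq: getLines is already an Iterator, and Iterator.map is
      // lazy, so each line is read, transformed, and written one at a
      // time in constant memory.
      for (line <- process(inSource.getLines))
        outSource.println(line)
    } finally {
      inSource.close()
      outSource.close()
    }
  }

  // Now takes an Iterator, so no intermediate Stream is ever built.
  def process(contents: Iterator[String]): Iterator[String] =
    contents.map(_.toUpperCase)
}
```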
Edit: Ok, it's more subtle. You are doing the equivalent of Iterator.toSeq.iterator by calling iterator on the result of process. This can cause an out of memory exception.
scala> Iterator.continually(1).toSeq.iterator.take(300*1024*1024).size
java.lang.OutOfMemoryError: Java heap space
It may be the same issue as reported here: https://issues.scala-lang.org/browse/SI-4835. Note my comment at the end of the bug; it is from personal experience.