Scala Infinite Iterator OutOfMemory


Problem description

I'm playing around with Scala's lazy iterators, and I've run into an issue. What I'm trying to do is read in a large file, do a transformation, and then write out the result:

import java.io.PrintWriter
import scala.io.Source

object FileProcessor {
  def main(args: Array[String]) {
    val inSource = Source.fromFile("in.txt")
    val outSource = new PrintWriter("out.txt")

    try {
      // this "basic" lazy iterator works fine
      // val iterator = inSource.getLines

      // ...but this one, which incorporates my process method,
      // throws OutOfMemoryExceptions
      val iterator = process(inSource.getLines.toSeq).iterator

      while (iterator.hasNext) outSource.println(iterator.next)

    } finally {
      inSource.close()
      outSource.close()
    }
  }

  // processing in this case just means upper-casing every line
  private def process(contents: Seq[String]) = contents.map(_.toUpperCase)
}

So I'm getting an OutOfMemoryException on large files. I know you can run afoul of Scala's lazy Streams if you keep around references to the head of the Stream. So in this case I'm careful to convert the result of process() to an iterator and throw away the Seq it initially returns.
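
As a hypothetical aside (not part of my actual program), the Stream pitfall referred to above looks roughly like this, using the pre-2.13 Stream class:

val s: Stream[Int] = Stream.from(1)   // lazy, but memoizes every cell once it is forced
s.take(50000000).foreach(_ => ())     // `s` still references the head, so every forced
                                      // cell stays reachable and memory grows with n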

Does anyone know why this still causes O(n) memory consumption? Thanks!


Update

In response to fge and huynhjl, it seems like the Seq might be the culprit, but I don't know why. As an example, the following code works fine (and I'm using Seq all over the place). This code does not produce an OutOfMemoryException:

import java.io.PrintWriter
import scala.io.Source

object FileReader {
  def main(args: Array[String]) {
    val inSource = Source.fromFile("in.txt")
    val outSource = new PrintWriter("out.txt")
    try {
      writeToFile(outSource, process(inSource.getLines.toSeq))
    } finally {
      inSource.close()
      outSource.close()
    }
  }

  // consumes the Seq recursively; each call only holds a reference to contents.tail
  @scala.annotation.tailrec
  private def writeToFile(outSource: PrintWriter, contents: Seq[String]) {
    if (!contents.isEmpty) {
      outSource.println(contents.head)
      writeToFile(outSource, contents.tail)
    }
  }

  private def process(contents: Seq[String]) = contents.map(_.toUpperCase)
}

Solution

As hinted by fge, modify process to take an iterator and remove the .toSeq. inSource.getLines is already an iterator.
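
A minimal sketch of what that change could look like (adapted from the question's code; this exact version is not part of the original answer):

import java.io.PrintWriter
import scala.io.Source

object FileProcessor {
  def main(args: Array[String]) {
    val inSource = Source.fromFile("in.txt")
    val outSource = new PrintWriter("out.txt")
    try {
      // no .toSeq: lines stream through the Iterator one at a time
      val iterator = process(inSource.getLines)
      while (iterator.hasNext) outSource.println(iterator.next)
    } finally {
      inSource.close()
      outSource.close()
    }
  }

  // mapping an Iterator is lazy and does not retain already-consumed elements
  private def process(contents: Iterator[String]) = contents.map(_.toUpperCase)
}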

Converting to a Seq will cause the items to be remembered. I think it will convert the iterator into a Stream and cause all items to be remembered.

Edit: Ok, it's more subtle. You are doing the equivalent of Iterator.toSeq.iterator by calling iterator on the result of process. This can cause an out of memory exception.

scala> Iterator.continually(1).toSeq.iterator.take(300*1024*1024).size
java.lang.OutOfMemoryError: Java heap space
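
For contrast, a quick sketch (not from the original answer): consuming the iterator directly, without the toSeq round-trip, should complete in constant memory because nothing memoizes the elements:

scala> Iterator.continually(1).take(300 * 1024 * 1024).size   // no intermediate Stream, so no heap blow-up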

It may be the same issue as reported here: https://issues.scala-lang.org/browse/SI-4835. Note my comment at the end of the bug; this is from personal experience.
