Functional processing of Scala streams without OutOfMemory errors


Question

Is it possible to apply functional programming to Scala streams such that the stream is processed sequentially, but the already processed part of the stream can be garbage collected?

For example, I define a Stream that contains the numbers from start to end:

def fromToStream(start: Int, end: Int) : Stream[Int] = {
  if (end < start) Stream.empty
  else start #:: fromToStream(start+1, end)
}

If I sum up the values in a functional style:

println(fromToStream(1,10000000).reduceLeft(_+_))

I get an OutOfMemoryError, perhaps because the stack frame of the call to reduceLeft holds a reference to the head of the stream. But if I do this in an iterative style, it works:

var sum = 0
for (i <- fromToStream(1,10000000)) {
  sum += i
}

Is there a way to do this in a functional style without getting an OutOfMemoryError?

UPDATE: This was a bug in Scala that has since been fixed, so this question is now more or less out of date.

Recommended answer

Yes, you can. The trick is to use tail-recursive methods, so that the local stack frame contains the only reference to the Stream instance. Since the method is tail-recursive, the local reference to the previous Stream head is erased once the method recursively calls itself, which lets the GC collect the start of the Stream as you go.

Welcome to Scala version 2.9.0.r23459-b20101108091606 (Java HotSpot(TM) Server VM, Java 1.6.0_20).
Type in expressions to have them evaluated.
Type :help for more information.

scala> import collection.immutable.Stream
import collection.immutable.Stream

scala> import annotation.tailrec
import annotation.tailrec

scala> @tailrec def last(s: Stream[Int]): Int = if (s.tail.isEmpty) s.head else last(s.tail)
last: (s: scala.collection.immutable.Stream[Int])Int

scala> last(Stream.range(0, 100000000))                                                                             
res2: Int = 99999999

Also, you must ensure that the thing you pass to the method last above has only one reference on the stack. If you store a Stream in a local variable or value, it will not be garbage collected when you call last, since the method's argument is no longer the only reference to the Stream. The code below runs out of memory.

scala> val s = Stream.range(0, 100000000)                                                                           
s: scala.collection.immutable.Stream[Int] = Stream(0, ?)                                                            

scala> last(s)                                                                                                      
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space                                              
        at sun.net.www.ParseUtil.encodePath(ParseUtil.java:84)                                                      
        at sun.misc.URLClassPath$JarLoader.checkResource(URLClassPath.java:674)                                     
        at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:759)                                       
        at sun.misc.URLClassPath.getResource(URLClassPath.java:169)                                                 
        at java.net.URLClassLoader$1.run(URLClassLoader.java:194)                                                   
        at java.security.AccessController.doPrivileged(Native Method)                                               
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)                                               
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)                                                    
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)                                            
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)                                                    
        at scala.tools.nsc.Interpreter$Request$$anonfun$onErr$1$1.apply(Interpreter.scala:978)                      
        at scala.tools.nsc.Interpreter$Request$$anonfun$onErr$1$1.apply(Interpreter.scala:976)                      
        at scala.util.control.Exception$Catch.apply(Exception.scala:80)
        at scala.tools.nsc.Interpreter$Request.loadAndRun(Interpreter.scala:984)                                    
        at scala.tools.nsc.Interpreter.loadAndRunReq$1(Interpreter.scala:579)                                       
        at scala.tools.nsc.Interpreter.interpret(Interpreter.scala:599)                                             
        at scala.tools.nsc.Interpreter.interpret(Interpreter.scala:576)
        at scala.tools.nsc.InterpreterLoop.reallyInterpret$1(InterpreterLoop.scala:472)                             
        at scala.tools.nsc.InterpreterLoop.interpretStartingWith(InterpreterLoop.scala:515)                         
        at scala.tools.nsc.InterpreterLoop.command(InterpreterLoop.scala:362)
        at scala.tools.nsc.InterpreterLoop.processLine$1(InterpreterLoop.scala:243)
        at scala.tools.nsc.InterpreterLoop.repl(InterpreterLoop.scala:249)
        at scala.tools.nsc.InterpreterLoop.main(InterpreterLoop.scala:559)
        at scala.tools.nsc.MainGenericRunner$.process(MainGenericRunner.scala:75)
        at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:31)
        at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)

To summarize:

  1. Use tail-recursive methods
  2. Annotate them as tail-recursive
  3. When you call them, ensure that their argument is the only reference to the Stream
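
As a rough illustration, the three rules can be combined in one small sketch. The names StreamFold, foldStream and numbers are hypothetical and do not come from the original answer, and the range is kept small so the example runs quickly. It assumes the Scala 2.x Stream type used throughout this thread (deprecated since 2.13 in favour of LazyList).

```scala
import scala.annotation.tailrec

object StreamFold {
  // Rules 1 and 2: the method is tail-recursive and annotated with @tailrec,
  // so compilation fails if the recursive call is not in tail position.
  @tailrec
  def foldStream[A, B](s: Stream[A], acc: B)(f: (B, A) => B): B =
    if (s.isEmpty) acc else foldStream(s.tail, f(acc, s.head))(f)

  // Rule 3: a def builds a fresh Stream on every call, so no val keeps
  // a reference to the head while the fold walks down the tail.
  def numbers: Stream[Int] = Stream.range(0, 1000)

  def main(args: Array[String]): Unit = {
    // The accumulator is a Long, so B is inferred as Long and A as Int.
    val total = foldStream(numbers, 0L)(_ + _)
    println(total)
  }
}
```

Taking the accumulator as an explicit argument is what lets the recursive call stay in tail position; a version that combined the result after the recursive call returned would not pass the @tailrec check.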

Note that this also works and does not result in an out of memory error:

scala> def s = Stream.range(0, 100000000)                                                   
s: scala.collection.immutable.Stream[Int]

scala> last(s)                                                                              
res1: Int = 99999999

EDIT 2:

In the case of reduceLeft, which is what you need, you would have to define a helper method with an accumulator argument for the result.

For reduceLeft, you need an accumulator argument, which you can give an initial value using default arguments. A simplified example:

scala> @tailrec def rcl(s: Stream[Int], acc: Int = 0): Int = if (s.isEmpty) acc else rcl(s.tail, acc + s.head)
rcl: (s: scala.collection.immutable.Stream[Int],acc: Int)Int

scala> rcl(Stream.range(0, 10000000))
res6: Int = -2014260032
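
The negative result above is Int overflow rather than a memory problem: the sum of the integers from 0 up to 9999999 is 49999995000000, which does not fit in a 32-bit Int. A minimal sketch of the same helper with a Long accumulator follows; the names LongSum and sumLong are hypothetical and not part of the original answer.

```scala
import scala.annotation.tailrec

object LongSum {
  // Same shape as rcl, but the accumulator is a Long, so the running
  // total does not wrap around at Int.MaxValue.
  @tailrec
  def sumLong(s: Stream[Int], acc: Long = 0L): Long =
    if (s.isEmpty) acc else sumLong(s.tail, acc + s.head)

  def main(args: Array[String]): Unit =
    // The Stream is passed directly as an argument, not stored in a val first.
    println(sumLong(Stream.range(0, 10000000)))
}
```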
