Functional processing of Scala streams without OutOfMemory errors
Problem description
Is it possible to apply functional programming to Scala streams such that the stream is processed sequentially, but the already processed part of the stream can be garbage collected?
For example, I define a Stream that contains the numbers from start to end:
def fromToStream(start: Int, end: Int) : Stream[Int] = {
if (end < start) Stream.empty
else start #:: fromToStream(start+1, end)
}
If I sum up the values in a functional style:
println(fromToStream(1,10000000).reduceLeft(_+_))
I get an OutOfMemoryError, perhaps because the stack frame of the call to reduceLeft holds a reference to the head of the stream. But if I do this in iterative style, it works:
var sum = 0
for (i <- fromToStream(1,10000000)) {
sum += i
}
Is there a way to do this in a functional style without getting an OutOfMemoryError?
UPDATE: This was a bug in Scala that has since been fixed, so this answer is more or less out of date.
Yes, you can. The trick is to use tail-recursive methods, so that the local stack frame contains the only reference to the Stream instance. Since the method is tail-recursive, the local reference to the previous Stream head is erased once the method recursively calls itself, which lets the GC collect the start of the Stream as you go.
Welcome to Scala version 2.9.0.r23459-b20101108091606 (Java HotSpot(TM) Server VM, Java 1.6.0_20).
Type in expressions to have them evaluated.
Type :help for more information.
scala> import collection.immutable.Stream
import collection.immutable.Stream
scala> import annotation.tailrec
import annotation.tailrec
scala> @tailrec def last(s: Stream[Int]): Int = if (s.tail.isEmpty) s.head else last(s.tail)
last: (s: scala.collection.immutable.Stream[Int])Int
scala> last(Stream.range(0, 100000000))
res2: Int = 99999999
Also, you must ensure that the thing you pass to the method last above has only one reference on the stack. If you store a Stream in a local variable or value, it will not be garbage collected when you call the last method, since the method's argument is then no longer the only reference to the Stream. The code below runs out of memory.
scala> val s = Stream.range(0, 100000000)
s: scala.collection.immutable.Stream[Int] = Stream(0, ?)
scala> last(s)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at sun.net.www.ParseUtil.encodePath(ParseUtil.java:84)
at sun.misc.URLClassPath$JarLoader.checkResource(URLClassPath.java:674)
at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:759)
at sun.misc.URLClassPath.getResource(URLClassPath.java:169)
at java.net.URLClassLoader$1.run(URLClassLoader.java:194)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
at scala.tools.nsc.Interpreter$Request$$anonfun$onErr$1$1.apply(Interpreter.scala:978)
at scala.tools.nsc.Interpreter$Request$$anonfun$onErr$1$1.apply(Interpreter.scala:976)
at scala.util.control.Exception$Catch.apply(Exception.scala:80)
at scala.tools.nsc.Interpreter$Request.loadAndRun(Interpreter.scala:984)
at scala.tools.nsc.Interpreter.loadAndRunReq$1(Interpreter.scala:579)
at scala.tools.nsc.Interpreter.interpret(Interpreter.scala:599)
at scala.tools.nsc.Interpreter.interpret(Interpreter.scala:576)
at scala.tools.nsc.InterpreterLoop.reallyInterpret$1(InterpreterLoop.scala:472)
at scala.tools.nsc.InterpreterLoop.interpretStartingWith(InterpreterLoop.scala:515)
at scala.tools.nsc.InterpreterLoop.command(InterpreterLoop.scala:362)
at scala.tools.nsc.InterpreterLoop.processLine$1(InterpreterLoop.scala:243)
at scala.tools.nsc.InterpreterLoop.repl(InterpreterLoop.scala:249)
at scala.tools.nsc.InterpreterLoop.main(InterpreterLoop.scala:559)
at scala.tools.nsc.MainGenericRunner$.process(MainGenericRunner.scala:75)
at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:31)
at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)
To summarize:
- Use tail-recursive methods
- Annotate them as tail-recursive
- When you call them, ensure that their argument is the only reference to the Stream
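The three rules above can be sketched in one place. This is only an illustration under stated assumptions: foldStream and StreamFold are hypothetical names that do not appear in the original answer, and whether the consumed prefix is actually collected depends on the Scala and JVM versions in use. Note also that Stream has been deprecated since Scala 2.13 in favour of LazyList, so newer compilers will emit deprecation warnings for this code.

```scala
import scala.annotation.tailrec

object StreamFold {
  // Hypothetical helper (not part of the original answer): a left fold written
  // as a tail-recursive method. Inside the method, the parameter `s` is the
  // only reference to the stream, and it is overwritten on each iteration.
  @tailrec
  def foldStream[A, B](s: Stream[A], acc: B)(f: (B, A) => B): B =
    if (s.isEmpty) acc
    else foldStream(s.tail, f(acc, s.head))(f)

  def main(args: Array[String]): Unit = {
    // The stream is built directly in argument position and never stored in
    // a val, so the caller keeps no reference to its head.
    val total = foldStream(Stream.range(0, 1000), 0L)((acc, x) => acc + x)
    println(total)
  }
}
```

The @tailrec annotation does not change the generated code; it only makes the compiler reject the method if the recursive call is not in tail position.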
EDIT:
Note that this also works and does not result in an out of memory error:
scala> def s = Stream.range(0, 100000000)
s: scala.collection.immutable.Stream[Int]
scala> last(s)
res1: Int = 99999999
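One way to see why the def version behaves differently from the val version is to compare object identity. The sketch below uses hypothetical names (DefVsVal, fresh, kept) and a deliberately small range; it only shows that a def builds a new Stream on every access, while a val keeps returning the same instance and therefore keeps its head reachable.

```scala
object DefVsVal {
  // `def` re-evaluates its body on every access, so each call builds a new
  // stream and nothing holds on to the head afterwards.
  def fresh: Stream[Int] = Stream.range(0, 10)

  // `val` is evaluated once; the head cell stays reachable for as long as
  // the val itself is reachable.
  val kept: Stream[Int] = Stream.range(0, 10)

  def main(args: Array[String]): Unit = {
    println(fresh eq fresh) // compares two separately built instances
    println(kept eq kept)   // compares the same instance with itself
  }
}
```

The first comparison is false and the second is true, which is consistent with last(s) succeeding when s is a def and running out of memory when s is a val.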
EDIT2:
In the case of reduceLeft, which you need, you would have to define a helper method with an accumulator argument for the result. The accumulator can be given its initial value with a default argument. A simplified example:
scala> @tailrec def rcl(s: Stream[Int], acc: Int = 0): Int = if (s.isEmpty) acc else rcl(s.tail, acc + s.head)
rcl: (s: scala.collection.immutable.Stream[Int],acc: Int)Int
scala> rcl(Stream.range(0, 10000000))
res6: Int = -2014260032
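The negative result above is not a memory problem: the true sum of 0 until 10000000 is 49999995000000, which does not fit in an Int and wraps around. A minimal variant with a Long accumulator avoids this. The names LongSum and sumLong are hypothetical and not from the original answer, and the same caveats about Scala and JVM versions apply to the memory behaviour of the large call in main.

```scala
import scala.annotation.tailrec

object LongSum {
  // Same shape as rcl above, but the accumulator is a Long so the running
  // total cannot overflow for this input size.
  @tailrec
  def sumLong(s: Stream[Int], acc: Long = 0L): Long =
    if (s.isEmpty) acc else sumLong(s.tail, acc + s.head)

  def main(args: Array[String]): Unit = {
    // Exact sum of 0 until 10000000.
    println(sumLong(Stream.range(0, 10000000)))
    // Truncating the exact sum to 32 bits reproduces the Int result of rcl.
    println(49999995000000L.toInt)
  }
}
```

The second line printed is -2014260032, the same value that rcl returned with its Int accumulator.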