整数对的尾递归有界流(Scala)? [英] Tail-recursive bounded stream of pairs of integers (Scala)?

查看:80
本文介绍了整数对的尾递归有界流(Scala)?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我对Scala还是很陌生,所以请原谅我的无知!我正在尝试迭代以最大值为界的整数对.例如,如果最大值为5,则迭代应返回:

(0, 0), (0, 1), ..., (0, 5), (1, 0), ..., (5, 5)

我选择尝试将其尾部递归返回为Stream:

    @tailrec
    def _pairs(i: Int, j: Int, maximum: Int): Stream[(Int, Int)] = {
        if (i == maximum && j == maximum) Stream.empty
        else if (j == maximum) (i, j) #:: _pairs(i + 1, 0, maximum)
        else (i, j) #:: _pairs(i, j + 1, maximum)
    }

在没有tailrec注释的情况下,代码有效:

scala> _pairs(0, 0, 5).take(11)
res16: scala.collection.immutable.Stream[(Int, Int)] = Stream((0,0), ?)

scala> _pairs(0, 0, 5).take(11).toList
res17: List[(Int, Int)] = List((0,0), (0,1), (0,2), (0,3), (0,4), (0,5), (1,0), (1,1), (1,2), (1,3), (1,4))

但这对我来说还不够.编译器正确地指出_pairs的最后一行没有返回_pairs:

could not optimize @tailrec annotated method _pairs: it contains a recursive call not in tail position
    else (i, j) #:: _pairs(i, j + 1, maximum)
                ^

所以,我有几个问题:

  • 直接解决上述实现,一个尾部如何递归返回Stream [(Int,Int)]?
  • 退一步,什么是最有效的内存方法来迭代有界整数序列?我不想遍历Range,因为范围扩展了IndexedSeq ,并且我不希望序列完全存在于内存中.还是我错了?如果我遍历Range.view,是否可以避免它进入内存?

在Python(!)中,我想要的只是:

In [6]: def _pairs(maximum):
   ...:     for i in xrange(maximum+1):
   ...:         for j in xrange(maximum+1):
   ...:             yield (i, j)
   ...:             

In [7]: p = _pairs(5)

In [8]: [p.next() for i in xrange(11)]
Out[8]: 
[(0, 0),
 (0, 1),
 (0, 2),
 (0, 3),
 (0, 4),
 (0, 5),
 (1, 0),
 (1, 1),
 (1, 2),
 (1, 3),
 (1, 4)]

感谢您的帮助!如果您认为我需要阅读参考资料/API文档/其他内容,请告诉我,因为我热衷于学习.

解决方案

这不是尾递归

让我们假设您是在创建列表而不是流: (让我使用一个简单的函数来表达我的观点)

def foo(n: Int): List[Int] =
  if (n == 0)
    0 :: Nil
  else
    n :: foo(n - 1)

在此递归中,通常情况下,foo(n - 1)返回后,该函数必须对返回的列表进行某些操作-它必须将另一个项连接到列表的开头.因此该函数不能是尾部递归,因为在递归之后必须对列表进行某些操作.

没有尾部递归,对于n的较大值,您将耗尽堆栈空间.

常用列表解决方案

通常的解决方案是将ListBuffer作为第二个参数传递并填充.

def foo(n: Int) = {
  def fooInternal(n: Int, list: ListBuffer[Int]) = {
    if (n == 0) 
      list.toList
    else {
      list += n
      fooInternal(n - 1, list)
    }
  }
  fooInternal(n, new ListBuffer[Int]())
}

您正在做的事情被称为"尾递归模缺点",这是 LISP Prolog编译器在看到尾部递归模态cons模式时自动执行的优化,因为它很常见. Scala的编译器不会自动对此进行优化.

流不需要尾递归

流不需要尾部递归来避免用完堆栈空间-这是因为它们使用了巧妙的技巧来阻止对foo的递归调用在代码中出现的位置执行.该函数调用被包装在一个thunk中,并且仅在您实际上尝试从流中获取值时才调用.一次只有一次对foo的调用处于活动状态–永远不会递归.

我已经写了一个先前的答案,解释了 #::运算符的工作原理在Stackoverflow上.当您调用以下递归流函数时,将发生以下情况. (从数学意义上讲,它是递归的,但它并不像通常期望的那样从函数调用中进行函数调用.)

def foo(n: Int): Stream[Int] =
  if (n == 0)
    0 #:: Nil
  else
    n #:: foo(n - 1)

您调用foo(10),它将返回一个已经计算出一个元素的流,并且尾部是一个thunk,将在您下次需要该流中的某个元素时调用foo(9).现在不调用foo(9),而是将调用绑定到流中的lazy val,并且foo(10)立即返回.当您最终确实需要流中的第二个值时,将调用foo(9),它会计算一个元素,并将hte流的尾部设置为将调用foo(8)的thunk. foo(9)立即返回而无需调用foo(8).等等...

这使您可以创建无限流而不会耗尽内存,例如:

def countUp(start: Int): Stream[Int] = start #::countUp(start + 1)

(请小心在此流上调用的操作.如果尝试执行forEachmap,则将填满整个堆,但是使用take是一种很好的处理方法流的任意前缀.)

一个更简单的解决方案

为什么不只使用Scala的for循环,而不是处理递归和流?

def pairs(maximum:Int) =
  for (i <- 0 to maximum;
       j <- 0 to maximum)
    yield (i, j)

这将实现内存中的整个集合,并返回IndexedSeq[(Int, Int)].

如果您特别需要Stream,可以将第一个范围转换为Stream.

def pairs(maximum:Int) =
  for (i <- 0 to maximum toStream;
       j <- 0 to maximum)
    yield (i, j)

这将返回Stream[(Int, Int)].当您访问序列中的某个点时,它将被具体化到内存中,并且只要您仍然引用该元素之前的流中的任何点,它就会一直存在.

通过将两个范围都转换为视图,可以获得更好的内存使用率.

def pairs(maximum:Int) =
  for (i <- 0 to maximum view;
       j <- 0 to maximum view)
    yield (i, j)

返回一个SeqView[(Int, Int),Seq[_]],该SeqView[(Int, Int),Seq[_]]每次需要时都会计算每个元素,并且不存储预先计算的结果.

您还可以以相同的方式获得一个迭代器(只能遍历一次)

def pairs(maximum:Int) =
  for (i <- 0 to maximum iterator;
       j <- 0 to maximum iterator)
    yield (i, j)

返回Iterator[(Int, Int)].

I'm very new to Scala, so forgive my ignorance! I'm trying to iterate of pairs of integers that are bounded by a maximum. For example, if the maximum is 5, then the iteration should return:

(0, 0), (0, 1), ..., (0, 5), (1, 0), ..., (5, 5)

I've chosen to try and tail-recursively return this as a Stream:

    @tailrec
    def _pairs(i: Int, j: Int, maximum: Int): Stream[(Int, Int)] = {
        if (i == maximum && j == maximum) Stream.empty
        else if (j == maximum) (i, j) #:: _pairs(i + 1, 0, maximum)
        else (i, j) #:: _pairs(i, j + 1, maximum)
    }

Without the tailrec annotation the code works:

scala> _pairs(0, 0, 5).take(11)
res16: scala.collection.immutable.Stream[(Int, Int)] = Stream((0,0), ?)

scala> _pairs(0, 0, 5).take(11).toList
res17: List[(Int, Int)] = List((0,0), (0,1), (0,2), (0,3), (0,4), (0,5), (1,0), (1,1), (1,2), (1,3), (1,4))

But this isn't good enough for me. The compiler is correctly pointing out that the last line of _pairs is not returning _pairs:

could not optimize @tailrec annotated method _pairs: it contains a recursive call not in tail position
    else (i, j) #:: _pairs(i, j + 1, maximum)
                ^

So, I have several questions:

  • directly addressing the implementation above, how does one tail-recursively return Stream[(Int, Int)]?
  • taking a step back, what is the most memory-efficient way to iterate over bounded sequences of integers? I don't want to iterate over Range because Range extends IndexedSeq, and I don't want the sequence to exist entirely in memory. Or am I wrong? If I iterate over Range.view do I avoid it coming into memory?

In Python (!), all I want is:

In [6]: def _pairs(maximum):
   ...:     for i in xrange(maximum+1):
   ...:         for j in xrange(maximum+1):
   ...:             yield (i, j)
   ...:             

In [7]: p = _pairs(5)

In [8]: [p.next() for i in xrange(11)]
Out[8]: 
[(0, 0),
 (0, 1),
 (0, 2),
 (0, 3),
 (0, 4),
 (0, 5),
 (1, 0),
 (1, 1),
 (1, 2),
 (1, 3),
 (1, 4)]

Thanks for your help! If you think I need to read references / API docs / anything else please tell me, because I'm keen to learn.

解决方案

This is not tail-recursion

Let's suppose you were making a list instead of a stream: (let me use a simpler function to make my point)

def foo(n: Int): List[Int] =
  if (n == 0)
    0 :: Nil
  else
    n :: foo(n - 1)

In the general case in this recursion, after foo(n - 1) returns the function has to do something with the list that it returns -- it has to concatenate another item onto the beginning of the list. So the function can't be tail recursive, becuase something has to be done to the list after the recursion.

Without tail recursion, for some large value of n, you run out of stack space.

The usual list solution

The usual solution would be to pass a ListBuffer as a second parameter, and fill that.

def foo(n: Int) = {
  def fooInternal(n: Int, list: ListBuffer[Int]) = {
    if (n == 0) 
      list.toList
    else {
      list += n
      fooInternal(n - 1, list)
    }
  }
  fooInternal(n, new ListBuffer[Int]())
}

What you're doing is known as "tail recursion modulo cons", and this is an optimization performed automatically by LISP Prolog compilers when they see the tail recursion modulo cons pattern, since it's so common. Scala's compiler does not optimize this automatically.

Streams don't need tail recursion

Streams don't need tail recursion to avoid running out of stack space -- this is becuase they use a clever trick to keep from executing the recursive call to foo at the point where it appears in the code. The function call gets wrapped in a thunk, and only called at the point that you actually try to get the value from the stream. Only one call to foo is active at a time -- it's never recursive.

I've written a previous answer explaining how the #:: operator works here on Stackoverflow. Here's what happens when you call the following recursive stream function. (It is recursive in the mathematical sense, but it doesn't make a function call from within a function call the way you usually expect.)

def foo(n: Int): Stream[Int] =
  if (n == 0)
    0 #:: Nil
  else
    n #:: foo(n - 1)

You call foo(10), it returns a stream with one element computed already, and the tail is a thunk that will call foo(9) the next time you need an element from the stream. foo(9) is not called right now -- rather the call is bound to a lazy val inside the stream, and foo(10) returns immediately. When you finally do need the second value in the stream, foo(9) is called, and it computes one element and sets the tail of hte stream to be a thunk that will call foo(8). foo(9) returns immediately without calling foo(8). And so on...

This allows you to create infinite streams without running out of memory, for example:

def countUp(start: Int): Stream[Int] = start #::countUp(start + 1)

(Be careful what operations you call on this stream. If you try to do a forEach or a map, you'll fill up your whole heap, but using take is a good way to work with an arbitrary prefix of the stream.)

A simpler solution altogether

Instead of dealing with recursion and streams, why not just use Scala's for loop?

def pairs(maximum:Int) =
  for (i <- 0 to maximum;
       j <- 0 to maximum)
    yield (i, j)

This materializes the entire collection in memory, and returns an IndexedSeq[(Int, Int)].

If you need a Stream specifically, you can convert the first range into a Stream.

def pairs(maximum:Int) =
  for (i <- 0 to maximum toStream;
       j <- 0 to maximum)
    yield (i, j)

This will return a Stream[(Int, Int)]. When you access a certain point in the sequence, it will be materialized into memory, and it will stick around as long as you still have a reference to any point in the stream before that element.

You can get even better memory usage by converting both ranges into views.

def pairs(maximum:Int) =
  for (i <- 0 to maximum view;
       j <- 0 to maximum view)
    yield (i, j)

That returns a SeqView[(Int, Int),Seq[_]] that computes each element each time you need it, and doesn't store precomputed results.

You can also get an iterator (which you can only traverse once) the same way

def pairs(maximum:Int) =
  for (i <- 0 to maximum iterator;
       j <- 0 to maximum iterator)
    yield (i, j)

That returns Iterator[(Int, Int)].

这篇关于整数对的尾递归有界流(Scala)?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆