Scala、Erastothenes:有没有一种直接的方法可以用迭代替换流? [英] Scala, Erastothenes: Is there a straightforward way to replace a stream with an iteration?

查看:19
本文介绍了Scala、Erastothenes:有没有一种直接的方法可以用迭代替换流?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我编写了一个使用流无限生成素数的函数(维基百科:Erastothenes 的增量筛选).它返回一个流,但它也在内部合并素数倍数流以标记即将到来的复合.这个定义简洁、实用、优雅且易于理解,如果我自己这么说的话:

I wrote a function that generates primes indefinitely (wikipedia: incremental sieve of Erastothenes) usings streams. It returns a stream, but it also merges streams of prime multiples internally to mark upcoming composites. The definition is concise, functional, elegant and easy to understand, if I do say so myself:

def primes(): Stream[Int] = {
  def merge(a: Stream[Int], b: Stream[Int]): Stream[Int] = {
    def next = a.head min b.head
    Stream.cons(next, merge(if (a.head == next) a.tail else a,
                            if (b.head == next) b.tail else b))
  }
  def test(n: Int, compositeStream: Stream[Int]): Stream[Int] = {
    if (n == compositeStream.head) test(n+1, compositeStream.tail)
    else Stream.cons(n, test(n+1, merge(compositeStream, Stream.from(n*n, n))))
  }
  test(2, Stream.from(4, 2))
}

但是,当我尝试生成第 1000 个素数时,出现java.lang.OutOfMemoryError: GC 开销限制超出".

But, I get a "java.lang.OutOfMemoryError: GC overhead limit exceeded" when I try to generate the 1000th prime.

我有一个替代解决方案,它返回素数上的迭代器,并在内部使用元组的优先级队列(多个,素数用于生成多个)来标记即将到来的组合.它运行良好,但需要大约两倍的代码,而且我基本上不得不从头开始:

I have an alternative solution that returns an iterator over primes and uses a priority queue of tuples (multiple, prime used to generate multiple) internally to mark upcoming composites. It works well, but it takes about twice as much code, and I basically had to restart from scratch:

import scala.collection.mutable.PriorityQueue
def primes(): Iterator[Int] = {
  // Tuple (composite, prime) is used to generate a primes multiples
  object CompositeGeneratorOrdering extends Ordering[(Long, Int)] {
    def compare(a: (Long, Int), b: (Long, Int)) = b._1 compare a._1
  }
  var n = 2;
  val composites = PriorityQueue(((n*n).toLong, n))(CompositeGeneratorOrdering)
  def advance = {
    while (n == composites.head._1) { // n is composite
      while (n == composites.head._1) { // duplicate composites
        val (multiple, prime) = composites.dequeue
        composites.enqueue((multiple + prime, prime))
      }
      n += 1
    }
    assert(n < composites.head._1)
    val prime = n
    n += 1
    composites.enqueue((prime.toLong * prime.toLong, prime))
    prime
  }
  Iterator.continually(advance)
}

是否有一种直接的方法可以将带有流的代码转换为带有迭代器的代码?或者有没有一种简单的方法可以让我的第一次尝试提高内存效率?

Is there a straightforward way to translate the code with streams to code with iterators? Or is there a simple way to make my first attempt more memory efficient?

从流的角度考虑更容易;我宁愿这样开始,然后在必要时调整我的代码.

It's easier to think in terms of streams; I'd rather start that way, then tweak my code if necessary.

推荐答案

在您的第一个代码中,您应该推迟合并,直到在候选中看到素数的平方.这将大大减少正在使用的流数量,从根本上改善您的内存使用问题.要得到第 1000 个质数 7919,我们只需要考虑不超过其平方根 88 的质数.那只是 23 个素数/它们倍数的流,而不是 999(22,如果我们从一开始就忽略偶数).对于第 10,000 个素数,区别在于拥有 9999 个倍数流和只有 66 个.对于第 100,000 个,只需要 189.

In your first code, you should postpone the merging until the square of a prime is seen amongst the candidates. This will drastically reduce the number of streams in use, radically improving your memory usage issues. To get the 1000th prime, 7919, we only need to consider primes not above its square root, 88. That's just 23 primes/streams of their multiples, instead of 999 (22, if we ignore the evens from the outset). For the 10,000th prime, it's the difference between having 9999 streams of multiples and just 66. And for the 100,000th, only 189 are needed.

诀窍是通过递归调用将消耗的素数与产生的素数分开:

The trick is to separate the primes being consumed from the primes being produced, via a recursive invocation:

def primes(): Stream[Int] = {
  def merge(a: Stream[Int], b: Stream[Int]): Stream[Int] = {
    def next = a.head min b.head
    Stream.cons(next, merge(if (a.head == next) a.tail else a,
                            if (b.head == next) b.tail else b))
  }
  def test(n: Int, q: Int, 
                   compositeStream: Stream[Int], 
                   primesStream: Stream[Int]): Stream[Int] = {
    if (n == q) test(n+2, primesStream.tail.head*primesStream.tail.head,
                          merge(compositeStream, 
                                Stream.from(q, 2*primesStream.head).tail),
                          primesStream.tail)
    else if (n == compositeStream.head) test(n+2, q, compositeStream.tail,
                                                     primesStream)
    else Stream.cons(n, test(n+2, q, compositeStream, primesStream))
  }
  Stream.cons(2, Stream.cons(3, Stream.cons(5, 
     test(7, 25, Stream.from(9, 6), primes().tail.tail))))
}

作为一个额外的好处,不需要将素数的平方存储为 Longs.这也会更快,并且具有更好的算法复杂性(时间和空间),因为它避免了做很多多余的工作.Ideone 测试显示它的运行速度约为 ~n1.5..1.6 经验增长顺序,生产量高达 n = 80,000 素数.

As an added bonus, there's no need to store the squares of primes as Longs. This will also be much faster and have better algorithmic complexity (time and space) as it avoids doing a lot of superfluous work. Ideone testing shows it runs at about ~ n1.5..1.6 empirical orders of growth in producing up to n = 80,000 primes.

这里仍然存在算法问题:这里创建的结构仍然是线性左倾结构(((mults_of_2 + mults_of_3) + mults_of_5) + ...),更频繁- 产生位于更深处的流(因此数字有更多层次可以渗透,上升).右倾结构应该更好,mults_of_2 + (mults_of_3 + (mults_of_5 + ...)).使其成为一棵树应该会带来时间复杂度的真正改进(通常将其降低到大约 ~ n1.2..1.25).有关相关讨论,请参阅此 haskellwiki 页面.

There's still an algorithmic problem here: the structure that is created here is still a linear left-leaning structure (((mults_of_2 + mults_of_3) + mults_of_5) + ...), with more frequently-producing streams situated deeper inside it (so the numbers have more levels to percolate through, going up). The right-leaning structure should be better, mults_of_2 + (mults_of_3 + (mults_of_5 + ...)). Making it a tree should bring a real improvement in time complexity (pushing it down typically to about ~ n1.2..1.25). For a related discussion, see this haskellwiki page.

Eratosthenes 的真正的"命令式筛选通常在 ~ n1.1 左右运行(在 n 个素数中产生)和一个最优的试除法筛在 ~ n1.40..1.45.您的原始代码运行时间约为 三次,或更糟.使用命令式可变数组通常是最快的,按段工作(又名 Eratosthenes 的分段筛).

The "real" imperative sieve of Eratosthenes usually runs at around ~ n1.1 (in n primes produced) and an optimal trial division sieve at ~ n1.40..1.45. Your original code runs at about cubic time, or worse. Using imperative mutable array is usually the fastest, working by segments (a.k.a. the segmented sieve of Eratosthenes).

在您的第二个代码的上下文中,这是在 Python 中实现的方式.

In the context of your second code, this is how it is achieved in Python.

这篇关于Scala、Erastothenes:有没有一种直接的方法可以用迭代替换流?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆