Scala,Erastothenes:有一种直接用迭代替换流的方法吗? [英] Scala, Erastothenes: Is there a straightforward way to replace a stream with an iteration?

查看:147
本文介绍了Scala,Erastothenes:有一种直接用迭代替换流的方法吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我编写了一个无限期生成素数的函数(维基百科:增量筛选Erastothenes )使用流。它返回一个流,但它也在内部合并素数倍流以标记即将到来的复合。如果我自己这样说,那么定义简洁,实用,优雅且易于理解:

I wrote a function that generates primes indefinitely (wikipedia: incremental sieve of Erastothenes) usings streams. It returns a stream, but it also merges streams of prime multiples internally to mark upcoming composites. The definition is concise, functional, elegant and easy to understand, if I do say so myself:

def primes(): Stream[Int] = {
  def merge(a: Stream[Int], b: Stream[Int]): Stream[Int] = {
    def next = a.head min b.head
    Stream.cons(next, merge(if (a.head == next) a.tail else a,
                            if (b.head == next) b.tail else b))
  }
  def test(n: Int, compositeStream: Stream[Int]): Stream[Int] = {
    if (n == compositeStream.head) test(n+1, compositeStream.tail)
    else Stream.cons(n, test(n+1, merge(compositeStream, Stream.from(n*n, n))))
  }
  test(2, Stream.from(4, 2))
}

但是,我得到一个java.lang.OutOfMemoryError:GC当我尝试生成第1000个素数时,超出了开销限制。

But, I get a "java.lang.OutOfMemoryError: GC overhead limit exceeded" when I try to generate the 1000th prime.

我有一个替代解决方案,它在素数上返回迭代器并使用元组的优先级队列(倍数,素数)用于在内部生成多个标记即将推出的复合材料。它运行良好,但它需要大约两倍的代码,我基本上不得不从头重新开始:

I have an alternative solution that returns an iterator over primes and uses a priority queue of tuples (multiple, prime used to generate multiple) internally to mark upcoming composites. It works well, but it takes about twice as much code, and I basically had to restart from scratch:

import scala.collection.mutable.PriorityQueue
def primes(): Iterator[Int] = {
  // Tuple (composite, prime) is used to generate a primes multiples
  object CompositeGeneratorOrdering extends Ordering[(Long, Int)] {
    def compare(a: (Long, Int), b: (Long, Int)) = b._1 compare a._1
  }
  var n = 2;
  val composites = PriorityQueue(((n*n).toLong, n))(CompositeGeneratorOrdering)
  def advance = {
    while (n == composites.head._1) { // n is composite
      while (n == composites.head._1) { // duplicate composites
        val (multiple, prime) = composites.dequeue
        composites.enqueue((multiple + prime, prime))
      }
      n += 1
    }
    assert(n < composites.head._1)
    val prime = n
    n += 1
    composites.enqueue((prime.toLong * prime.toLong, prime))
    prime
  }
  Iterator.continually(advance)
}

是否有一种直接的方法将带有流的代码转换为带迭代器的代码?或者是否有一种简单的方法可以让我的第一次尝试更有效率?

Is there a straightforward way to translate the code with streams to code with iterators? Or is there a simple way to make my first attempt more memory efficient?

从流的角度考虑更容易;我宁愿那样开始,然后在必要时调整我的代码。

It's easier to think in terms of streams; I'd rather start that way, then tweak my code if necessary.

推荐答案

在你的第一个代码中,你应该推迟合并,直到候选人中看到素数的平方。这将大大减少使用的流的数量,从根本上改善您的内存使用问题。要获得第1000个素数, 7919 ,我们只需要考虑不高于其平方根的 88 。这只是 23 它们的倍数的素数/流,而不是 999 22 ,如果我们从一开始就忽略了evens)。对于第10,000个素数,它是 9999 的倍数流和 66 之间的区别。而对于第100,000个,只需要 189

In your first code, you should postpone the merging until the square of a prime is seen amongst the candidates. This will drastically reduce the number of streams in use, radically improving your memory usage issues. To get the 1000th prime, 7919, we only need to consider primes not above its square root, 88. That's just 23 primes/streams of their multiples, instead of 999 (22, if we ignore the evens from the outset). For the 10,000th prime, it's the difference between having 9999 streams of multiples and just 66. And for the 100,000th, only 189 are needed.

诀窍是将正在消耗的素数与正在生成的素数分开,通过递归调用:

The trick is to separate the primes being consumed from the primes being produced, via a recursive invocation:

def primes(): Stream[Int] = {
  def merge(a: Stream[Int], b: Stream[Int]): Stream[Int] = {
    def next = a.head min b.head
    Stream.cons(next, merge(if (a.head == next) a.tail else a,
                            if (b.head == next) b.tail else b))
  }
  def test(n: Int, q: Int, 
                   compositeStream: Stream[Int], 
                   primesStream: Stream[Int]): Stream[Int] = {
    if (n == q) test(n+2, primesStream.tail.head*primesStream.tail.head,
                          merge(compositeStream, 
                                Stream.from(q, 2*primesStream.head).tail),
                          primesStream.tail)
    else if (n == compositeStream.head) test(n+2, q, compositeStream.tail,
                                                     primesStream)
    else Stream.cons(n, test(n+2, q, compositeStream, primesStream))
  }
  Stream.cons(2, Stream.cons(3, Stream.cons(5, 
     test(7, 25, Stream.from(9, 6), primes().tail.tail))))
}

作为一个额外的奖励,没有必要将素数的平方存储为 Long s。这也将更快,并且具有更好的算法复杂性(时间和空间),因为它避免了做大量多余的工作。 Ideone测试显示它运行在〜 n 1.5..1.6 primes。

As an added bonus, there's no need to store the squares of primes as Longs. This will also be much faster and have better algorithmic complexity (time and space) as it avoids doing a lot of superfluous work. Ideone testing shows it runs at about ~ n1.5..1.6 empirical orders of growth in producing up to n = 80,000 primes.

这里仍然存在算法问题:这里创建的结构仍然是线性左倾结构((( mults_of_2 + mults_of_3)+ mults_of_5)+ ...),更频繁地产生的流位于其内部(因此数字有更多的水平渗透,上升)。右倾结构应该更好, mults_of_2 +(mults_of_3 +(mults_of_5 + ...))。使它成为一棵树应该会带来时间复杂度的真正改善(通常将其推低到约〜 n 1.2..1.25 )。有关相关讨论,请参阅此haskellwiki页面

There's still an algorithmic problem here: the structure that is created here is still a linear left-leaning structure (((mults_of_2 + mults_of_3) + mults_of_5) + ...), with more frequently-producing streams situated deeper inside it (so the numbers have more levels to percolate through, going up). The right-leaning structure should be better, mults_of_2 + (mults_of_3 + (mults_of_5 + ...)). Making it a tree should bring a real improvement in time complexity (pushing it down typically to about ~ n1.2..1.25). For a related discussion, see this haskellwiki page.

Eratosthenes的真实命令筛通常在〜 n 1.1 附近运行(在 n 质数中产生)和在 n 1.40..1.45 的最佳试验筛分筛。 您的原始代码的运行时间大约为立方时间,甚至更差。使用命令式可变阵列通常是最快的,按段(也就是Eratosthenes的分段筛)工作。

The "real" imperative sieve of Eratosthenes usually runs at around ~ n1.1 (in n primes produced) and an optimal trial division sieve at ~ n1.40..1.45. Your original code runs at about cubic time, or worse. Using imperative mutable array is usually the fastest, working by segments (a.k.a. the segmented sieve of Eratosthenes).

在第二个代码的上下文中,这是在Python中实现的方式

In the context of your second code, this is how it is achieved in Python.

这篇关于Scala,Erastothenes:有一种直接用迭代替换流的方法吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆