Why is my Sieve of Eratosthenes so slow?


Question


I'm solving some problems on Project Euler and had to generate 2 million primes to solve a problem. My implementation of the Sieve of Eratosthenes turned out to be EXTREMELY slow, but I don't quite know why. Could someone please explain the major problems with this implementation? I thought it was so pretty, and then I figured out it is utterly terrible :(. I found another implementation of it online and it was so much faster than mine.

def generatePrimes(upperBound):
    numbers = range(2,upperBound+1)
    primes = []

    while numbers:
        prime = numbers[0]
        primes.append(prime)
        numbers = filter((lambda x: x%prime),numbers)
    return primes

EDIT: Thanks for all the answers! The conclusion is that the filter is the problem, because it goes through every element (instead of only those that are to be marked as not prime) and because it creates a new list every time. I rewrote it with good old for loops and one round of filtering, and it works much faster. New code:

def generatePrimes(upperBound):
    numbers = range(2, upperBound + 1)

    for i in xrange(len(numbers)):
        if numbers[i] != 0:
            for j in xrange(i + numbers[i], len(numbers), numbers[i]):
                numbers[j] = 0

    primes = filter(lambda x: x, numbers)
    return primes
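As a quick sanity check, here is the same rewritten algorithm expressed with `range` and a list comprehension so it runs under Python 3 as well (the original uses Python 2's `xrange` and list-returning `filter`); the logic is otherwise unchanged:

```python
def generatePrimes(upperBound):
    numbers = list(range(2, upperBound + 1))
    for i in range(len(numbers)):
        if numbers[i] != 0:
            # numbers[k] holds the value k + 2, so multiples of v = numbers[i]
            # sit at indices i + v, i + 2v, ... -- cross them off in place.
            for j in range(i + numbers[i], len(numbers), numbers[i]):
                numbers[j] = 0
    return [x for x in numbers if x]

print(generatePrimes(20))  # → [2, 3, 5, 7, 11, 13, 17, 19]
```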

Solution

The Sieve of Eratosthenes looks like this:

def sieve(n):
    primality_flags = [True] * (n + 1)
    primality_flags[0] = primality_flags[1] = False
    primes = []
    for i, flag in enumerate(primality_flags):
        if flag:
            primes.append(i)
            for j in xrange(2 * i, n + 1, i):
                primality_flags[j] = False
    return primes

It processes each number once when the outer loop reaches it, and once for every prime that divides it. About 1/2 the numbers are divisible by 2, about 1/3 are divisible by 3, and so on; asymptotically speaking, the average number of times each number will be processed is 1 + the sum of the reciprocals of the primes up to n. This sum is about log(log(n)), so the sieve has asymptotic time complexity O(n*log(log(n))), assuming arithmetic is constant time. This is really good.
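You can see this bound empirically by counting how many times the sieve's inner loop fires and comparing it against n*log(log(n)). The helper below (`sieve_ops` is a name introduced here for illustration, not part of the original answer) is a Python-3-compatible sketch of that count:

```python
import math

def sieve_ops(n):
    # Run the sieve and count every crossing-off operation in the inner loop.
    flags = [True] * (n + 1)
    flags[0] = flags[1] = False
    ops = 0
    for i, flag in enumerate(flags):
        if flag:
            for j in range(2 * i, n + 1, i):
                flags[j] = False
                ops += 1
    return ops

for n in (10**4, 10**5, 10**6):
    print(n, sieve_ops(n), int(n * math.log(math.log(n))))
```

The operation count stays within a small constant factor of n*log(log(n)) as n grows, which is the near-linear behavior the answer describes.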


Your function doesn't do that. Your filter goes over every element in numbers, regardless of whether it's divisible by prime. Each element is processed for every prime up until the first prime that divides it, and processing the prime p removes about 1/p of the elements of numbers. Letting the sequence of primes be p[0], p[1], p[2], etc., and letting the sequence of sizes of numbers be n[0], n[1], n[2], etc., we have the following approximate recurrence:

n[0] = upperBound - 1
n[1] = n[0] * (p[0]-1)/p[0]
n[2] = n[1] * (p[1]-1)/p[1]
...
n[k+1] = n[k] * (p[k]-1)/p[k]

and your algorithm takes time roughly proportional to the sum of the n values up until numbers is empty. I haven't analyzed the behavior of that series, but calculations show the growth is much worse than O(n*log(log(n))).
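One way to see how much worse it is: simulate the question's algorithm and add up the sizes of `numbers` at each pass, since the filter touches every remaining element. The helper name `trial_filter_work` is made up here for illustration:

```python
def trial_filter_work(upperBound):
    # Mirror the question's loop, but tally how many elements each
    # filter pass visits -- that sum is roughly its running cost.
    numbers = list(range(2, upperBound + 1))
    work = 0
    while numbers:
        prime = numbers[0]
        work += len(numbers)  # filter scans the whole remaining list
        numbers = [x for x in numbers if x % prime]
    return work

for n in (1000, 10000):
    print(n, trial_filter_work(n))
```

Even at modest sizes the tally dwarfs n*log(log(n)): once only primes remain, each pass removes a single element, so the tail alone costs roughly the square of the prime count.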
