Why do two algorithms for finding primes differ in speed so much even though they seem to do the same number of iterations?

Problem description

I have two algorithms for finding primes in Python. The inner loop of each one seems to be executed the same number of times, and is equally simple. However, one of them takes 10 times as long as the other. My question is:

Why? Is this some quirk of Python that can be optimized away (how?), or am I missing something else?

The problem I am solving is essentially from http://www.spoj.pl/problems/PRIME1/. In my case, I have N = 10 ** 9, delta = 10 ** 5, and I want to find all primes between N-delta and N. I also have smallprimes, a pre-made list of all primes less than or equal to the square root of N. The first algorithm is very simple:

def range_f1(lo, hi, smallprimes):
  """Finds all primes p with lo <= p <= hi. 

  smallprimes is the sorted list of all primes up to (at least) square root of hi.
  hi & lo might be large, but hi-lo+1 must fit into a long."""

  primes =[]
  for i in xrange(hi-lo+1):
    n = lo + i

    isprime = True
    for p in smallprimes:
        if n % p == 0:
            isprime = False
            break

    if isprime:
        primes.append(n)

  return primes

Calling range_f1(N-delta, N, smallprimes) takes a long time (about 10 seconds). The inner loop is called 195170 times. I also have a version of this algorithm that replaces the inner loop with a generator expression (that is the one I actually use for profiling; see the end of the question), but it is no faster.

The second version is (an ugly implementation of) the sieve of Eratosthenes:

def range_sieve(lo, hi, smallprimes):
  """Parameters as before"""

  # So ugly! But SO FAST!! How??

  delta = hi-lo+1
  iamprime = [True] * delta      # iamprime[i] says whether lo + i is prime
  if lo<= 1:
    iamprime[1 - lo] = False

  def sillyfun():      # For profiling & speed comparison
    pass

  for p in smallprimes:
    rem = lo % p
    if rem == 0:
        iamprime[0] = False
    for i in xrange(p - rem, delta, p):
        iamprime[i] = False
        sillyfun()

    if p >= lo and p <= hi:
        iamprime[p - lo] = True

  return [p + lo for (p, ami) in enumerate(iamprime) if ami]

This is about 10 times as fast and takes less than 2 seconds. However, the inner loop (sillyfun()) is called 259982 times, more than in the previous case. I am at a loss to explain why this is so fast.
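For reference, a minimal timing harness along these lines (a sketch only: it assumes Python 2, to match the xrange/.next() usage above, and that range_f1, range_sieve and the smallprimes list are already defined as described):

import time

N = 10 ** 9
delta = 10 ** 5

start = time.time()
p1 = range_f1(N - delta, N, smallprimes)
print "range_f1:    %.2f s, %d primes" % (time.time() - start, len(p1))

start = time.time()
p2 = range_sieve(N - delta, N, smallprimes)
print "range_sieve: %.2f s, %d primes" % (time.time() - start, len(p2))

assert p1 == p2  # both versions should return exactly the same list of primes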

I thought that maybe the reason is that the inner loop of the first algorithm contains modular arithmetic, while the second only has an assignment. However, the following seems to imply that an assignment is no faster than modular arithmetic:

>>> from timeit import timeit
>>> timeit("10**9 % 2341234")
0.23445186469234613
>>> timeit("a[5000]=False", "a = [True] * 10000")
0.47924750212666822


Here's the (less readable) version of the first algorithm I actually use:

def range_f2(lo, hi, smallprimes):

  primes =[]
  for i in xrange(hi-lo+1):
    n = lo + i

    try:
        (1 for p in smallprimes if n % p ==0).next()
    except StopIteration:
        primes.append(n)

  return primes

Here are the results of calling the profiler on range_f2() (note the number of times the generator expression is evaluated):

>>> from cProfile import run as prof
>>> prof("range_f2(N-delta,N,sp)")
 200005 function calls in 13.866 CPU seconds

 Ordered by: standard name

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      1    0.000    0.000   13.866   13.866 <string>:1(<module>)
 195170   12.632    0.000   12.632    0.000 prime1.py:101(<genexpr>)
      1    1.224    1.224   13.865   13.865 prime1.py:90(range_f2)
   4832    0.009    0.000    0.009    0.000 {method 'append' of 'list' objects}
      1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Here is the profiler result for range_sieve():

>>> prof("range_sieve(N-delta,N,sp)")
259985 function calls in 1.370 CPU seconds

Ordered by: standard name
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.003    0.003    1.370    1.370 <string>:1(<module>)
     1    0.877    0.877    1.367    1.367 prime1.py:39(range_sieve)
259982    0.490    0.000    0.490    0.000 prime1.py:51(sillyfun)
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Finally, here is the complete code that generates the list of small primes (in a very silly way) so that you can check what results you get: http://pastebin.com/7sfN4mG4
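In case that link goes away, here is a minimal sketch of how such a list could be built with a plain sieve up to the square root of N (this is just an illustration, not the pastebin code):

def primes_upto(limit):
    """Return a sorted list of all primes <= limit (simple sieve of Eratosthenes)."""
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for p in xrange(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            for multiple in xrange(p * p, limit + 1, p):
                sieve[multiple] = False
    return [i for i, is_p in enumerate(sieve) if is_p]

N = 10 ** 9
smallprimes = primes_upto(int(N ** 0.5) + 1)  # all primes up to (at least) sqrt(N)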

UPDATE: By popular demand, here is the profiling data for the first chunk of code. There is no data on how many times the inner loop is executed, but it seems pretty clear that it is the same as for the third chunk (range_f2).

>>> prof("range_f1(N-delta,N,sp)")
      4835 function calls in 14.184 CPU seconds
Ordered by: standard name
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000   14.184   14.184 <string>:1(<module>)
     1   14.162   14.162   14.183   14.183 prime1.py:69(range_f1)
  4832    0.021    0.000    0.021    0.000 {method 'append' of 'list' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Solution

The difference is an algorithmic one. In the first version, trial division, you test each candidate against all small primes. The fact that you don't stop once the small prime exceeds candidate ** 0.5 doesn't matter for range(10**9 - 10**5, 10**9) as long as smallprimes has a good upper bound, but it would matter if the length of the range were much larger relative to that bound. For composites, this doesn't incur much cost, since most of them have at least one small prime divisor and the loop breaks early. But for primes, you have to go all the way to N**0.5. There are roughly 10**5/log(10**9) primes in that interval, and each of them is trial-divided by about 10**4.5/log(10**4.5) primes, so that makes about 1.47*10**7 trial divisions against primes.

On the other hand, with the sieve you only cross off composites, and each composite is crossed off as many times as it has prime divisors, while primes aren't crossed off at all (so primes are free). The number of prime divisors of n is bounded by (a multiple of) log(n) (that's a crude upper bound, usually a large overestimate), so that gives an upper bound of 10**5*log(10**9) (times a small constant) crossings-off, about 2*10**6. On top of that, a crossing-off may be less work than a division (I don't know about Python; it is for C arrays). So you are doing less work with the sieve.
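Both estimates are easy to reproduce numerically; a rough back-of-envelope check (using the natural logarithm, so only order-of-magnitude accuracy is expected) might look like:

from math import log

N = 10 ** 9
delta = 10 ** 5

primes_in_range = delta / log(N)                   # ~ 4800 primes in [N - delta, N]
divisions_per_prime = 10 ** 4.5 / log(10 ** 4.5)   # ~ 3000 small primes up to sqrt(N)
print "trial divisions against primes: ~%.2e" % (primes_in_range * divisions_per_prime)  # ~ 1.5e7

sieve_bound = delta * log(N)                       # crude upper bound on crossings-off
print "sieve crossings-off (upper bound): ~%.2e" % sieve_bound                           # ~ 2.1e6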

Edit: I have collected the actual numbers for the range 10**9 - 10**5 to 10**9.

Ticks: 259987        (sieve crossings-off; 4832 primes found)
Divisions: 20353799  (trial divisions; 4832 primes found)

The sieve does only 259987 crossings-off; you can see that the crude upper bound above overestimates by a large factor. The trial division needs more than 20 million divisions; 16433632 of those are for primes (x/log x underestimates the number of primes, for x = 10**4.5 by about 10%), and 3435183 are used for the 3297 numbers in that range whose smallest prime factor is larger than n**(1/3).
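Counts like these can be collected by threading simple counters through the two inner loops. A sketch of one way to do it (not necessarily the exact instrumentation used for the numbers above, and again assuming the smallprimes list from the question):

def count_divisions(lo, hi, smallprimes):
    """Trial division as in range_f1, but counting every n % p test."""
    ndiv = 0
    primes = []
    for i in xrange(hi - lo + 1):
        n = lo + i
        isprime = True
        for p in smallprimes:
            ndiv += 1
            if n % p == 0:
                isprime = False
                break
        if isprime:
            primes.append(n)
    return primes, ndiv

def count_crossings(lo, hi, smallprimes):
    """Sieve as in range_sieve, but counting every crossing-off in the inner loop."""
    delta = hi - lo + 1
    iamprime = [True] * delta
    ticks = 0
    for p in smallprimes:
        rem = lo % p
        if rem == 0:
            iamprime[0] = False
        for i in xrange(p - rem, delta, p):
            iamprime[i] = False
            ticks += 1
        if lo <= p <= hi:
            iamprime[p - lo] = True
    return [i + lo for i, ami in enumerate(iamprime) if ami], ticks

p1, ndiv = count_divisions(10 ** 9 - 10 ** 5, 10 ** 9, smallprimes)
p2, ticks = count_crossings(10 ** 9 - 10 ** 5, 10 ** 9, smallprimes)
print "Divisions:", ndiv, "primes found:", len(p1)
print "Ticks:", ticks, "primes found:", len(p2)

Run on this range, both should find the same 4832 primes and produce counts in the same ballpark as the numbers quoted above.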
