The Sieve of Atkin


Problem Description



I have been trying to learn algorithms for generating prime numbers and I came across the Sieve of Atkin on Wikipedia. I understand almost all parts of the algorithm, except a few. Here are the questions:

  1. How are the three quadratic equations below formed? 4x^2+y^2, 3x^2+y^2 and 3x^2-y^2
  2. The algorithm on Wikipedia talks about modulo sixty but I don't understand how/where that is used in the pseudocode below.
  3. How are these remainders 1, 5, 7 and 11 found?

Below is the pseudocode from Wikipedia for reference:

// arbitrary search limit                                                   
limit ← 1000000                                                             

// initialize the sieve                                                     
for i in [5, limit]: is_prime(i) ← false                                    

// put in candidate primes:                                                 
// integers which have an odd number of                                     
// representations by certain quadratic forms                               
for (x, y) in [1, √limit] × [1, √limit]:                                    
    n ← 4x²+y²                                                              
    if (n ≤ limit) and (n mod 12 = 1 or n mod 12 = 5):                      
        is_prime(n) ← ¬is_prime(n)                                          
    n ← 3x²+y²                                                              
    if (n ≤ limit) and (n mod 12 = 7):                                      
        is_prime(n) ← ¬is_prime(n)                                          
    n ← 3x²-y²                                                              
    if (x > y) and (n ≤ limit) and (n mod 12 = 11):                         
        is_prime(n) ← ¬is_prime(n)                                          

// eliminate composites by sieving                                          
for n in [5, √limit]:                                                       
    if is_prime(n):                                                         
        // n is prime, omit multiples of its square; this is                
        // sufficient because composites which managed to get               
        // on the list cannot be square-free                                
        is_prime(k) ← false, k ∈ {n², 2n², 3n², ..., limit}                 

print 2, 3                                                                  
for n in [5, limit]:                                                        
    if is_prime(n): print n  
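For reference, the quoted pseudocode transcribes almost line for line into runnable Python; this is a direct, unoptimized sketch (the function name `atkin_mod12` is mine), and its output can be checked against any ordinary Sieve of Eratosthenes:

```python
from math import isqrt

def atkin_mod12(limit):
    """Direct transcription of the quoted modulo-12 pseudocode."""
    is_prime = [False] * (limit + 1)
    root = isqrt(limit)
    # Toggle candidates that have an odd number of quadratic representations.
    for x in range(1, root + 1):
        for y in range(1, root + 1):
            n = 4 * x * x + y * y
            if n <= limit and n % 12 in (1, 5):
                is_prime[n] = not is_prime[n]
            n = 3 * x * x + y * y
            if n <= limit and n % 12 == 7:
                is_prime[n] = not is_prime[n]
            n = 3 * x * x - y * y
            if x > y and n <= limit and n % 12 == 11:
                is_prime[n] = not is_prime[n]
    # Eliminate composites: cull multiples of the squares of found primes.
    for n in range(5, root + 1):
        if is_prime[n]:
            for k in range(n * n, limit + 1, n * n):
                is_prime[k] = False
    return [2, 3] + [n for n in range(5, limit + 1) if is_prime[n]]
```

Run with the same limit as the pseudocode, it returns the same primes as a conventional sieve.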

Solution

The Sieve of Atkin pseudocode from the Wikipedia article you've quoted contains the answers to your questions, as does the discussion about the article for which Will Ness has provided a link, although you may not be able to put the information together. Short answers are as follows:

  1. The three equations come from Atkin's mathematical proof that all primes will occur as solutions of one or more of those three equations, under the appropriate modulo conditions, for all valid values of the variables 'x' and 'y'. In fact, he further proved that the true primes will be those numbers that have an odd number of solutions to those three equations (left as True when toggled an odd number of times from initial conditions of False) under the modulo conditions, after excluding those numbers which are divisible by the squares of found primes less than or equal to the square root of the tested number.

  2. The true Sieve of Atkin is based on a set of modulo 60 conditions; this pseudocode represents a simplification where there are fewer ranges of conditions for each equation, using a set of modulo 12 conditions (5 × 12 = 60). However, this results in about 20% extra work being done because of the redundancy introduced by this new set of conditions. It is also the reason that this simplified pseudocode needs to start its prime scan at 5 rather than at the normal 7, and to do the base-prime square-free eliminations starting at a base prime of 5, at an added cost in execution time, as the factors of 5 are not otherwise handled properly. The reason for this simplification is perhaps to sacrifice some extra code overhead in order to ease the code complexity of checking values, although that checking can be done with a single table lookup, whereas the extra over-30% in work due to using modulo 12 cannot be reduced.

  3. The remainders are found by the 'mod' operators in the pseudocode you quoted, which stand for the modulo operation in whatever computer language one might use, often represented by the "%" symbol in languages such as C, Java, C#, and F#. This operator yields the integer remainder after integer division of the dividend (the first number, before the 'mod') by the divisor (the second number, after the 'mod'). The reason that there are just four remainders rather than the 16 used in the full modulo 60 algorithm is the simplification of reducing to a modulo 12 algorithm. You'll notice that under the modulo 12 conditions the "4x" condition passes 25, which would normally be eliminated by the modulo 60 conditions; thus many composites containing 25 as a factor need to be eliminated by the additional prime-5 square-free check. Similarly, 55 passes the "3x+" check and 35 passes the "3x-" check, where they wouldn't under the full modulo 60 algorithm.
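The odd-number-of-representations idea in short answer 1 can be made concrete by counting solutions directly. In this sketch (the helper name is mine), `reps_4xx_yy(n)` counts pairs (x, y) with 4x² + y² = n: primes such as 13 and 17 get an odd count, the composite 65 = 5 × 13 gets an even count, and 25 = 5² gets an odd count despite being composite, which is exactly why the separate square-free culling pass is required:

```python
from math import isqrt

def reps_4xx_yy(n):
    """Count pairs (x, y) with x, y >= 1 and 4*x*x + y*y == n."""
    count = 0
    x = 1
    while 4 * x * x < n:
        rest = n - 4 * x * x          # candidate value for y²
        y = isqrt(rest)
        if y * y == rest:             # rest is a perfect square, y >= 1
            count += 1
        x += 1
    return count

# Odd counts mark the primes 13 and 17, but also the square 25;
# the even count for 65 = 5 × 13 correctly leaves it unmarked.
```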

As discussed in the Wikipedia article's "Talk" section and hinted above, this quoted pseudocode is never much faster than even basic odds-only Sieve of Eratosthenes (SoE) implementations, let alone one that uses the same degree of wheel factorization, due to its inefficiencies: the variables 'x' and 'y' do not need to range over the full range up to the square root of the sieved range for many cases (as mentioned in the article); the proper use of the modulo 60 wheel recovers the redundancies of the modulo 12 simplification (as I mention above); and there are modulo lattice patterns in the solutions to the quadratic equations such that the conditions using the (computationally slow) modulo operations need not be tested, by using an algorithm that automatically skips those solutions that would not satisfy the modulo conditions according to the lattice patterns (mentioned only very obscurely in the full Atkin and Bernstein paper).
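The modulo 60 versus modulo 12 point is easy to verify numerically: every prime above 5 falls into one of exactly 16 residue classes modulo 60 (those coprime to 60), while the looser modulo 12 tests of the quoted pseudocode also pass wheel-excluded multiples of 5 such as 25, 35, and 55. A small illustrative check (not part of the original answer):

```python
def primes_upto(limit):
    """Plain Sieve of Eratosthenes used as a reference."""
    s = [True] * (limit + 1)
    s[0] = s[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if s[i]:
            for j in range(i * i, limit + 1, i):
                s[j] = False
    return [i for i in range(limit + 1) if s[i]]

# The 16 modulo 60 residue classes that can contain primes above 5.
wheel60 = sorted({p % 60 for p in primes_upto(100000) if p > 5})

# The modulo 12 tests collapse these 16 classes to just {1, 5, 7, 11},
# which also lets through multiples of 5 such as 25, 35, and 55:
leaks = [n for n in (25, 35, 55) if n % 12 in (1, 5, 7, 11)]
```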

To answer a question you didn't ask but should have: "Why use the Sieve of Atkin?".

The main reason that the Sieve of Atkin (SoA) is implemented rather than the Sieve of Eratosthenes (SoE) is that it is "Internet knowledge" that the SoA is faster. There are two reasons for this belief, as follows:

  1. The SoA is assumed to be faster because its asymptotic computational complexity is less than that of the SoE by a factor of log(log n), where n is the range of primes sieved. Practically speaking, going from a range of two to the power 32 (four billion plus) to two to the power 64 (about 2 followed by 19 zeros), this factor is six over five, which equals 1.2. That means that the true SoE should take 1.2 times as long as would be expected from a simple linear ratio when sieving the 64-bit number range as compared to the 32-bit number range, whereas the SoA would have a linear relationship if all were ideal. You can appreciate that, first, this is a very small factor for such a huge difference in number ranges and, second, that this benefit only holds if the two algorithms have the same or close to the same performance at some reasonable range of primes.

    What isn't clearly understood from that "Internet knowledge" is that these figures only apply when one is comparing the ratio of performance over a given range to that over another given range for the same algorithm, not as a comparison between different algorithms. Thus it is useless as a proof that the SoA will be faster than the SoE, since the SoA could start with a larger overhead for a given sieve range than a particular SoE algorithm, as in the following optimized SoE example.

  2. The SoA is believed to be faster because of the computational comparison made and published by Atkin and Bernstein as per their paper linked in the Wikipedia article. While the work is accurate, it only applies to the artificial limitations that they made on the SoE algorithm they compared: since the SoA algorithm is based on modulo 60 factorization, which represents two 2,3,5 factorization wheel rotations, they also limited the SoE algorithm to that same wheel factorization. Doing this, the SoE does about 424,000 composite number cull operations over the one billion range tested, whereas a fully optimized SoA as tested has about 326,000 combined toggle and square-free cull operations, each taking about the same time as a composite number cull operation in the SoE due to being written in the same style. This means that the SoA is about 30% faster than the SoE for this particular set of wheel factorization conditions, and that is just about exactly what the Atkin and Bernstein comparison test showed.

    However, the SoA does not respond to further levels of wheel factorization, as the 2,3,5 level is "baked in" to the algorithm, whereas the SoE does respond to further levels: using a 2,3,5,7 wheel factorization results in about the same number of operations, meaning about the same performance for each. One can use even a partial further level of wheel factorization beyond the 2,3,5,7 level to get the number of operations for the SoE about 16.7% less than for the SoA, for proportionally better performance. The optimizations required to implement these extra levels of wheel factorization are actually simpler than the code complexity needed to implement the original optimized SoA. The memory footprint for implementing these comparable page-segmented implementations is about the same, being the size of the page buffers plus the base primes array.

    In addition, both would benefit from being written in a "machine state look up" style, which would help in better optimization using inlined code and extreme loop unrolling, but the SoE benefits more from this style than the SoA does, as it is a simpler algorithm.
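The effect of wheel factorization on the SoE's operation count can be sketched directly. The snippet below is my own illustration (not the Atkin and Bernstein benchmark): it counts culling operations for an odds-only SoE versus one that also skips multiples of 3 and 5 by working only on the 8 residues of the 2,3,5 wheel, over the same range:

```python
from math import isqrt

def soe_odds(limit):
    """Odds-only SoE; returns (primes, cull_count)."""
    sieve = [True] * (limit + 1)
    ops = 0
    for p in range(3, isqrt(limit) + 1, 2):
        if sieve[p]:
            for m in range(p * p, limit + 1, 2 * p):
                sieve[m] = False
                ops += 1
    return [2] + [n for n in range(3, limit + 1, 2) if sieve[n]], ops

def soe_wheel235(limit):
    """2,3,5-wheel SoE; culls only p*q where p and cofactor q are on the wheel."""
    wheel = {1, 7, 11, 13, 17, 19, 23, 29}
    sieve = [True] * (limit + 1)
    ops = 0
    for p in range(7, isqrt(limit) + 1):
        if p % 30 not in wheel or not sieve[p]:
            continue
        for q in range(p, limit // p + 1):
            if q % 30 in wheel:       # only wheel positions ever need culling
                sieve[p * q] = False
                ops += 1
    primes = [2, 3, 5] + [n for n in range(7, limit + 1)
                          if n % 30 in wheel and sieve[n]]
    return primes, ops

primes_odds, culls_odds = soe_odds(100000)
primes_wheel, culls_wheel = soe_wheel235(100000)
# Both produce identical primes; the wheel version performs fewer culls.
```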

So for sieve ranges up to about the 32-bit number range, the maximally optimized SoE is about 20% faster (or more with further wheel factorization) than the SoA; however, the SoA has the asymptotic computational complexity advantage, so there will be some point at which it catches up. That point will be at about the range where the ratio of log(log n) to log(log(2^32)), or 5, is 1.2 - a range of about 2 times ten to the nineteenth power, an exceedingly large number. If an optimized SoA run on a modern desktop computer were to take about a third of a second to compute the primes in the 32-bit number range, and if the implementation were ideal, taking a linear time increase with increasing range, it would take about 45 years to compute the primes to that larger range. However, analysis of prime numbers in these high ranges is often done in small chunks, for which the use of the SoA would be theoretically practical as compared to the SoE for very large number sieves, but only by a very small factor.
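The crossover arithmetic in the preceding paragraph can be checked in a few lines, using base-2 logarithms as the quoted "six over five" factor implies:

```python
from math import log2

# Growth of the SoE's extra log(log n) cost factor from the 32-bit range
# to the 64-bit range, with base-2 logarithms: log2(log2(2^64)) = 6 and
# log2(log2(2^32)) = 5, so the ratio is six over five.
factor = log2(log2(2.0 ** 64)) / log2(log2(2.0 ** 32))

# The range where that ratio reaches 1.2: 2^64, about 1.8 × 10^19,
# the "2 times ten to the nineteenth power" of the text.
crossover = 2.0 ** 64
```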

EDIT_ADD: In actual fact, neither the page segmented SoE nor the SoA continue to run in linear time for these huge ranges as compared to the lower ranges, as both run into problems with the "toggling" and "culling" operations losing efficiency due to having to skip large numbers of pages per jump; this means that both will require modified algorithms to handle this "page jumping", and the very slight advantage of the SoA may be completely erased if there are any slight differences in the way that this is done. The SoA has many more terms in its "jump tables" than the SoE, by about the inverse of the ratio of the number of primes found up to the square root of the range to that range, and this will likely add an O(log n) term to the processing of both, but with a larger constant-factor increase for the SoA, which has the higher number of entries in its "jump table". This extra fact will pretty much completely cancel out any advantage of the SoA over the SoE, even for extremely large ranges. Further, the SoA has a constant overhead of more individual loops required for the many more cases in implementing the conditions for the three separate quadratic equations, plus the "prime squares free" loop instead of just a primes culling loop, so it can never have as low an average computational time per operation as the SoE when both are fully optimized. END_EDIT_ADD

EDIT_ADD2: It is my opinion that much of the confusion about the Sieve of Atkin is due to the deficiencies of the pseudocode from the Wikipedia article as quoted in the question, so I have come up with a somewhat better version of the pseudocode that addresses at least some of the deficiencies, to do with the range of the 'x' and 'y' variables and the modulo 12 versus modulo 60 confusion, as follows:

limit ← 1000000000        // arbitrary search limit

// Initialize the sieve
for i in {7,11,13,17,19,23,29,31, 37,41,43,47,49,53,59,61,...}:
    is_prime(i) ← false

// Put in candidate primes:
// integers which have an odd number of
// representations by certain quadratic forms.
while n ≤ limit, n ← 4x²+y² where x ∈ {1,2,...} and y ∈ {1,3,...} // odd y's
    if n mod 60 ∈ {1,13,17,29,37,41,49,53}:
        is_prime(n) ← ¬is_prime(n)
while n ≤ limit, n ← 3x²+y² where x ∈ {1,3,...} and y ∈ {2,4,...} // only odd 
    if n mod 60 ∈ {7,19,31,43}:                            // x's and even y's
        is_prime(n) ← ¬is_prime(n)
while n ≤ limit, n ← 3x²-y² where x ∈ {2,3,...} and y ∈ {x-1,x-3,...,1} //all 
    if n mod 60 ∈ {11,23,47,59}:                   // even/odd odd/even combos
        is_prime(n) ← ¬is_prime(n)

// Eliminate composites by sieving, only for those occurrences on the wheel
for n² ≤ limit where n ∈ {7,11,13,17,19,23,29,31, 37,41,43,47,49,53,59,61,...}:
    if is_prime(n):
        // n is prime, omit multiples of its square; this is
        // sufficient because composites which managed to get
        // on the list cannot be square-free
        while c ≤ limit, c ← n² × k where
                      k ∈ {1,7,11,13,17,19,23,29, 31,37,41,43,47,49,53,59,...}:
            is_prime(c) ← false

output 2, 3, 5
for n ≤ limit when n ← 60 × k + x where
  k ∈ {0..} and
  x ∈ {7,11,13,17,19,23,29,31, 37,41,43,47,49,53,59,61}:
    if is_prime(n): output n
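The improved pseudocode above transcribes into runnable (though still unoptimized) Python as follows; this is a sketch under the stated modulo 60 conditions, with conservative loop bounds rather than tuned ones (the function name is mine):

```python
from math import isqrt

def atkin_mod60(limit):
    """Transcription of the modulo-60 pseudocode above (unoptimized sketch)."""
    s1 = {1, 13, 17, 29, 37, 41, 49, 53}   # residues for 4x² + y²
    s2 = {7, 19, 31, 43}                   # residues for 3x² + y²
    s3 = {11, 23, 47, 59}                  # residues for 3x² - y²
    sieve = [False] * (limit + 1)
    root = isqrt(limit)
    # 4x² + y² with any x and odd y
    for x in range(1, root + 1):
        if 4 * x * x + 1 > limit:
            break
        for y in range(1, root + 1, 2):
            n = 4 * x * x + y * y
            if n > limit:
                break
            if n % 60 in s1:
                sieve[n] = not sieve[n]
    # 3x² + y² with odd x and even y
    for x in range(1, root + 1, 2):
        if 3 * x * x + 4 > limit:
            break
        for y in range(2, root + 1, 2):
            n = 3 * x * x + y * y
            if n > limit:
                break
            if n % 60 in s2:
                sieve[n] = not sieve[n]
    # 3x² - y² with x > y of opposite parity; n grows as y shrinks
    for x in range(2, root + 1):
        for y in range(x - 1, 0, -2):
            n = 3 * x * x - y * y
            if n > limit:
                break
            if n % 60 in s3:
                sieve[n] = not sieve[n]
    # Square-free culling; culling every multiple of n² is equivalent to the
    # wheel-only culling of the pseudocode, since positions off the wheel are
    # never toggled on in the first place.
    for n in range(7, root + 1):
        if sieve[n]:
            for c in range(n * n, limit + 1, n * n):
                sieve[c] = False
    return [2, 3, 5] + [n for n in range(7, limit + 1) if sieve[n]]
```

The parity restrictions on 'x' and 'y' cost nothing here because they are automatic: for each residue set, solutions with the other parities cannot exist.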

The above seems quite simple and is quite a good algorithm except that it still isn't faster than a basic Sieve of Eratosthenes that uses the same 2,3,5 factorization wheel, because it wastes almost half of its inner loop toggle operations on failing the modulo tests. To demonstrate:

Following is the repeating 4x²+y² pattern of qualifying modulo 60's across a range of 15 values of 'x' (every value) vertically and 15 odd values of 'y' horizontally; both starting at one. Notice that they are symmetrical about a vertical axis but only 128 out of 225 or 256 out of 450 for a full 30 number range of 'x' values are valid toggle operations:

  0 13 29 53  0  0 53 49 53  0  0 53 29 13  0
 17  0 41  0 37 17  0  1  0 17 37  0 41  0 17
 37  0  1  0  0 37  0  0  0 37  0  0  1  0 37
  0 13 29 53  0  0 53 49 53  0  0 53 29 13  0
 41 49  0 29  1 41 29  0 29 41  1 29  0 49 41
  0  0 49 13  0  0 13  0 13  0  0 13 49  0  0
 17  0 41  0 37 17  0  1  0 17 37  0 41  0 17
 17  0 41  0 37 17  0  1  0 17 37  0 41  0 17
  0  0 49 13  0  0 13  0 13  0  0 13 49  0  0
 41 49  0 29  1 41 29  0 29 41  1 29  0 49 41
  0 13 29 53  0  0 53 49 53  0  0 53 29 13  0
 37  0  1  0  0 37  0  0  0 37  0  0  1  0 37
 17  0 41  0 37 17  0  1  0 17 37  0 41  0 17
  0 13 29 53  0  0 53 49 53  0  0 53 29 13  0
  1  0  0 49  0  1 49  0 49  1  0 49  0  0  1

Following is the repeating 3x²+y² pattern of qualifying modulo 60's across a range of 5 odd values of 'x' vertically and 15 even values of 'y' horizontally. Notice that they are symmetrical about a horizontal axis but only 48 out of 75 or 144 out of 450 for a full 30 number range of 'x' values are valid toggle operations as all 144 out of 450 invalid operations with even 'x' and odd 'y' have already been eliminated:

  7 19  0  7 43  0 19 19  0 43  7  0 19  7  0
 31 43  0 31  7  0 43 43  0  7 31  0 43 31  0
 19 31  0 19  0  0 31 31  0  0 19  0 31 19  0
 31 43  0 31  7  0 43 43  0  7 31  0 43 31  0
  7 19  0  7 43  0 19 19  0 43  7  0 19  7  0

Following is the repeating 3x²-y² pattern of qualifying modulo 60's across a range of 5 odd values of 'x' vertically and 15 even values of 'y' horizontally. Notice that they are symmetrical about a horizontal axis but only 48 out of 75 or 224 out of 450 for a full 30 number range of 'x' values are valid toggle operations:

 59 47  0 59 23  0 47 47  0 23 59  0 47 59  0
 23 11  0 23 47  0 11 11  0 47 23  0 11 23  0
 11 59  0 11  0  0 59 59  0  0 11  0 59 11  0
 23 11  0 23 47  0 11 11  0 47 23  0 11 23  0
 59 47  0 59 23  0 47 47  0 23 59  0 47 59  0

Following is the repeating 3x²-y² pattern of qualifying modulo 60's across a range of 5 even values of 'x' vertically and 15 odd values of 'y' horizontally. Notice that they are symmetrical about a vertical axis but only 48 out of 75 or 224 out of 450 for a full 30 number range of 'x' values are valid toggle operations:

 11  0 47 23  0 11 23  0 23 11  0 23 47  0 11
 47  0 23 59  0 47 59  0 59 47  0 59 23  0 47
 47  0 23 59  0 47 59  0 59 47  0 59 23  0 47
 11  0 47 23  0 11 23  0 23 11  0 23 47  0 11
 59  0  0 11  0 59 11  0 11 59  0 11  0  0 59

As one can determine by inspection of the above tables, for the pseudocode as above there is an overall repeating range of 30 values of 'x' (including both evens and odds) which has only 688 valid operations out of a total of 1125 combined operations; thus it wastes almost half of its processing in conditionally skipping over values, due to the non-productive skipping operations being part of the innermost loops. There are two possible methods to eliminate that "hit" inefficiency, as follows:

  1. The method as outlined in the Atkin and Bernstein paper, which uses the recognized patterns in the combined modulos of 'x' and 'y' to process only the modulo "hits", using the fact that once one locates a given modulo on the pattern, there is an infinite sequence of values at that same modulo (equal to the wheel bit position) where each pattern is separated from the next by an easily calculated (variable) offset of the form "current position plus A times the current value of 'x' plus B" or "current position plus C times the current value of 'y' plus D", where A, B, C, and D are simple constants (simple meaning small and easily manipulated). Bernstein used that method with the additional help of many Look Up Tables (LUTs).

  2. The method of creating overall pattern-state Look Up Tables (LUTs) (one for each of the four tables above plus one for the minor prime square-free loop), indexed by the modulo of the current value of 'x' combined with the modulo of 'y', to find the A, B, C, and D parameters to skip, not to the next pattern, but just to the next position in the horizontal sequence. This is likely the higher-performance option, as it more easily allows extreme clock-cycle-per-operation reduction using in-lining of unrolled loop code, and will produce more efficient code for large ranges when page segmenting, as the jumps per loop are smaller on average. This will likely bring the clock cycles per loop close to those of a highly optimized Sieve of Eratosthenes as in primesieve, but is unlikely to get quite that low, due to having to compute the variable step sizes rather than being able to use fixed prime offsets as the SoE can.
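The figures quoted above (688 valid toggles out of 1125 inner-loop iterations per 30-wide band of 'x') can be reproduced by enumerating the repeating pattern tables directly; this is a verification sketch using the residue sets of the revised pseudocode, with 'x' periods of 15 for the first quadratic and 10 for the other two:

```python
def pattern_hits():
    """Count valid toggles in each repeating modulo-60 pattern table."""
    s1 = {1, 13, 17, 29, 37, 41, 49, 53}
    s2 = {7, 19, 31, 43}
    s3 = {11, 23, 47, 59}
    # 4x²+y²: 'x' repeats mod 15, odd 'y' repeats mod 30 -> 15 × 15 table
    t1 = sum((4 * x * x + y * y) % 60 in s1
             for x in range(1, 16) for y in range(1, 30, 2))
    # 3x²+y²: odd 'x' repeats mod 10, even 'y' repeats mod 30 -> 5 × 15 table
    t2 = sum((3 * x * x + y * y) % 60 in s2
             for x in range(1, 10, 2) for y in range(2, 31, 2))
    # 3x²-y²: odd 'x'/even 'y' and even 'x'/odd 'y' -> two 5 × 15 tables
    t3 = sum((3 * x * x - y * y) % 60 in s3
             for x in range(1, 10, 2) for y in range(2, 31, 2))
    t4 = sum((3 * x * x - y * y) % 60 in s3
             for x in range(2, 11, 2) for y in range(1, 30, 2))
    return t1, t2, t3, t4

# Per 30-wide band of 'x': two repeats of the first table, three of each other.
t1, t2, t3, t4 = pattern_hits()
valid = 2 * t1 + 3 * (t2 + t3 + t4)    # 688 valid toggles
total = 2 * 15 * 15 + 3 * 3 * 5 * 15   # 1125 inner-loop iterations
```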

So there are three governing objectives in reducing the running time for a prime sieve, as follows:

  1. A successful sieve reduces the number of operations, an objective on which even the "hit optimized" SoA fails as compared to a highly wheel-factorized SoE, by about 16.7% for ranges of about a few billion.

  2. A successful sieve reduces the CPU clock cycles per operation, on which the SoA fails as compared to a highly optimized SoE such as primesieve because its operations are more complex due to the variable increments, again likely by 20% to 60%. Atkin and Bernstein's primegen (SoA) takes about 4.5 clock cycles per operation as compared to about 2.5 for primesieve (SoE) for a range of one billion each, but perhaps the SoA could be sped up somewhat by borrowing some of the optimization techniques from primesieve.

  3. A successful sieve has reasonably low code complexity, so that it can be more easily extended to larger ranges using such techniques as "bucket sieving" and other page segmentation optimizations. On this count the Sieve of Atkin fails miserably, as it gets exponentially more complex when page segmenting large number ranges. It is extremely complex to write a SoA program that would compete with even Bernstein's primegen (SoA), let alone with primesieve, whereas it is quite easy to write SoE code that comes close to the same performance as primesieve.

  4. A successful sieve has a lower empirical complexity, which the SoA does have over the SoE, by a factor of log(log n), where n is the upper range to be sieved; but that extra small ratio is not likely ever enough to make up for the combined losses on the top two objectives above, as this factor gets smaller with increasing range. END_EDIT_ADD2

So the answer to the question "Why use the Sieve of Atkin?" is "There is no reason to use it at all if the SoE is implemented with maximum wheel factorization optimizations until the numbers sieved are extremely large (64-bit number range and higher) and then the SoA advantage is very small and perhaps not realizable at all depending on very minor tweaks in relative optimizations.".

As an answer to another similar Sieve of Atkin question, I have posted a page segmented C# version of the SoE optimized as per the above at: http://stackoverflow.com/a/20559687/549617.
