阿特金筛 [英] The Sieve of Atkin

查看:18
本文介绍了阿特金筛的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在尝试学习生成素数的算法,我在维基百科上遇到了 Sieve of Atkin.我了解算法的几乎所有部分,除了少数部分.以下是问题:

I have been trying to learn algorithms for generating prime numbers and I came across Sieve of Atkin on Wikipedia. I understand almost all parts of the algorithm, except a few. Here are the questions:

  1. 下面的三个二次方程是如何形成的?4x^2+y^2、3x^2+y^2 和 3x^2-y2
  2. 维基百科中的算法讨论了模 60,但我不明白在下面的伪代码中如何/在哪里使用它.
  3. 如何找到这些提醒 1、5、7 和 11?

以下是维基百科的伪代码供参考:

Below is the pseudocode from Wikipedia for reference:

// arbitrary search limit                                                   
limit ← 1000000                                                             

// initialize the sieve                                                     
for i in [5, limit]: is_prime(i) ← false                                    

// put in candidate primes:                                                 
// integers which have an odd number of                                     
// representations by certain quadratic forms                               
for (x, y) in [1, √limit] × [1, √limit]:                                    
    n ← 4x²+y²                                                              
    if (n ≤ limit) and (n mod 12 = 1 or n mod 12 = 5):                      
        is_prime(n) ← ¬is_prime(n)                                          
    n ← 3x²+y²                                                              
    if (n ≤ limit) and (n mod 12 = 7):                                      
        is_prime(n) ← ¬is_prime(n)                                          
    n ← 3x²-y²                                                              
    if (x > y) and (n ≤ limit) and (n mod 12 = 11):                         
        is_prime(n) ← ¬is_prime(n)                                          

// eliminate composites by sieving                                          
for n in [5, √limit]:                                                       
    if is_prime(n):                                                         
        // n is prime, omit multiples of its square; this is                
        // sufficient because composites which managed to get               
        // on the list cannot be square-free                                
        is_prime(k) ← false, k ∈ {n², 2n², 3n², ..., limit}                 

print 2, 3                                                                  
for n in [5, limit]:                                                        
    if is_prime(n): print n  

推荐答案

Sieve of Atkin 伪代码来自您引用的维基百科文章 包含您的问题的答案或有关 Will Ness 提供链接的文章的讨论,尽管您可能无法将这些信息放在一起.简答如下:

The Sieve of Atkin pseudo code from the Wikipedia article you've quoted contains the answers to your questions or the discussion about the article for which Will Ness has provided a link does, although you may not be able to put the information together. Short answers are as follows:

  1. 这三个方程来自阿特金的数学证明,所有素数将作为这三个中的一个或多个的解出现对所有有效的具有适当模条件的方程变量 'x' 和 'y' 的值,事实上他进一步证明了真正的质数将是那些具有奇数的数这三个方程的解(在切换奇数时保留为 True从初始条件为 False 的次数)与模每个排除那些可被整除的数字的条件找到的素数的平方小于或等于素数的平方根测试号码.

  1. The three equations come from Atkin's mathematical proof that all primes will occur as the solutions of one or more of those three equations with the appropriate modulo conditions for all valid values of the variables 'x' and 'y', and in fact he further proved that true primes will be those numbers that have an odd number of solutions to those three equations (left as True when toggled an odd number of times from initial conditions of False) with the modulo conditions for each excluding those numbers which are divisible by squares of found primes less than or equal to the square root of the tested number.

真正的阿特金筛分法基于一组模 60 条件;这个伪代码代表了一种简化,其中较少每个方程的条件范围使用一组模 12条件 (5 × 12 = 60) - 然而,这导致有 20%由于在这个新项目中引入了冗余,额外的工作正在完成条件集.这也是简化的原因伪代码需要在 5 点开始其主要扫描,而不是正常的 7 和做基本素数平方免费消除从 5 的基数开始,执行时间增加因为 5 的因数没有正确处理.原因这种简化可能是为了牺牲一些额外的代码开销,以减轻必须检查的代码复杂性值,虽然这可以通过单个表查找来完成而由于使用模 12 而额外超过 30% 的工作不能减少.

The true Sieve of Atkin is based on a set of modulo 60 conditions; this pseudo code represents a simplification where there are less ranges of conditions for each equation using a set of modulo 12 conditions (5 × 12 = 60) - however, this results in there being 20% extra work being done because of introduced redundancy in this new set of conditions. It is also the reason that this simplified pseudo code needs to start its prime scan at 5 rather than the normal 7 and to do the base primes squares free eliminations starting at a base prime of 5 at an added cost in execution time as the factors of 5 are not otherwise handled properly. The reason for this simplification is perhaps to sacrifice some extra code overhead in order to ease the code complexity in having to check values, although this can be done with a single table look up whereas the extra over 30% in work due to using modulo 12 can not be reduced.

提醒"(应拼写为 remainders)由您引用的伪代码中的mod"运算符代表 modulo可以使用任何计算机语言的运算符,通常是在计算机语言如 C、Java、C#、F# 等等.此运算符产生整数余数在对被除数进行整数除法后(第一个数字,在'mod'之前)除数(第二个数字,之后'mod' 符号).余数只有四的原因而不是全模 60 算法中使用的 16 是由于简化为模 12 算法.你会注意到在模 12 条件下,4x"条件通过 25这通常会被模 60 条件消除,因此许多包含 25 作为因子的复合材料需要由 5 平方免费支票的额外素数消除.同样,55通过3x+"检查,35 通过3x-"检查不会用于完整的模 60 算法.

The "reminders" (should be spelled remainders) are found by the 'mod' operators in the pseudo code you quoted which stand for the modulo operator in whatever computer language one might use, often represented by the "%" symbol in computer languages such as C, Java, C#, F#, et cetera. This operator yields the integer remainder after an integer division of the dividend (the first of the numbers, before the 'mod') by the divisor (the second of the numbers, after the 'mod' symbol). The reason that the remainders are just four rather than the 16 as used in the full modulo 60 algorithm is due to the simplifications of reducing to a modulo 12 algorithm. You'll notice that with the modulo 12 conditions, the "4x" condition passes 25 which would normally be eliminated by the modulo 60 conditions, thus many composites containing 25 as a factor need to be eliminated by the additional prime of 5 square free check. Similarly, 55 passes the "3x+" check and 35 passes the "3x-" check where they wouldn't for the full modulo 60 algorithm.

正如维基百科文章谈话"部分所讨论和上面暗示的那样,这个引用的伪代码从来都不会比仅基本赔率的埃拉托色尼筛 (SoE) 实现快得多,更不用说使用相同程度的轮因式分解了对于它的低效率:变量x"和y"不需要在整个范围内变化到许多情况下(文章中提到的)筛选范围的平方根,正确使用模 60 轮可以恢复使用模 12 简化的冗余(正如我上面提到的),并且二次方程的解中存在模格模式,因此使用(计算缓慢)模运算的条件不需要通过使用自动跳过那些根据格子模式不满足那些模条件的解决方案的算法(仅在完整的 Atkin 和 Bernstein 论文中非常隐晦地提及).

As discussed in the Wikipedia article "Talk" section and hinted above, this quoted pseudo code is never much faster than even basic odds-only Sieve of Eratosthenes (SoE) implementations let alone one that uses the same degree of wheel factorization due to its inefficiencies: the variables 'x' and 'y' do not need to range over the full range to the square root of the range sieved for many cases (mentioned in the article), the proper use of the modulo 60 wheel recovers the redundancies in the use of the modulo 12 simplification (as I mention above), and there are modulo lattice patterns in the solutions to the quadratic equations such that the conditions using the (computationally slow) modulo operations need not be tested by the use of an algorithm that automatically skips those solutions that would not satisfy those modulo conditions according to the lattice patterns (only mentioned very obscurely in the full Atkin and Bernstein paper).

回答一个你没有问但应该问的问题:为什么要使用阿特金筛?".

To answer a question you didn't ask but should have: "Why use the Sieve of Atkin?".

实施阿特金筛法 (SoA) 而不是埃拉托色尼筛法 (SoE) 的主要原因是 SOA 更快是互联网知识".这种信念有两个原因,如下所示:

The main reason that the Sieve of Atkin (SoA) is implemented rather than the Sieve of Eratosthenes (SoE) is that it is "Internet Knowledge" that the SOA is faster. There are two reasons for this belief, as follows:

  1. 假设 SoA 更快,因为渐近计算它的复杂性比 SoE 低 log(log n) 倍其中 n 是筛选的素数范围.实际上,去从 2 到 32 次方(四十亿加)到 2 到64 次幂(大约 2 后跟 19 个零),这个因数是六五等于 1.2.这意味着真正的 SoE 应该是 1.2 倍只要在筛选到64 位数字范围与 32 位数字范围相比,而SoA 将具有线性关系如果一切都理想.你可以理解,首先,对于这样的一个非常小的因素数量范围的巨大差异,其次,这只会带来好处如果两种算法具有相同或接近相同在一些合理的素数范围内的表现.

  1. The SoA is assumed to be faster because the asymptotic computational complexity is less for it than for the SoE by a factor of log(log n) where n is the range of primes sieved. Practically speaking, going from a range of two to the power 32 (four billion plus) to two to the power 64 (about 2 followed by 19 zeros), this factor is six over five equals 1.2. That means that the true SoE should take 1.2 times as long as expected by just a linear ratio when sieving to the 64-bit number range as compared to the 32-bit number range, whereas the SoA will have a linear relationship if all were ideal. You can appreciate that, first, this is a very small factor for such a huge difference in number ranges and, second, that this benefit only holds if the two algorithms had the same or close to the same performance at some reasonable range of primes.

互联网知识"没有明确理解的是这些数字仅适用于比较与另一个给定范围相比,在给定范围内的表现对于同一个算法,不作为不同算法之间的比较算法.因此,作为证明 SoA 将是无用的比 SoE 更快,因为 SoA 可以从更大的开销开始对于特定 SoE 算法的给定筛选范围,如下面是优化的 SoE 示例.

What isn't clearly understood by that "Internet knowledge" is that these figures only apply when one is comparing the ratio of performance over a given range as compared to another given range for the same algorithm, not as a comparison between different algorithms. Thus it is useless as a proof that the SoA will be faster than the SoE since the SoA could start with a larger overhead for a given sieve range of a particular SoE algorithm as in the following optimized SoE example.

SoA 被认为更快,因为计算阿特金和伯恩斯坦根据他们的维基百科文章中链接的论文.虽然工作是准确的,它仅适用于他们对他们比较的 SoE 算法:由于 SoA 算法基于模 60 因式分解,代表两个 2,3,5 因式分解车轮旋转,他们也将 SoE 算法限制为相同车轮分解.这样做,SoE 做了大约 424,000超过 10 亿范围的合数剔除操作测试,而经过测试的完全优化的 SoA 具有大约 326,000 个组合切换和方形自由剔除操作,每个操作都需要与 SoE 中的每个合数剔除操作同时进行,因为以相同的风格编写.这意味着 SoA 大约是对于这组特定的轮子分解,比 SoE 快 30%条件,这正是阿特金斯和伯恩斯坦对比测试显示.

The SoA is believed to be faster because of the computational comparison made and published by Atkin and Bernstein as per their paper linked in the Wikipedia article. While the work is accurate, it only applies to the artificial limitations that they made on the SoE algorithm they compared: Since the SoA algorithm is based on modulo 60 factorization which represents two 2,3,5 factorization wheel rotations, they also limited the SoE algorithm to that same wheel factorization. Doing this, the SoE does about 424,000 composite number cull operations over the one billion range tested, whereas a fully optimized SoA as tested has about 326,000 combined toggle and square free cull operations, which each take about the same time as each composite number cull operation in the SoE due to being written in the same style. This means that the SoA is about 30% faster than the SoE for this particular set of wheel factorization conditions, and that is just about exactly what the Atkins and Bernstein comparison test showed.

但是,SoA 不会响应更多级别的轮子分解为 2,3,5 级别被烘焙"到算法中,而 SoE 确实对进一步的水平做出反应:使用 2,3,5,7轮子分解导致大约相同数量的操作意思是每个人都有相同的表现.一个人甚至可以使用比 2,3,5,7 级别更高的轮子分解级别为了使 SoE 的操作数比 SoA 少约 16.7%,以获得更好的性能.所需的优化实施这些额外级别的车轮分解实际上是比实现原始优化的代码复杂度更简单索A.实现这些可比页面的内存占用分段实现大致相同,即页缓冲区加上基本素数数组.

However, the SoA does not respond to further levels of wheel factorization as the 2,3,5 level is "baked-in" to the algorithm, whereas the SoE does respond to further levels: using a 2,3,5,7 wheel factorization results in about the same number of operations meaning about the same performance for each. One can use even a partial higher level of wheel factorization than the 2,3,5,7 level to get the number of operations for SoE about 16.7% less than SoA, for a proportional better performance. The optimizations required to implement these extra levels of wheel factorization are actually simpler than the code complexity to implement the original optimized SoA. The memory footprint for implementing these comparable page segmented implementations is about the same, being the size of the page buffers plus the base primes array.

此外,两者都将受益于在机器状态查找"样式,这将有助于更好地优化使用内联代码和极端循环展开,但 SoE 更适合由于算法更简单,因此这种风格比 SoA 更简单.

In addition, both would benefit from being written in a "machine state look up" style which would help in better optimization using inlined code and extreme loop unrolling but the SoE befits more from this style than the SoA due to being a simpler algorithm.

因此,对于高达约 32 位数字范围的筛选范围,最大优化的 SoE 比 SoA 快约 20%(或更多轮分解);但是,SoA 具有这种渐近计算复杂度的优势,因此它会在某个点上赶上来.该点将在 log (log n) 与 log (log (2^32)) 或 5 之比为 1.2 的范围内,或大约为 10 的 19 次方的 2 倍的范围 - 一个非常大的数字.如果在现代台式计算机上运行的优化 SoA 需要大约三分之一秒来计算 32 位数字范围内的素数,并且如果实现是理想的是随着范围的增加线性时间增加,计算这个范围的素数需要大约 45 年.然而,在这些高范围内对素数的分析通常是在小块中完成的,对于非常大的筛子,使用 SoA 在理论上比 SoE 更实用,但系数非常小.

So for sieve ranges up to about the 32-bit number range, the maximally optimized SoE is about 20% faster (or more with further wheel factorization) than the SoA; however, the SoA has this asymptotic computational complexity advantage, so there will be some point at which it catches up. That point will be at about the range where the ratio of log (log n) to log (log (2^32)) or 5 is 1.2 or a range of about 2 times ten to the nineteenth power - an exceedingly large number. If the optimized SoA run on a modern desktop computer were to take about a third of a second to compute the primes in the 32-bit number range, and if the implementation were ideal as in taking linear time increase with increasing range, it would take about 45 years to compute the primes to this range. However, analysis of prime numbers in these high ranges is often done in small chunks, for which the use of the SoA would be theoretically practical as compared to the SoE for very large number sieves, but by a very small factor.

EDIT_ADD:实际上,与较低范围相比,页面分段 SoE 和 SoA 都不会继续在这些大范围内以线性时间运行,因为两者都遇到了切换"问题和剔除"操作由于每次跳转必须跳过大量页面而失去效率;这意味着两者都需要修改算法来处理这种页面跳转",并且如果在执行此操作的方式上存在任何细微差异,则 SoA 的非常微小的优势可能会完全消失.SoA 在跳转表"中的项比 SoE 多得多,大约是找到的质数数量与该范围的平方根之间的反比,这可能会增加 O(log n)术语在处理中,但对于在跳转表"中具有更多条目的 SoA 而言,一个常数因子更大的增加.这个额外的事实将几乎完全抵消 SoA 相对于 SoE 的任何优势,即使对于非常大的范围也是如此.此外,在实现三个独立二次方程的条件以及素数平方自由"循环而不是素数剔除循环的更多情况下,SoA 具有更多独立循环的恒定开销,因此永远不会有那么低的完全优化后,每个操作的平均计算时间作为 SoE.END_EDIT_ADD

EDIT_ADD: In actual fact, neither the page segmented SoE nor the SoA continue to run in linear time for these huge ranges as compared to the lower ranges as both run into problems with the "toggling" and "culling" operations losing efficiency due to having to skip large numbers of pages per jump; this means that both will require modified algorithms to handle this "page jumping" and the very slight advantage to the SoA may be completely erased if there are any slight differences in the way that this is done. The SoA has many more terms in the "jump tables" than the SoE by about the inverse ratio between of the number of primes found up to the square root of the range to that range, and this will likely add a O(log n) term to both in processing but a constant factor larger increase for the SoA which has a higher number of entries in the "jump table". This extra fact will pretty much completely cancel out any advantage of the SoA over the SoE, even for extremely large ranges. Further, the SoA has a constant overhead of more individual loops required for the many more cases in implementing the conditions for the three separate quadratic equations plus the "prime squares free" loop instead of just a primes culling loop so can never have as low an average computational time per operation as the SoE when fully optimized. END_EDIT_ADD

EDIT_ADD2:我认为,关于阿特金筛网的大部分混淆是由于问题中引用的维基百科文章中伪代码的缺陷,所以提出了伪代码的更好版本,至少解决了一些与x"和y"变量的范围以及模 12 与模 60 混淆的缺陷,如下所示:

EDIT_ADD2: It is my opinion that much of the confusion about the Sieve of Atkin is due to the deficiencies of the pseudo code from the Wikipedia article as quoted in the question, so have come up with a somewhat better version of the pseudo code that addresses at least some of the deficiencies to do with the range of the 'x' and 'y' variables and the modulo 12 versus modulo 60 confusion as follows:

limit ← 1000000000        // arbitrary search limit

// Initialize the sieve
for i in {7,11,13,17,19,23,29,31, 37,41,43,47,49,53,59,61,...}:
    is_prime(i) ← false

// Put in candidate primes:
// integers which have an odd number of
// representations by certain quadratic forms.
while n ≤ limit, n ← 4x²+y² where x ∈ {1,2,...} and y ∈ {1,3,...} // odd y's
    if n mod 60 ∈ {1,13,17,29,37,41,49,53}:
        is_prime(n) ← ¬is_prime(n)
while n ≤ limit, n ← 3x²+y² where x ∈ {1,3,...} and y ∈ {2,4,...} // only odd 
    if n mod 60 ∈ {7,19,31,43}:                            // x's and even y's
        is_prime(n) ← ¬is_prime(n)
while n ≤ limit, n ← 3x²-y² where x ∈ {2,3,...} and y ∈ {x-1,x-3,...,1} //all 
    if n mod 60 ∈ {11,23,47,59}:                   // even/odd odd/even combos
        is_prime(n) ← ¬is_prime(n)

// Eliminate composites by sieving, only for those occurrences on the wheel
for n² ≤ limit where n ∈ {7,11,13,17,19,23,29,31, 37,41,43,47,49,53,59,61,...}:
    if is_prime(n):
        // n is prime, omit multiples of its square; this is
        // sufficient because composites which managed to get
        // on the list cannot be square-free
        while c ≤ limit, c ← n² × k where
                      k ∈ {1,7,11,13,17,19,23,29, 31,37,41,43,47,49,53,59,...}:
            is_prime(c) ← false

output 2, 3, 5
for n ≤ limit when n ← 60 × k + x where
  k ∈ {0..} and
  x ∈ {7,11,13,17,19,23,29,31, 37,41,43,47,49,53,59,61}:
    if is_prime(n): output n

以上看起来很简单并且是一个很好的算法,除了它仍然不比使用相同 2;3;5 分解轮的基本 Eratosthenes 筛法快,因为它浪费了几乎一半的内循环切换操作未通过模测试.演示:

The above seems quite simple and is quite a good algorithm except that it still isn't faster than a basic Sieve of Eratosthenes that uses the same 2;3;5 factorization wheel because it wastes almost half of its inner loop toggle operations that fail the modulo tests. To demonstrate:

以下是重复的 4x²+y² 模式,在 15 个 'x' 值(每个值)垂直和 15 个奇数值 'y' 水平范围内合格的 60 模数;都从一开始.请注意,它们关于垂直轴对称,但对于完整的 30 个x"值范围,只有 225 个中的 128 个或 450 个中的 256 个是有效的切换操作:

Following is the repeating 4x²+y² pattern of qualifying modulo 60's across a range of 15 values of 'x' (every value) vertically and 15 odd values of 'y' horizontally; both starting at one. Notice that they are symmetrical about a vertical axis but only 128 out of 225 or 256 out of 450 for a full 30 number range of 'x' values are valid toggle operations:

  0 13 29 53  0  0 53 49 53  0  0 53 29 13  0
 17  0 41  0 37 17  0  1  0 17 37  0 41  0 17
 37  0  1  0  0 37  0  0  0 37  0  0  1  0 37
  0 13 29 53  0  0 53 49 53  0  0 53 29 13  0
 41 49  0 29  1 41 29  0 29 41  1 29  0 49 41
  0  0 49 13  0  0 13  0 13  0  0 13 49  0  0
 17  0 41  0 37 17  0  1  0 17 37  0 41  0 17
 17  0 41  0 37 17  0  1  0 17 37  0 41  0 17
  0  0 49 13  0  0 13  0 13  0  0 13 49  0  0
 41 49  0 29  1 41 29  0 29 41  1 29  0 49 41
  0 13 29 53  0  0 53 49 53  0  0 53 29 13  0
 37  0  1  0  0 37  0  0  0 37  0  0  1  0 37
 17  0 41  0 37 17  0  1  0 17 37  0 41  0 17
  0 13 29 53  0  0 53 49 53  0  0 53 29 13  0
  1  0  0 49  0  1 49  0 49  1  0 49  0  0  1

以下是重复的 3x²+y² 模式,在 5 个垂直 'x' 奇数值和 15 个水平 'y' 偶数值范围内对 60 取模进行资格取模.请注意,它们是关于水平轴对称的,但对于完整的 30 个x"值范围,只有 75 个中的 48 个或 450 个中的 144 个是有效的切换操作,因为 450 个无效操作中的 144 个是偶数 'x' 和奇数'y' 已经被删除:

Following is the repeating 3x²+y² pattern of qualifying modulo 60's across a range of 5 odd values of 'x' vertically and 15 even values of 'y' horizontally. Notice that they are symmetrical about a horizontal axis but only 48 out of 75 or 144 out of 450 for a full 30 number range of 'x' values are valid toggle operations as all 144 out of 450 invalid operations with even 'x' and odd 'y' have already been eliminated:

  7 19  0  7 43  0 19 19  0 43  7  0 19  7  0
 31 43  0 31  7  0 43 43  0  7 31  0 43 31  0
 19 31  0 19  0  0 31 31  0  0 19  0 31 19  0
 31 43  0 31  7  0 43 43  0  7 31  0 43 31  0
  7 19  0  7 43  0 19 19  0 43  7  0 19  7  0

以下是重复的 3x²-y² 模式,在 'x' 的 5 个奇数值和 'y' 的 15 个偶数值的范围内限定 60 的模数.请注意,它们关于水平轴对称,但对于完整的 30 个x"值范围,只有 75 个中的 48 个或 450 个中的 224 个是有效的切换操作:

Following is the repeating 3x²-y² pattern of qualifying modulo 60's across a range of 5 odd values of 'x' vertically and 15 even values of 'y' horizontally. Notice that they are symmetrical about a horizontal axis but only 48 out of 75 or 224 out of 450 for a full 30 number range of 'x' values are valid toggle operations:

 59 47  0 59 23  0 47 47  0 23 59  0 47 59  0
 23 11  0 23 47  0 11 11  0 47 23  0 11 23  0
 11 59  0 11  0  0 59 59  0  0 11  0 59 11  0
 23 11  0 23 47  0 11 11  0 47 23  0 11 23  0
 59 47  0 59 23  0 47 47  0 23 59  0 47 59  0

以下是重复的 3x²-y² 模式,在 5 个垂直 'x' 偶值和 15 个水平 'y' 奇数值的范围内对 60 取模进行限定.请注意,它们关于垂直轴对称,但对于完整的 30 个x"值范围,只有 75 个中的 48 个或 450 个中的 224 个是有效的切换操作:

Following is the repeating 3x²-y² pattern of qualifying modulo 60's across a range of 5 even values of 'x' vertically and 15 odd values of 'y' horizontally. Notice that they are symmetrical about a vertical axis but only 48 out of 75 or 224 out of 450 for a full 30 number range of 'x' values are valid toggle operations:

 11  0 47 23  0 11 23  0 23 11  0 23 47  0 11
 47  0 23 59  0 47 59  0 59 47  0 59 23  0 47
 47  0 23 59  0 47 59  0 59 47  0 59 23  0 47
 11  0 47 23  0 11 23  0 23 11  0 23 47  0 11
 59  0  0 11  0 59 11  0 11 59  0 11  0  0 59

通过检查上表可以确定,对于上述伪代码,x 的整体重复范围为 30 个值(包括偶数和奇数),总共 1125 个有效操作中只有 688 个有效操作联合作战;因此,由于非生产性跳过操作是最内层循环的一部分,因此它几乎有一半的处理都浪费在有条件地跳过值上.有两种可能的方法可以低效地消除命中"冗余,如下所示:

As one can determine by inspection of the above tables, for the pseudo code as above there is an overall repeating range of 30 values of x (including both evens and odds) which only has 688 valid operations out of a total of 1125 combined operations; thus it wastes almost half of its processing in just conditionally skipping over values due to the non-productive skipping operations being part of the innermost loops.There are two possible methods to eliminate that "hit" redundancy inefficiently, as follows:

  1. Atkin 和 Bernstein 论文中概述的方法,该方法使用 'x' 和 'y' 组合模中的识别模式来仅处理模命中",因为一旦找到一个给定模式的模数,那么在某个模数(等于轮位位置)处有一个无限的值序列,其中每个模式由一个易于计算的(变量)偏移量分隔,该偏移量的形式为当前位置加上当前值的 A 倍"'x' 加 B"和当前位置加 C 乘以 'y' 加 D 的当前值",其中 A、B、C 和 D 是简单的常数(简单的意思是小,易于操作).Bernstein 在许多查找表 (LUT) 的额外帮助下使用了该方法.

  1. The method as outlined in the Atkin and Bernstein paper, which uses the recognized patterns in the combined modulos of 'x' and 'y' to process only the modulo "hits" using the fact that once one locates a given modulo on the pattern, then there are an infinite sequence of values at that some modulo (equals wheel bit position) where each pattern is separated by an easily calculated (variable) offset which has the form "current position plus A times the current value of 'x' plus B" and "current position plus C times the current value of 'y' plus D" where A, B, C, and D, are simple constants (simple meaning small easily manipulated). Bernstein used that method with the additional help of many Look Up Tables (LUTs).

创建整体模式状态查找表 (LUT) 的方法(上面四个表中的每一个加上一个小素数平方自由循环),由x"的模当前值与'y' 的模以找到要跳过的 A、B、C 和 D 参数,而不是跳到下一个模式,而只是跳到水平序列上的下一个位置.这可能是更高性能的选项,因为它更容易允许使用展开循环代码的内联减少每个操作的极端时钟周期,并且将在页面分段时为大范围生成更有效的代码,因为每个循环的跳转平均较小.这可能会使每个循环的时钟周期接近 高度优化的 Eratosthenes 筛网,但不会由于必须计算可变步长而不是能够为 SoE 使用固定的主要偏移量,因此可能会变得非常低.

The method of creating overall pattern state Look Up Tables (LUTs) (one for each of the four tables above plus one for the minor prime square free loop) indexed by the modulo current values of 'x' combined with the modulo of 'y' to find the A, B, C, and D parameters to skip, not to the next pattern, but just to the next position on the horizontal sequence. This is likely the more highly performance option as it more easily allows for extreme clock cycle per operation reduction using in-lining of the unrolled loop code and will produce more efficient code for large ranges when page segmenting as the jumps per loop are smaller on average. This will likely make the clock cycles per loop close to that of a highly optimized Sieve of Eratosthenes as in primesieve, but will not likely get to quite that low due to having to compute the variable step sizes rather than being able to use fixed prime offsets for SoE.

因此,减少初级筛的运行时间有以下三个主要目标:

So there are three governing objectives in reducing the running time for a prime sieve, as follows:

  1. 一个成功的筛选减少了操作的数量,即使是命中优化"的 SoA 与高度因轮分解的 SoE 相比也失败了约 16.7%,范围约为数十亿.

  1. A successful sieve reduces the number of operations, which even the "hit optimized" SoA fails as compared to a highly wheel factorized SoE by about 16.7% for ranges of about a few billion.

成功的筛选减少了每个操作的 CPU 时钟周期,与高度优化的 SoE(例如 primesieve)相比,SoA 失败了,因为它的操作由于可变增量而更加复杂,同样可能减少 20% 到 60%.Atkin 和 Bernstein 的 Primegen (SoA) 大约需要 4.5 个时钟周期,而 Primesieve (SoE) 的每个操作大约需要 2.5 个时钟周期,每个周期为 10 亿,但也许 SoA 可以通过借用一些优化技术来加速素筛.

A successful sieve reduces the CPU clock cycles per operation, which the SoA fails as compared to a highly optimized SoE such as primesieve because its operations are more complex due to the variable increments, again likely by 20% to 60%. Atkin and Bernstein's primegen (SoA) takes about 4.5 as compared to about 2.5 clock cycles per operation for primesieve (SoE) for a range of one billion for each, but perhaps the SoA could be sped up somewhat by borrowing some of the optimization techniques from primesieve.

成功的筛选具有相当低的代码复杂性,因此可以使用桶筛选"和其他页面分段优化等技术更轻松地将其扩展到更大的范围.为此,Atkin 的筛选失败了,因为它在对大数字范围进行页面分割时变得更加复杂.编写一个 SoA 程序是极其复杂的,它甚至可以与伯恩斯坦的质数 (SoA) 竞争,更不用说质数筛了,而编写与质数筛性能接近的代码却很容易.

A successful sieve has reasonably low code complexity so that it can be more easily extended to larger ranges using such techniques as "bucket sieving" and other page segmentation optimizations. For this the Sieve of Atkin fails miserably as it gets exponentially more complex for page segmenting large number ranges. It is extremely complex to write a SoA program that would compete with even Bernstein's primegen (SoA) let alone with primesieve, whereas it is quite easy to write code that comes close to the same performance as primesieve.

一个成功的筛选具有较低的经验复杂性,SoA 确实比 SoE 高了一个因子 (log (log n),其中 n 是要筛选的上限,但这个额外的小比率不是可能足以弥补上述前两个综合损失率,因为这个因素随着范围的增加而变小.END_EDIT_ADD2

A successful sieve has a lower empirical complexity, which the SoA does have above the SoE by a factor of (log (log n) where n is the upper range to be sieved, but that extra small ratio is not likely ever enough to make up for the above top two combined loss ratios, as this factor gets smaller with increasing range. END_EDIT_ADD2

所以对于为什么使用阿特金筛法?"这个问题的答案是如果 SoE 是通过最大轮分解优化实现的,那么根本没有理由使用它,直到筛选出的数字非常大(64 位数字范围及更高),因此 SoA 优势非常小,可能根本无法实现,这取决于相对优化中的非常小的调整.".

So the answer to the question "Why use the Sieve of Atkin?" is "There is no reason to use it at all if the SoE is implemented with maximum wheel factorization optimizations until the numbers sieved are extremely large (64-bit number range and higher) and then the SoA advantage is very small and perhaps not realizable at all depending on very minor tweaks in relative optimizations.".

作为对另一个类似的阿特金筛选问题的回答,我已经发布了一个页面分段 C# 版本的 SoE 优化如上所述:https://stackoverflow.com/a/20559687/549617.

As an answer to another similar Sieve of Atkin question, I have posted a page segmented C# version of the SoE optimized as per the above at: https://stackoverflow.com/a/20559687/549617.

这篇关于阿特金筛的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆