Calculating and printing the nth prime number


Question

I am trying to calculate prime numbers, which I've already done. But I want to calculate and print ONLY the nth prime number (user input) - the rest will still be calculated, just not printed.

Here's what I've written so far:

import java.util.Scanner;
/**
 * Calculates the nth prime number
 * @author {Zyst}
 */
public class Prime {
    public static void main(String[] args) {

        Scanner input = new Scanner(System.in);
        int n, 
            i = 2, 
            x = 2;

        System.out.printf("This program calculates the nth Prime number
");
        System.out.printf("Please enter the nth prime number you want to find: ");
        n = input.nextInt();

        for(i = 2, x = 2; i <= n; i++) {
            for(x = 2; x < i; x++) {
                if(i % x == 0) {
                    break;
                }
            }
            if(x == i) {
                System.out.printf("
%d is prime", x);
            }
        }
    }
}

This is the program I wrote to calculate the prime numbers from 1 to n. However, I want it to print only the nth prime number.

What I've thought of doing is making some sort of count int and ++ing it every time it finds a prime, and when the count == n then it prints out that number, but I can't quite figure out how to land it.

Solution

To calculate the n-th prime, I know two main variants.

The straightforward way

That is to count all the primes starting from 2 as you find them until you have reached the desired nth.

This can be done with different levels of sophistication and efficiency, and there are two conceptually different ways to do it. The first is

Testing the primality of all numbers in sequence

This would be accomplished by a driver function like

public static int nthPrime(int n) {
    int candidate, count;
    for(candidate = 2, count = 0; count < n; ++candidate) {
        if (isPrime(candidate)) {
            ++count;
        }
    }
    // The candidate has been incremented once after the count reached n
    return candidate-1;
}

and the interesting part that determines the efficiency is the isPrime function.

The obvious way for a primality check, given the definition of a prime that we learned in school¹ - a number greater than 1 that is divisible only by 1 and by itself - is

Trial division

The direct translation of the definition into code is

private static boolean isPrime(int n) {
    for(int i = 2; i < n; ++i) {
        if (n % i == 0) {
            // We are naive, but not stupid, if
            // the number has a divisor other
            // than 1 or itself, we return immediately.
            return false;
        }
    }
    return true;
}

but, as you will soon discover if you try it, its simplicity is accompanied by slowness. With that primality test, you can find the 1000th prime, 7919, in a few milliseconds (about 20 on my computer), but finding the 10000th prime, 104729, takes seconds (~2.4s), the 100000th prime, 1299709, several minutes (about 5), the millionth prime, 15485863, would take about eight and a half hours, the ten-millionth prime, 179424673, weeks, and so on. The runtime complexity is worse than quadratic - Θ(n² * log n).

So we'd like to speed the primality test up somewhat. A step that many people take is the realisation that a divisor of n (other than n itself) can be at most n/2. If we use that fact and let the trial division loop only run to n/2 instead of n-1, how does the running time of the algorithm change? For composite numbers, the lower loop limit doesn't change anything. For primes, the number of trial divisions is halved, so overall, the running time should be reduced by a factor somewhat smaller than 2. If you try it out, you will find that the running time is almost exactly halved, so almost all the time is spent verifying the primality of primes despite there being many more composites than primes.

Now, that didn't help much if we want to find the one-hundred-millionth prime, so we have to do better. Trying to reduce the loop limit further, let us see for what numbers the upper bound of n/2 is actually needed. If n/2 is a divisor of n, then n/2 is an integer, in other words, n is divisible by 2. But then the loop doesn't go past 2, so it never (except for n = 4) reaches n/2. Jolly good, so what's the next largest possible divisor of n? Why, n/3 of course. But n/3 can only be a divisor of n if it is an integer, in other words, if n is divisible by 3. Then the loop will exit at 3 (or before, at 2) and never reach n/3 (except for n = 9). The next largest possible divisor ...

Hang on a minute! We have 2 <-> n/2 and 3 <-> n/3. The divisors of n come in pairs.

If we consider the pair (d, n/d) of corresponding divisors of n, either d = n/d, i.e. d = √n, or one of them, say d, is smaller than the other. But then d*d < d*(n/d) = n and d < √n. Each pair of corresponding divisors of n contains (at least) one which does not exceed √n.

If n is composite, its smallest nontrivial divisor does not exceed √n.

So we can reduce the loop limit to √n, and that reduces the runtime complexity of the algorithm. It should now be Θ(n^1.5 * √(log n)), but empirically it seems to scale a little bit better - however, there's not enough data to draw reliable conclusions from empirical results.
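In code, the √n-bounded test is a small change to the naive version above - a minimal sketch:

private static boolean isPrime(int n) {
    // A composite n has a divisor not exceeding √n,
    // so there is no need to search beyond that.
    int m = (int)Math.sqrt(n) + 1;
    for(int i = 2; i < m; ++i) {
        if (n % i == 0) {
            return false;
        }
    }
    return true;
}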

That finds the millionth prime in about 16 seconds, the ten-millionth in just under nine minutes, and it would find the one-hundred-millionth in about four and a half hours. That's still slow, but a far cry from the ten years or so it would take the naive trial division.

Since there are squares of primes and products of two close primes, like 323 = 17*19, we cannot reduce the limit for the trial division loop below √n. Therefore, while staying with trial division, we must look for other ways to improve the algorithm now.

One easily seen thing is that no prime other than 2 is even, so we need only check odd numbers after we have taken care of 2. That doesn't make much of a difference, though, since the even numbers are the cheapest to find composite - and the bulk of time is still spent verifying the primality of primes. However, if we look at the even numbers as candidate divisors, we see that if n is divisible by an even number, n itself must be even, so (excepting 2) it will have been recognised as composite before division by any even number greater than 2 is attempted. So all divisions by even numbers greater than 2 that occur in the algorithm must necessarily leave a nonzero remainder. We can thus omit these divisions and check for divisibility only by 2 and the odd numbers from 3 to √n. This halves (not quite exactly) the number of divisions required to determine a number as prime or composite and therefore the running time. That's a good start, but can we do better?
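Before moving on, the test as it stands - 2 handled separately, then only odd candidate divisors up to √n - as a minimal sketch:

private static boolean isPrime(int n) {
    if (n % 2 == 0) return n == 2;
    // Divisions by even numbers greater than 2 can never succeed
    // for the odd n reaching this point, so try only odd divisors.
    int m = (int)Math.sqrt(n) + 1;
    for(int i = 3; i < m; i += 2) {
        if (n % i == 0) {
            return false;
        }
    }
    return true;
}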

Another large family of numbers is the multiples of 3. Every third division we perform is by a multiple of 3, but if n is divisible by one of them, it is also divisible by 3, and hence no division by 9, 15, 21, ... that we perform in our algorithm will ever leave a remainder of 0. So, how can we skip these divisions? Well, the numbers divisible by neither 2 nor 3 are precisely the numbers of the form 6*k ± 1. Starting from 5 (since we're only interested in numbers greater than 1), they are 5, 7, 11, 13, 17, 19, ..., the step from one to the next alternates between 2 and 4, which is easy enough, so we can use

private static boolean isPrime(int n) {
    if (n % 2 == 0) return n == 2;
    if (n % 3 == 0) return n == 3;
    int step = 4, m = (int)Math.sqrt(n) + 1;
    for(int i = 5; i < m; step = 6-step, i += step) {
        if (n % i == 0) {
            return false;
        }
    }
    return true;
}

This gives us another speedup by a factor of (nearly) 1.5, so we'd need about one and a half hours for the hundred-millionth prime.

If we continue this route, the next step is the elimination of multiples of 5. The numbers coprime to 2, 3 and 5 are the numbers of the form

30*k + 1, 30*k + 7, 30*k + 11, 30*k + 13, 30*k + 17, 30*k + 19, 30*k + 23, 30*k + 29

so we'd need only divide by eight out of every thirty numbers (plus the three smallest primes). The steps from one to the next, starting from 7, cycle through 4, 2, 4, 2, 4, 6, 2, 6. That's still easy enough to implement and yields another speedup by a factor of 1.25 (minus a bit for more complicated code). Going further, the multiples of 7 would be eliminated, leaving 48 out of every 210 numbers to divide by, then 11 (480/2310), 13 (5760/30030) and so on. Each prime p whose multiples are eliminated yields a speedup of (almost) p/(p-1), so the return decreases while the cost (code complexity, space for the lookup table for the steps) increases with each prime.
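For illustration, a sketch of that mod-30 wheel, with the step cycle kept in a small table (the table name is mine, not from the original):

private static final int[] WHEEL_30 = {4, 2, 4, 2, 4, 6, 2, 6};

private static boolean isPrime(int n) {
    if (n % 2 == 0) return n == 2;
    if (n % 3 == 0) return n == 3;
    if (n % 5 == 0) return n == 5;
    int m = (int)Math.sqrt(n) + 1;
    // i runs through 7, 11, 13, 17, 19, 23, 29, 31, 37, ... -
    // exactly the numbers coprime to 2, 3 and 5.
    for(int i = 7, w = 0; i < m; i += WHEEL_30[w], w = (w + 1) & 7) {
        if (n % i == 0) {
            return false;
        }
    }
    return true;
}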

In general, one would stop soonish, after eliminating the multiples of maybe six or seven primes (or even fewer). Here, however, we can follow through to the very end, when the multiples of all primes have been eliminated and only the primes are left as candidate divisors. Since we are finding all primes in order, each prime is found before it is needed as a candidate divisor and can then be stored for future use. This reduces the algorithmic complexity to - if I haven't miscalculated - O(n^1.5 / √(log n)). At the cost of space usage for storing the primes.
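A sketch of that endpoint - driver and test fused, dividing only by the primes found so far, which are kept in a list:

public static int nthPrime(int n) {
    java.util.List<Integer> primes = new java.util.ArrayList<>();
    for(int candidate = 2; ; ++candidate) {
        boolean isPrime = true;
        for(int p : primes) {
            if ((long)p * p > candidate) break;   // no prime divisor <= √candidate found
            if (candidate % p == 0) {
                isPrime = false;
                break;
            }
        }
        if (isPrime) {
            primes.add(candidate);                // store for use as a future candidate divisor
            if (primes.size() == n) return candidate;
        }
    }
}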

With trial division, that is as good as it gets: you have to try dividing by all primes up to √n, or by the first prime dividing n, to determine the primality of n. That finds the hundred-millionth prime in about half an hour here.

So how about

Fast primality tests

Primes have other number-theoretic properties than the absence of nontrivial divisors which composite numbers usually don't have. Such properties, if they are fast to check, can form the basis of probabilistic or deterministic primality tests. The archetypical such property is associated with the name of Pierre de Fermat, who, in the early 17th century, found that

If p is a prime, then p is a divisor of (a^p - a) for all a.

This - Fermat's so-called 'little theorem' - is, in the equivalent formulation

Let p be a prime and a not divisible by p. Then p divides a^(p-1) - 1.

the basis of most of the widespread fast primality tests (for example Miller-Rabin) and variants or analogues of that appear in even more (e.g. Lucas-Selfridge).

So if we want to know if a not too small odd number n is a prime (even and small numbers are efficiently treated by trial division), we can choose any number a (> 1) which is not a multiple of n, for example 2, and check whether n divides a^(n-1) - 1. Since a^(n-1) becomes huge, that is most efficiently done by checking whether a^(n-1) ≡ 1 (mod n), i.e. by modular exponentiation. If that congruence doesn't hold, we know that n is composite. If it holds, however, we cannot conclude that n is prime; for example 2^340 ≡ 1 (mod 341), but 341 = 11 * 31 is composite. Composite numbers n such that a^(n-1) ≡ 1 (mod n) are called Fermat pseudoprimes for the base a.
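A sketch of this single-base Fermat check, using java.math.BigInteger's built-in modular exponentiation (the helper name is illustrative):

// Returns false only for proven composites; true means n is prime
// or a Fermat pseudoprime to the base a. Assumes n > 1 and 1 < a < n.
private static boolean fermatProbablePrime(long n, long a) {
    java.math.BigInteger bn = java.math.BigInteger.valueOf(n);
    java.math.BigInteger e = bn.subtract(java.math.BigInteger.ONE);
    return java.math.BigInteger.valueOf(a).modPow(e, bn)
            .equals(java.math.BigInteger.ONE);
}

For example, fermatProbablePrime(341, 2) returns true although 341 = 11 * 31, reproducing the pseudoprime above, while fermatProbablePrime(341, 3) returns false and so exposes 341 as composite.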

But such occurrences are rare. Given any base a > 1, although there are an infinite number of Fermat pseudoprimes to base a, they are much rarer than actual primes. For example, there are only 78 base-2 Fermat pseudoprimes and 76 base-3 Fermat pseudoprimes below 100000, but 9592 primes. So if one chooses an arbitrary odd n > 1 and an arbitrary base a > 1 and finds a^(n-1) ≡ 1 (mod n), there's a good chance that n is actually prime.

However, we are in a slightly different situation, we are given n and can only choose a. So, for an odd composite n, for how many a, 1 < a < n-1 can a^(n-1) ≡ 1 (mod n) hold? Unfortunately, there are composite numbers - Carmichael numbers - such that the congruence holds for every a coprime to n. That means that to identify a Carmichael number as composite with the Fermat test, we have to pick a base that is a multiple of one of n's prime divisors - there may not be many such multiples.

But we can strengthen the Fermat test so that composites are more reliably detected. If p is an odd prime, write p-1 = 2*m. Then, if 0 < a < p,

a^(p-1) - 1 = (a^m + 1) * (a^m - 1)

and p divides exactly one of the two factors (the two factors differ by 2, so their greatest common divisor is either 1 or 2). If m is even, we can split a^m - 1 in the same way. Continuing, if p-1 = 2^s * k with k odd, write

a^(p-1) - 1 = (a^(2^(s-1)*k) + 1) * (a^(2^(s-2)*k) + 1) * ... * (a^k + 1) * (a^k - 1)

then p divides exactly one of the factors. This gives rise to the strong Fermat test,

Let n > 2 be an odd number. Write n-1 = 2^s * k with k odd. Given any a with 1 < a < n-1, if

  1. a^k ≡ 1 (mod n) or
  2. a^((2^j)*k) ≡ -1 (mod n) for any j with 0 <= j < s

then n is a strong (Fermat) probable prime for base a. A composite strong base a (Fermat) probable prime is called a strong (Fermat) pseudoprime for the base a. Strong Fermat pseudoprimes are even rarer than ordinary Fermat pseudoprimes, below 1000000, there are 78498 primes, 245 base-2 Fermat pseudoprimes and only 46 base-2 strong Fermat pseudoprimes. More importantly, for any odd composite n, there are at most (n-9)/4 bases 1 < a < n-1 for which n is a strong Fermat pseudoprime.

So if n is an odd composite, the probability that n passes k strong Fermat tests with randomly chosen bases between 1 and n-1 (exclusive bounds) is less than 1/4^k.

A strong Fermat test takes O(log n) steps, each step involves one or two multiplications of numbers with O(log n) bits, so the complexity is O((log n)^3) with naive multiplication [for huge n, more sophisticated multiplication algorithms can be worthwhile].

The Miller-Rabin test is the k-fold strong Fermat test with randomly chosen bases. It is a probabilistic test, but for small enough bounds, short combinations of bases are known which give a deterministic result.

Strong Fermat tests are part of the deterministic APRCL test.

It is advisable to precede such tests with trial division by the first few small primes, since divisions are comparatively cheap and that weeds out most composites.

For the problem of finding the nth prime, in the range where testing all numbers for primality is feasible, there are known combinations of bases that make the multiple strong Fermat test correct, so that would give a faster - O(n*(log n)^4) - algorithm.

For n < 2^32, the bases 2, 7, and 61 are sufficient to verify primality. Using that, the hundred-millionth prime is found in about six minutes.

Eliminating composites by prime divisors, the Sieve of Eratosthenes

Instead of investigating the numbers in sequence and checking whether each is prime from scratch, one can also consider the whole set of relevant numbers as one piece and eliminate the multiples of a given prime in one go. This is known as the Sieve of Eratosthenes:

To find the prime numbers not exceeding N

  1. make a list of all numbers from 2 to N
  2. for each k from 2 to N: if k is not yet crossed off, it is prime; cross off all multiples of k as composites

The primes are the numbers in the list which aren't crossed off.
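As a direct rendering of these steps (a sketch returning the full list; the improvements discussed below are left out):

public static java.util.List<Integer> primesUpTo(int N) {
    boolean[] crossedOff = new boolean[N + 1];
    java.util.List<Integer> primes = new java.util.ArrayList<>();
    for(int k = 2; k <= N; ++k) {
        if (!crossedOff[k]) {
            primes.add(k);                        // not crossed off, hence prime
            for(long j = 2L * k; j <= N; j += k) {
                crossedOff[(int)j] = true;        // cross off the multiples of k
            }
        }
    }
    return primes;
}

For example, primesUpTo(30) returns [2, 3, 5, 7, 11, 13, 17, 19, 23, 29].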

This algorithm is fundamentally different from trial division, although both directly use the divisibility characterisation of primes, in contrast to the Fermat test and similar tests which use other properties of primes.

In trial division, each number n is paired with all primes not exceeding the smaller of √n and the smallest prime divisor of n. Since most composites have a very small prime divisor, detecting composites is cheap here on average. But testing primes is expensive, since there are relatively many primes below √n. Although there are many more composites than primes, the cost of testing primes is so high that it completely dominates the overall running time and renders trial division a relatively slow algorithm. Trial division for all numbers less than N takes O(N^1.5 / (log N)²) steps.

In the sieve, each composite n is paired with all of its prime divisors, but only with those. Thus the primes are the cheap numbers here: they are only ever looked at once, while the composites are more expensive, being crossed off multiple times. One might believe that since a sieve contains many more 'expensive' numbers than 'cheap' ones, it would overall be a bad algorithm. However, a composite number does not have many distinct prime divisors - the number of distinct prime divisors of n is bounded by log n, but usually it is much smaller, the average of the number of distinct prime divisors of the numbers <= n is log log n - so even the 'expensive' numbers in the sieve are on average no more (or hardly more) expensive than the 'cheap' numbers for trial division.

Sieving up to N, for each prime p, there are Θ(N/p) multiples to cross off, so the total number of crossings-off is Θ(∑ (N/p)) = Θ(N * log (log N)). This yields much faster algorithms for finding the primes up to N than trial division or sequential testing with the faster primality tests.

There is, however, a disadvantage to the sieve, it uses O(N) memory. (But with a segmented sieve, that can be reduced to O(√N) without increasing the time complexity.)

For finding the nth prime, instead of the primes up to N, there is also the problem that it is not known beforehand how far the sieve should reach.

The latter can be solved using the prime number theorem. The PNT says

π(x) ~ x/log x (equivalently: lim π(x)*log x/x = 1),

where π(x) is the number of primes not exceeding x (here and below, log must be the natural logarithm, for the algorithmic complexities it is not important which base is chosen for the logarithms). From that, it follows that p(n) ~ n*log n, where p(n) is the nth prime, and there are good upper bounds for p(n) known from deeper analysis, in particular

n*(log n + log (log n) - 1) < p(n) < n*(log n + log (log n)), for n >= 6.

So one can use that as the sieving limit; it doesn't overshoot the target by much.

The O(N) space requirement can be overcome by using a segmented sieve. One can then record the primes below √N for O(√N / log N) memory consumption and use segments of increasing length (O(√N) when the sieve is near N).
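To make that concrete, a minimal segmented-sieve sketch (fixed-size segments for simplicity, counting rather than storing the primes):

// Counts the primes up to limit while keeping only O(√limit) flags
// in memory at a time.
public static int countPrimes(int limit) {
    int root = (int)Math.sqrt(limit);
    // Ordinary sieve for the base primes up to √limit.
    boolean[] crossedOff = new boolean[root + 1];
    java.util.List<Integer> basePrimes = new java.util.ArrayList<>();
    for(int p = 2; p <= root; ++p) {
        if (!crossedOff[p]) {
            basePrimes.add(p);
            for(int j = p * p; j <= root; j += p) {
                crossedOff[j] = true;
            }
        }
    }
    // Sieve [2, limit] one segment at a time, reusing one flag array.
    int segSize = Math.max(root, 1 << 15);
    boolean[] seg = new boolean[segSize];
    int count = 0;
    for(long low = 2; low <= limit; low += segSize) {
        java.util.Arrays.fill(seg, false);
        long high = Math.min(low + segSize, (long)limit + 1);
        for(int p : basePrimes) {
            // First multiple of p in [low, high) that needs crossing off.
            long start = Math.max((long)p * p, (low + p - 1) / p * (long)p);
            for(long j = start; j < high; j += p) {
                seg[(int)(j - low)] = true;
            }
        }
        for(long i = low; i < high; ++i) {
            if (!seg[(int)(i - low)]) ++count;    // survived all crossings: prime
        }
    }
    return count;
}

As a check, countPrimes(1000000) returns 78498, the count quoted earlier.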

There are some easy improvements on the algorithm as stated above:

  1. start crossing off multiples of p only at p², not at 2*p
  2. eliminate the even numbers from the sieve
  3. eliminate the multiples of further small primes from the sieve

None of these reduce the algorithmic complexity, but they all reduce the constant factors by a significant amount (as with trial division, the elimination of multiples of p yields lesser speedup for larger p while increasing the code complexity more than for smaller p).

Using the first two improvements yields

// Entry k in the array represents the number 2*k+3, so we have to do
// a bit of arithmetic to get the indices right.
public static int nthPrime(int n) {
    if (n < 2) return 2;
    if (n == 2) return 3;
    int limit, root, count = 1;
    limit = (int)(n*(Math.log(n) + Math.log(Math.log(n)))) + 3;
    root = (int)Math.sqrt(limit) + 1;
    limit = (limit-1)/2;
    root = root/2 - 1;
    boolean[] sieve = new boolean[limit];
    for(int i = 0; i < root; ++i) {
        if (!sieve[i]) {
            ++count;
            for(int j = 2*i*(i+3)+3, p = 2*i+3; j < limit; j += p) {
                sieve[j] = true;
            }
        }
    }
    int p;
    for(p = root; count < n; ++p) {
        if (!sieve[p]) {
            ++count;
        }
    }
    return 2*p+1;
}

which finds the hundred-millionth prime, 2038074743, in about 18 seconds. This time can be reduced to about 15 seconds (here, YMMV) by storing the flags packed, one bit per flag, instead of as booleans, since the reduced memory usage gives better cache locality.

Packing the flags, eliminating also multiples of 3 and using bit-twiddling for faster counting,

// Count number of set bits in an int
public static int popCount(int n) {
    n -= (n >>> 1) & 0x55555555;
    n = ((n >>> 2) & 0x33333333) + (n & 0x33333333);
    n = ((n >> 4) & 0x0F0F0F0F) + (n & 0x0F0F0F0F);
    return (n * 0x01010101) >> 24;
}

// Speed up counting by counting the primes per
// array slot and not individually. This yields
// another factor of about 1.24 or so.
public static int nthPrime(int n) {
    if (n < 2) return 2;
    if (n == 2) return 3;
    if (n == 3) return 5;
    int limit, root, count = 2;
    limit = (int)(n*(Math.log(n) + Math.log(Math.log(n)))) + 3;
    root = (int)Math.sqrt(limit);
    switch(limit%6) {
        case 0:
            limit = 2*(limit/6) - 1;
            break;
        case 5:
            limit = 2*(limit/6) + 1;
            break;
        default:
            limit = 2*(limit/6);
    }
    switch(root%6) {
        case 0:
            root = 2*(root/6) - 1;
            break;
        case 5:
            root = 2*(root/6) + 1;
            break;
        default:
            root = 2*(root/6);
    }
    int dim = (limit+31) >> 5;
    int[] sieve = new int[dim];
    for(int i = 0; i < root; ++i) {
        if ((sieve[i >> 5] & (1 << (i&31))) == 0) {
            int start, s1, s2;
            if ((i & 1) == 1) {
                start = i*(3*i+8)+4;
                s1 = 4*i+5;
                s2 = 2*i+3;
            } else {
                start = i*(3*i+10)+7;
                s1 = 2*i+3;
                s2 = 4*i+7;
            }
            for(int j = start; j < limit; j += s2) {
                sieve[j >> 5] |= 1 << (j&31);
                j += s1;
                if (j >= limit) break;
                sieve[j >> 5] |= 1 << (j&31);
            }
        }
    }
    int i;
    for(i = 0; count < n; ++i) {
        count += popCount(~sieve[i]);
    }
    --i;
    int mask = ~sieve[i];
    int p;
    for(p = 31; count >= n; --p) {
        count -= (mask >> p) & 1;
    }
    return 3*(p+(i<<5))+7+(p&1);
}

finds the hundred-millionth prime in about 9 seconds, which is not unbearably long.

There are other types of prime sieves. Of particular interest is the Sieve of Atkin, which exploits the fact that certain congruence classes of (rational) primes are composites in the ring of algebraic integers of some quadratic extensions of ℚ. Here is not the place to expand on the mathematical theory; suffice it to say that the Sieve of Atkin has lower algorithmic complexity than the Sieve of Eratosthenes and hence is preferable for large limits (for small limits, a not overly optimised Atkin sieve has higher overhead and thus can be slower than a comparably optimised Eratosthenes sieve). D. J. Bernstein's primegen library (written in C) is well optimised for numbers below 2^32 and finds the hundred-millionth prime (here) in about 1.1 seconds.

The fast way

If we only want to find the nth prime, there is no intrinsic value in also finding all the smaller primes. If we can skip most of them, we can save a lot of time and work. Given a good approximation a(n) to the nth prime p(n), if we have a fast way to calculate the number of primes π(a(n)) not exceeding a(n), we can then sieve a small range above or below a(n) to identify the few missing or excess primes between a(n) and p(n).

We have seen an easily computed, fairly good approximation to p(n) above; we could take

a(n) = n*(log n + log (log n))

for example.

A good method to compute π(x) is the Meissel-Lehmer method, which computes π(x) in roughly O(x^0.7) time (the exact complexity depends on the implementation; a refinement by Lagarias, Miller, Odlyzko, Deléglise and Rivat lets one compute π(x) in O(x^(2/3) / log² x) time).

Starting with the simple approximation a(n), we compute e(n) = π(a(n)) - n. By the prime number theorem, the density of primes near a(n) is about 1/log a(n), so we expect p(n) to be near b(n) = a(n) - log a(n)*e(n) and we would sieve a range slightly larger than log a(n)*e(n). For greater confidence that p(n) is in the sieved range, one can increase the range by a factor of 2, say, which almost certainly will be large enough. If the range seems too large, one can iterate with the better approximation b(n) in place of a(n), compute π(b(n)) and f(n) = π(b(n)) - n. Typically, |f(n)| will be much smaller than |e(n)|. If f(n) is approximately -e(n), then c(n) = (a(n) + b(n)) / 2 will be a better approximation to p(n). Only in the very unlikely case that f(n) is very close to e(n) (and not very close to 0) does finding an approximation to p(n) good enough that the final sieving stage can be done in time comparable to computing π(a(n)) become a problem.

In general, after one or two improvements to the initial approximation, the range to be sieved is small enough for the sieving stage to have a complexity of O(n^0.75) or better.

This method finds the hundred-millionth prime in about 40 milliseconds, and the 10^12-th prime, 29996224275833, in under eight seconds.


tl;dr: Finding the nth prime can be efficiently done, but the more efficient you want it, the more mathematics is involved.


I have Java code for most of the discussed algorithms prepared here, in case somebody wants to play around with them.


¹ Aside remark for overinterested souls: The definition of primes used in modern mathematics is different, applicable in much more general situations. If we adapt the school definition to include negative numbers - so a number is prime if it's neither 1 nor -1 and divisible only by 1, -1, itself and its negative - that defines (for integers) what is nowadays called an irreducible element of ℤ, however, for integers, the definitions of prime and irreducible elements coincide.
