Calculating and printing the nth prime number


Problem Description

I am trying to calculate prime numbers, which I've already done. But I want to calculate and print ONLY the nth prime number (user input); the others are still computed along the way, but only the nth prime number gets printed.

Here's what I've written so far:

import java.util.Scanner;
/**
 * Calculates the nth prime number
 * @author {Zyst}
 */
public class Prime {
    public static void main(String[] args) {

        Scanner input = new Scanner(System.in);
        int n, 
            i = 2, 
            x = 2;

        System.out.printf("This program calculates the nth Prime number\n");
        System.out.printf("Please enter the nth prime number you want to find: ");
        n = input.nextInt();

        for(i = 2, x = 2; i <= n; i++) {
            for(x = 2; x < i; x++) {
                if(i % x == 0) {
                    break;
                }
            }
            if(x == i) {
                System.out.printf("\n%d is prime", x);
            }
        }
    }
}

This is the program I wrote to calculate the prime numbers from 1 to n. However, I want it to only print the nth prime number.

What I've thought of doing is making some sort of count int and ++ing it every time it finds a prime, and when the count == n then it prints out that number, but I can't quite figure out how to land it.

Solution

To calculate the n-th prime, I know two main variants.

The straightforward way

That is to count all the primes starting from 2 as you find them until you have reached the desired nth.

This can be done with different levels of sophistication and efficiency, and there are two conceptually different ways to do it. The first is

Testing the primality of all numbers in sequence

This would be accomplished by a driver function like

public static int nthPrime(int n) {
    int candidate, count;
    for(candidate = 2, count = 0; count < n; ++candidate) {
        if (isPrime(candidate)) {
            ++count;
        }
    }
    // The candidate has been incremented once after the count reached n
    return candidate-1;
}

and the interesting part that determines the efficiency is the isPrime function.

The obvious way for a primality check, given the definition of a prime as a number greater than 1 that is divisible only by 1 and by itself that we learned in school¹, is

Trial division

The direct translation of the definition into code is

private static boolean isPrime(int n) {
    for(int i = 2; i < n; ++i) {
        if (n % i == 0) {
            // We are naive, but not stupid, if
            // the number has a divisor other
            // than 1 or itself, we return immediately.
            return false;
        }
    }
    return true;
}

but, as you will soon discover if you try it, its simplicity is accompanied by slowness. With that primality test, you can find the 1000th prime, 7919, in a few milliseconds (about 20 on my computer), but finding the 10000th prime, 104729, takes seconds (~2.4s), the 100000th prime, 1299709, several minutes (about 5), the millionth prime, 15485863, would take about eight and a half hours, the ten-millionth prime, 179424673, weeks, and so on. The runtime complexity is worse than quadratic - Θ(n² * log n).

So we'd like to speed the primality test up somewhat. A step that many people take is the realisation that a divisor of n (other than n itself) can be at most n/2. If we use that fact and let the trial division loop only run to n/2 instead of n-1, how does the running time of the algorithm change? For composite numbers, the lower loop limit doesn't change anything. For primes, the number of trial divisions is halved, so overall, the running time should be reduced by a factor somewhat smaller than 2. If you try it out, you will find that the running time is almost exactly halved, so almost all the time is spent verifying the primality of primes despite there being many more composites than primes.

Now, that didn't help much if we want to find the one-hundred-millionth prime, so we have to do better. Trying to reduce the loop limit further, let us see for what numbers the upper bound of n/2 is actually needed. If n/2 is a divisor of n, then n/2 is an integer, in other words, n is divisible by 2. But then the loop doesn't go past 2, so it never (except for n = 4) reaches n/2. Jolly good, so what's the next largest possible divisor of n? Why, n/3 of course. But n/3 can only be a divisor of n if it is an integer, in other words, if n is divisible by 3. Then the loop will exit at 3 (or before, at 2) and never reach n/3 (except for n = 9). The next largest possible divisor ...

Hang on a minute! We have 2 <-> n/2 and 3 <-> n/3. The divisors of n come in pairs.

If we consider the pair (d, n/d) of corresponding divisors of n, either d = n/d, i.e. d = √n, or one of them, say d, is smaller than the other. But then d*d < d*(n/d) = n and d < √n. Each pair of corresponding divisors of n contains (at least) one which does not exceed √n.

If n is composite, its smallest nontrivial divisor does not exceed √n.

So we can reduce the loop limit to √n, and that reduces the runtime complexity of the algorithm. It should now be Θ(n^1.5 * √(log n)), but empirically it seems to scale a little bit better - however, there's not enough data to draw reliable conclusions from empirical results.

That finds the millionth prime in about 16 seconds, the ten-millionth in just under nine minutes, and it would find the one-hundred-millionth in about four and a half hours. That's still slow, but a far cry from the ten years or so it would take the naive trial division.

Since there are squares of primes and products of two close primes, like 323 = 17*19, we cannot reduce the limit for the trial division loop below √n. Therefore, while staying with trial division, we must look for other ways to improve the algorithm now.

One easily seen thing is that no prime other than 2 is even, so we need only check odd numbers after we have taken care of 2. That doesn't make much of a difference, though, since the even numbers are the cheapest to find composite - and the bulk of time is still spent verifying the primality of primes. However, if we look at the even numbers as candidate divisors, we see that if n is divisible by an even number, n itself must be even, so (excepting 2) it will have been recognised as composite before division by any even number greater than 2 is attempted. So all divisions by even numbers greater than 2 that occur in the algorithm must necessarily leave a nonzero remainder. We can thus omit these divisions and check for divisibility only by 2 and the odd numbers from 3 to √n. This halves (not quite exactly) the number of divisions required to determine a number as prime or composite and therefore the running time. That's a good start, but can we do better?
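
In code, the variant just described might look like this (a sketch in the style of the isPrime above, assuming n >= 2):

private static boolean isPrime(int n) {
    if (n % 2 == 0) return n == 2;
    // Only odd candidate divisors, and only up to the square root.
    int m = (int)Math.sqrt(n) + 1;
    for(int i = 3; i < m; i += 2) {
        if (n % i == 0) {
            return false;
        }
    }
    return true;
}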

Another large family of numbers is the multiples of 3. Every third division we perform is by a multiple of 3, but if n is divisible by one of them, it is also divisible by 3, and hence no division by 9, 15, 21, ... that we perform in our algorithm will ever leave a remainder of 0. So, how can we skip these divisions? Well, the numbers divisible by neither 2 nor 3 are precisely the numbers of the form 6*k ± 1. Starting from 5 (since we're only interested in numbers greater than 1), they are 5, 7, 11, 13, 17, 19, ..., the step from one to the next alternates between 2 and 4, which is easy enough, so we can use

private static boolean isPrime(int n) {
    if (n % 2 == 0) return n == 2;
    if (n % 3 == 0) return n == 3;
    int step = 4, m = (int)Math.sqrt(n) + 1;
    for(int i = 5; i < m; step = 6-step, i += step) {
        if (n % i == 0) {
            return false;
        }
    }
    return true;
}

This gives us another speedup by a factor of (nearly) 1.5, so we'd need about one and a half hours to the hundred-millionth prime.

If we continue this route, the next step is the elimination of multiples of 5. The numbers coprime to 2, 3 and 5 are the numbers of the form

30*k + 1, 30*k + 7, 30*k + 11, 30*k + 13, 30*k + 17, 30*k + 19, 30*k + 23, 30*k + 29

so we'd need only divide by eight out of every thirty numbers (plus the three smallest primes). The steps from one to the next, starting from 7, cycle through 4, 2, 4, 2, 4, 6, 2, 6. That's still easy enough to implement and yields another speedup by a factor of 1.25 (minus a bit for more complicated code). Going further, the multiples of 7 would be eliminated, leaving 48 out of every 210 numbers to divide by, then 11 (480/2310), 13 (5760/30030) and so on. Each prime p whose multiples are eliminated yields a speedup of (almost) p/(p-1), so the return decreases while the cost (code complexity, space for the lookup table for the steps) increases with each prime.
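
As a sketch of what that next wheel step could look like (again assuming n >= 2), with the cycle of gaps 4, 2, 4, 2, 4, 6, 2, 6 kept in a small table:

private static final int[] gaps = {4, 2, 4, 2, 4, 6, 2, 6};

private static boolean isPrime(int n) {
    if (n % 2 == 0) return n == 2;
    if (n % 3 == 0) return n == 3;
    if (n % 5 == 0) return n == 5;
    int m = (int)Math.sqrt(n) + 1;
    // Candidate divisors 7, 11, 13, 17, 19, 23, 29, 31, 37, ... -
    // exactly the numbers coprime to 2, 3 and 5.
    for(int i = 7, g = 0; i < m; i += gaps[g], g = (g + 1) & 7) {
        if (n % i == 0) {
            return false;
        }
    }
    return true;
}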

In general, one would stop soonish, after eliminating the multiples of maybe six or seven primes (or even fewer). Here, however, we can follow through to the very end, when the multiples of all primes have been eliminated and only the primes are left as candidate divisors. Since we are finding all primes in order, each prime is found before it is needed as a candidate divisor and can then be stored for future use. This reduces the algorithmic complexity to - if I haven't miscalculated - O(n^1.5 / √(log n)). At the cost of space usage for storing the primes.

With trial division, that is as good as it gets, you have to try and divide by all primes to √n or the first dividing n to determine the primality of n. That finds the hundred-millionth prime in about half an hour here.
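
A sketch of that final form of trial division, dividing only by the stored primes up to the square root (the list of primes being the space cost mentioned above):

import java.util.ArrayList;

public static int nthPrime(int n) {
    ArrayList<Integer> primes = new ArrayList<>();
    primes.add(2);
    // Odd candidates only; each prime found is stored for later use
    // as a candidate divisor.
    for(int candidate = 3; primes.size() < n; candidate += 2) {
        boolean isPrime = true;
        for(int p : primes) {
            if ((long)p*p > candidate) break;   // only primes up to √candidate
            if (candidate % p == 0) {
                isPrime = false;
                break;
            }
        }
        if (isPrime) primes.add(candidate);
    }
    return primes.get(n-1);
}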

So how about

Fast primality tests

Primes have other number-theoretic properties than the absence of nontrivial divisors which composite numbers usually don't have. Such properties, if they are fast to check, can form the basis of probabilistic or deterministic primality tests. The archetypical such property is associated with the name of Pierre de Fermat, who, in the early 17th century, found that

If p is a prime, then p is a divisor of (a^p - a) for all a.

This - Fermat's so-called 'little theorem' - is, in the equivalent formulation

Let p be a prime and a not divisible by p. Then p divides a^(p-1) - 1.

the basis of most of the widespread fast primality tests (for example Miller-Rabin) and variants or analogues of that appear in even more (e.g. Lucas-Selfridge).

So if we want to know if a not too small odd number n is a prime (even and small numbers are efficiently treated by trial division), we can choose any number a (> 1) which is not a multiple of n, for example 2, and check whether n divides a^(n-1) - 1. Since a^(n-1) becomes huge, that is most efficiently done by checking whether a^(n-1) ≡ 1 (mod n), i.e. by modular exponentiation. If that congruence doesn't hold, we know that n is composite. If it holds, however, we cannot conclude that n is prime, for example 2^340 ≡ 1 (mod 341), but 341 = 11 * 31 is composite. Composite numbers n such that a^(n-1) ≡ 1 (mod n) are called Fermat pseudoprimes for the base a.
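
For illustration, a minimal sketch of that base-2 Fermat check, using the modular exponentiation built into java.math.BigInteger (remember that a passed test only makes n a probable prime):

import java.math.BigInteger;

public static boolean fermatCheckBase2(BigInteger n) {
    // Is 2^(n-1) ≡ 1 (mod n)? If not, n is certainly composite; if it is,
    // n is either prime or a base-2 Fermat pseudoprime such as 341.
    BigInteger nMinusOne = n.subtract(BigInteger.ONE);
    return BigInteger.valueOf(2).modPow(nMinusOne, n).equals(BigInteger.ONE);
}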

But such occurrences are rare. Given any base a > 1, although there are an infinite number of Fermat pseudoprimes to base a, they are much rarer than actual primes. For example, there are only 78 base-2 Fermat pseudoprimes and 76 base-3 Fermat pseudoprimes below 100000, but 9592 primes. So if one chooses an arbitrary odd n > 1 and an arbitrary base a > 1 and finds a^(n-1) ≡ 1 (mod n), there's a good chance that n is actually prime.

However, we are in a slightly different situation, we are given n and can only choose a. So, for an odd composite n, for how many a, 1 < a < n-1 can a^(n-1) ≡ 1 (mod n) hold? Unfortunately, there are composite numbers - Carmichael numbers - such that the congruence holds for every a coprime to n. That means that to identify a Carmichael number as composite with the Fermat test, we have to pick a base that is a multiple of one of n's prime divisors - there may not be many such multiples.

But we can strengthen the Fermat test so that composites are more reliably detected. If p is an odd prime, write p-1 = 2*m. Then, if 0 < a < p,

a^(p-1) - 1 = (a^m + 1) * (a^m - 1)

and p divides exactly one of the two factors (the two factors differ by 2, so their greatest common divisor is either 1 or 2). If m is even, we can split a^m - 1 in the same way. Continuing, if p-1 = 2^s * k with k odd, write

a^(p-1) - 1 = (a^(2^(s-1)*k) + 1) * (a^(2^(s-2)*k) + 1) * ... * (a^k + 1) * (a^k - 1)

then p divides exactly one of the factors. This gives rise to the strong Fermat test,

Let n > 2 be an odd number. Write n-1 = 2^s * k with k odd. Given any a with 1 < a < n-1, if

  1. a^k ≡ 1 (mod n) or
  2. a^((2^j)*k) ≡ -1 (mod n) for any j with 0 <= j < s

then n is a strong (Fermat) probable prime for base a. A composite strong base a (Fermat) probable prime is called a strong (Fermat) pseudoprime for the base a. Strong Fermat pseudoprimes are even rarer than ordinary Fermat pseudoprimes, below 1000000, there are 78498 primes, 245 base-2 Fermat pseudoprimes and only 46 base-2 strong Fermat pseudoprimes. More importantly, for any odd composite n, there are at most (n-9)/4 bases 1 < a < n-1 for which n is a strong Fermat pseudoprime.

So if n is an odd composite, the probability that n passes k strong Fermat tests with randomly chosen bases between 1 and n-1 (exclusive bounds) is less than 1/4^k.

A strong Fermat test takes O(log n) steps, each step involves one or two multiplications of numbers with O(log n) bits, so the complexity is O((log n)^3) with naive multiplication [for huge n, more sophisticated multiplication algorithms can be worthwhile].

The Miller-Rabin test is the k-fold strong Fermat test with randomly chosen bases. It is a probabilistic test, but for small enough bounds, short combinations of bases are known which give a deterministic result.

Strong Fermat tests are part of the deterministic APRCL test.

It is advisable to precede such tests with trial division by the first few small primes, since divisions are comparatively cheap and that weeds out most composites.

For the problem of finding the nth prime, in the range where testing all numbers for primality is feasible, there are known combinations of bases that make the multiple strong Fermat test correct, so that would give a faster - O(n*(log n)^4) - algorithm.

For n < 2^32, the bases 2, 7, and 61 are sufficient to verify primality. Using that, the hundred-millionth prime is found in about six minutes.
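
Putting the pieces together, here is a sketch of a deterministic test for int arguments (so n < 2^31 and intermediate products fit into a long), using the bases 2, 7 and 61 mentioned above; the helper names are mine:

public static boolean isPrime(int n) {
    if (n < 2) return false;
    if (n % 2 == 0) return n == 2;
    if (n % 3 == 0) return n == 3;
    for(int a : new int[] {2, 7, 61}) {
        if (a % n == 0) continue;   // skip bases that are multiples of n
        if (!isStrongProbablePrime(n, a)) return false;
    }
    return true;
}

private static boolean isStrongProbablePrime(int n, int base) {
    // Write n-1 = 2^s * k with k odd.
    int k = n - 1, s = 0;
    while ((k & 1) == 0) { k >>= 1; ++s; }
    long x = modPow(base, k, n);
    if (x == 1 || x == n-1) return true;
    for(int j = 1; j < s; ++j) {
        x = (x * x) % n;
        if (x == n-1) return true;
    }
    return false;
}

private static long modPow(long b, int e, long m) {
    long result = 1;
    for(b %= m; e > 0; e >>= 1) {
        if ((e & 1) == 1) result = (result * b) % m;
        b = (b * b) % m;
    }
    return result;
}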

Eliminating composites by prime divisors, the Sieve of Eratosthenes

Instead of investigating the numbers in sequence and checking whether each is prime from scratch, one can also consider the whole set of relevant numbers as one piece and eliminate the multiples of a given prime in one go. This is known as the Sieve of Eratosthenes:

To find the prime numbers not exceeding N

  1. make a list of all numbers from 2 to N
  2. for each k from 2 to N: if k is not yet crossed off, it is prime; cross off all multiples of k as composites

The primes are the numbers in the list which aren't crossed off.
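
A direct rendering of the two steps as code, before any of the improvements discussed below (a sketch; composite[k] remaining false means k is prime):

public static boolean[] simpleSieve(int N) {
    boolean[] composite = new boolean[N+1];
    for(int k = 2; k <= N; ++k) {
        if (!composite[k]) {
            // k was not crossed off, hence it is prime; cross off its multiples.
            for(long m = 2L*k; m <= N; m += k) {
                composite[(int)m] = true;
            }
        }
    }
    return composite;
}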

This algorithm is fundamentally different from trial division, although both directly use the divisibility characterisation of primes, in contrast to the Fermat test and similar tests which use other properties of primes.

In trial division, each number n is paired with all primes not exceeding the smaller of √n and the smallest prime divisor of n. Since most composites have a very small prime divisor, detecting composites is cheap here on average. But testing primes is expensive, since there are relatively many primes below √n. Although there are many more composites than primes, the cost of testing primes is so high that it completely dominates the overall running time and renders trial division a relatively slow algorithm. Trial division for all numbers less than N takes O(N^1.5 / (log N)²) steps.

In the sieve, each composite n is paired with all of its prime divisors, but only with those. Thus, in the sieve, the primes are the cheap numbers: they are only ever looked at once, while the composites are more expensive, being crossed off multiple times. One might believe that since a sieve contains many more 'expensive' numbers than 'cheap' ones, it would overall be a bad algorithm. However, a composite number does not have many distinct prime divisors - the number of distinct prime divisors of n is bounded by log n, but usually it is much smaller, the average of the number of distinct prime divisors of the numbers <= n is log log n - so even the 'expensive' numbers in the sieve are on average no more (or hardly more) expensive than the 'cheap' numbers for trial division.

Sieving up to N, for each prime p, there are Θ(N/p) multiples to cross off, so the total number of crossings-off is Θ(∑ (N/p)) = Θ(N * log (log N)). This yields much faster algorithms for finding the primes up to N than trial division or sequential testing with the faster primality tests.

There is, however, a disadvantage to the sieve, it uses O(N) memory. (But with a segmented sieve, that can be reduced to O(√N) without increasing the time complexity.)

For finding the nth prime, instead of the primes up to N, there is also the problem that it is not known beforehand how far the sieve should reach.

The latter can be solved using the prime number theorem. The PNT says

π(x) ~ x/log x (equivalently: lim π(x)*log x/x = 1),

where π(x) is the number of primes not exceeding x (here and below, log must be the natural logarithm, for the algorithmic complexities it is not important which base is chosen for the logarithms). From that, it follows that p(n) ~ n*log n, where p(n) is the nth prime, and there are good upper bounds for p(n) known from deeper analysis, in particular

n*(log n + log (log n) - 1) < p(n) < n*(log n + log (log n)), for n >= 6.

So one can use that as the sieving limit, it doesn't exceed the target far.

The O(N) space requirement can be overcome by using a segmented sieve. One can then record the primes below √N for O(√N / log N) memory consumption and use segments of increasing length (O(√N) when the sieve is near N).
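
A sketch of the segmented idea, here counting the primes up to N: the base primes up to √N are sieved once, then windows of fixed size are processed one after another, so only O(√N) memory is live at any time:

import java.util.ArrayList;
import java.util.Arrays;

public static int countPrimesSegmented(int N) {
    int root = (int)Math.sqrt(N);
    boolean[] small = new boolean[root+1];
    ArrayList<Integer> basePrimes = new ArrayList<>();
    for(int p = 2; p <= root; ++p) {
        if (!small[p]) {
            basePrimes.add(p);
            for(int m = 2*p; m <= root; m += p) small[m] = true;
        }
    }
    int count = 0;
    int segSize = Math.max(root, 1 << 15);
    boolean[] seg = new boolean[segSize];
    for(long low = 2; low <= N; low += segSize) {
        Arrays.fill(seg, false);
        long high = Math.min(low + segSize - 1, N);
        for(int p : basePrimes) {
            // First multiple of p in the window, but at least p*p.
            long start = Math.max((long)p*p, ((low + p - 1) / p) * p);
            for(long m = start; m <= high; m += p) seg[(int)(m - low)] = true;
        }
        for(long i = low; i <= high; ++i) {
            if (!seg[(int)(i - low)]) ++count;
        }
    }
    return count;
}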

There are some easy improvements on the algorithm as stated above:

  1. start crossing off multiples of p only at p², not at 2*p
  2. eliminate the even numbers from the sieve
  3. eliminate the multiples of further small primes from the sieve

None of these reduce the algorithmic complexity, but they all reduce the constant factors by a significant amount (as with trial division, the elimination of multiples of p yields lesser speedup for larger p while increasing the code complexity more than for smaller p).

Using the first two improvements yields

// Entry k in the array represents the number 2*k+3, so we have to do
// a bit of arithmetic to get the indices right.
public static int nthPrime(int n) {
    if (n < 2) return 2;
    if (n == 2) return 3;
    int limit, root, count = 1;
    limit = (int)(n*(Math.log(n) + Math.log(Math.log(n)))) + 3;
    root = (int)Math.sqrt(limit) + 1;
    limit = (limit-1)/2;
    root = root/2 - 1;
    boolean[] sieve = new boolean[limit];
    for(int i = 0; i < root; ++i) {
        if (!sieve[i]) {
            ++count;
            for(int j = 2*i*(i+3)+3, p = 2*i+3; j < limit; j += p) {
                sieve[j] = true;
            }
        }
    }
    int p;
    for(p = root; count < n; ++p) {
        if (!sieve[p]) {
            ++count;
        }
    }
    return 2*p+1;
}

which finds the hundred-millionth prime, 2038074743, in about 18 seconds. This time can be reduced to about 15 seconds (here, YMMV) by storing the flags packed, one bit per flag, instead of as booleans, since the reduced memory usage gives better cache locality.

Packing the flags, eliminating also multiples of 3 and using bit-twiddling for faster counting,

// Count number of set bits in an int
public static int popCount(int n) {
    n -= (n >>> 1) & 0x55555555;
    n = ((n >>> 2) & 0x33333333) + (n & 0x33333333);
    n = ((n >> 4) & 0x0F0F0F0F) + (n & 0x0F0F0F0F);
    return (n * 0x01010101) >> 24;
}

// Speed up counting by counting the primes per
// array slot and not individually. This yields
// another factor of about 1.24 or so.
public static int nthPrime(int n) {
    if (n < 2) return 2;
    if (n == 2) return 3;
    if (n == 3) return 5;
    int limit, root, count = 2;
    limit = (int)(n*(Math.log(n) + Math.log(Math.log(n)))) + 3;
    root = (int)Math.sqrt(limit);
    switch(limit%6) {
        case 0:
            limit = 2*(limit/6) - 1;
            break;
        case 5:
            limit = 2*(limit/6) + 1;
            break;
        default:
            limit = 2*(limit/6);
    }
    switch(root%6) {
        case 0:
            root = 2*(root/6) - 1;
            break;
        case 5:
            root = 2*(root/6) + 1;
            break;
        default:
            root = 2*(root/6);
    }
    int dim = (limit+31) >> 5;
    int[] sieve = new int[dim];
    for(int i = 0; i < root; ++i) {
        if ((sieve[i >> 5] & (1 << (i&31))) == 0) {
            int start, s1, s2;
            if ((i & 1) == 1) {
                start = i*(3*i+8)+4;
                s1 = 4*i+5;
                s2 = 2*i+3;
            } else {
                start = i*(3*i+10)+7;
                s1 = 2*i+3;
                s2 = 4*i+7;
            }
            for(int j = start; j < limit; j += s2) {
                sieve[j >> 5] |= 1 << (j&31);
                j += s1;
                if (j >= limit) break;
                sieve[j >> 5] |= 1 << (j&31);
            }
        }
    }
    int i;
    for(i = 0; count < n; ++i) {
        count += popCount(~sieve[i]);
    }
    --i;
    int mask = ~sieve[i];
    int p;
    for(p = 31; count >= n; --p) {
        count -= (mask >> p) & 1;
    }
    return 3*(p+(i<<5))+7+(p&1);
}

finds the hundred-millionth prime in about 9 seconds, which is not unbearably long.

There are other types of prime sieves, of particular interest is the Sieve of Atkin, which exploits the fact that certain congruence classes of (rational) primes are composites in the ring of algebraic integers of some quadratic extensions of ℚ. Here is not the place to expand on the mathematical theory, suffice it to say that the Sieve of Atkin has lower algorithmic complexity than the Sieve of Eratosthenes and hence is preferable for large limits (for small limits, a not overly optimised Atkin sieve has higher overhead and thus can be slower than a comparably optimised Eratosthenes sieve). D. J. Bernstein's primegen library (written in C) is well optimised for numbers below 2^32 and finds the hundred-millionth prime (here) in about 1.1 seconds.

The fast way

If we only want to find the nth prime, there is no intrinsic value in also finding all the smaller primes. If we can skip most of them, we can save a lot of time and work. Given a good approximation a(n) to the nth prime p(n), if we have a fast way to calculate the number of primes π(a(n)) not exceeding a(n), we can then sieve a small range above or below a(n) to identify the few missing or excess primes between a(n) and p(n).

We have seen an easily computed fairly good approximation to p(n) above, we could take

a(n) = n*(log n + log (log n))

for example.

A good method to compute π(x) is the Meissel-Lehmer method, which computes π(x) in roughly O(x^0.7) time (the exact complexity depends on the implementation, a refinement by Lagarias, Miller, Odlyzko, Deléglise and Rivat lets one compute π(x) in O(x^(2/3) / log² x) time).

Starting with the simple approximation a(n), we compute e(n) = π(a(n)) - n. By the prime number theorem, the density of primes near a(n) is about 1/log a(n), so we expect p(n) to be near b(n) = a(n) - log a(n)*e(n) and we would sieve a range slightly larger than log a(n)*e(n). For greater confidence that p(n) is in the sieved range, one can increase the range by a factor of 2, say, which almost certainly will be large enough. If the range seems too large, one can iterate with the better approximation b(n) in place of a(n), compute π(b(n)) and f(n) = π(b(n)) - n. Typically, |f(n)| will be much smaller than |e(n)|. If f(n) is approximately -e(n), c(n) = (a(n) + b(n)) / 2 will be a better approximation to p(n). Only in the very unlikely case that f(n) is very close to e(n) (and not very close to 0) does finding an approximation to p(n) good enough that the final sieving stage can be done in time comparable to computing π(a(n)) become a problem.
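
To make the bookkeeping concrete, a small sketch of the first refinement step; primePi stands for a fast prime-counting implementation (Meissel-Lehmer or similar), which is the hard part and is not shown here:

import java.util.function.LongUnaryOperator;

public static long refinedApproximation(long n, LongUnaryOperator primePi) {
    double ln = Math.log(n);
    long a = (long)(n * (ln + Math.log(ln)));   // a(n), first approximation
    long e = primePi.applyAsLong(a) - n;        // e(n) = pi(a(n)) - n
    // Primes near a(n) are about log a(n) apart, so step back that far:
    return a - (long)(Math.log(a) * e);         // b(n), refined approximation
}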

In general, after one or two improvements to the initial approximation, the range to be sieved is small enough for the sieving stage to have a complexity of O(n^0.75) or better.

This method finds the hundred-millionth prime in about 40 milliseconds, and the 10^12-th prime, 29996224275833, in under eight seconds.


tl;dr: Finding the nth prime can be efficiently done, but the more efficient you want it, the more mathematics is involved.


I have Java code for most of the discussed algorithms prepared here, in case somebody wants to play around with them.


¹ Aside remark for overinterested souls: The definition of primes used in modern mathematics is different, applicable in much more general situations. If we adapt the school definition to include negative numbers - so a number is prime if it's neither 1 nor -1 and divisible only by 1, -1, itself and its negative - that defines (for integers) what is nowadays called an irreducible element of ℤ, however, for integers, the definitions of prime and irreducible elements coincide.
