Primesk在Haskell [英] Prime Sieve in Haskell

查看:129
本文介绍了Primesk在Haskell的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我对哈斯克尔很新,我只是想找到前200万个素数的总和。我试图使用筛子生成素数(我认为Eratosthenes的筛子?),但它确实非常慢,我不知道为什么。这是我的代码。

  sieve(x:xs)= x:(sieve $ filter(\ a  - > a`mod` x / = 0)xs)
ans = sum $ takeWhile(< 2000000)(sieve [2 ..])


$

解决方案

这是非常缓慢的,因为算法是一个试用部门,不如果仔细观察算法的作用,你会发现对于每个素数 p ,其没有较小素数因子的倍数将从候选名单中删除(以前除去具有较小素数因子的倍数)。

所以每个数字除以全部直到它作为其最小素数除数的倍数被移除,或者如果它是素数,它出现在其余候选者列表的头部。

对于复合数字,这并不是特别糟糕,因为大多数复合数字具有小的素数因子,并且在最坏的情况下是最小的pri我的 n 的除数不超过√n



<但是质数除以所有质数较小的素数,所以直到第k个素数被发现为素数时,它已被除以所有 k-1 更小的素数。如果 m 素数低于限制 n ,那么找到所有素数的工作就是

(1-1)+(2-1)+(3-1)+ ... +(m-1)= m * b
$ b

  (m-1)/ 2 

部门。根据素数定理,低于 n 是渐近地 n / log n (其中 log 表示自然对数)。消除复合材料的工作可以粗略地限制在 n *√n部门之间,所以对于不小于 n 的部分与花在素数上的工作相比,可以忽略不计。

对于200万的素数,特纳筛需要大约10个10分。此外,它需要解构和重构很多列表单元。



一个停在平方根处的试用部门,

  isPrime n = go 2 
where
go d
| d * d> n =真
| n`rem` d == 0 = False
|否则=去(d + 1)

素数=过滤isPrime [2 ..]


$ b $如果 isPrime n 检查,b

将需要少于1.9 * 10 9个分区√n - 实际上,它只需要179492732,因为复合材料通常便宜)(1)和少得多的列表操作。此外,通过跳过偶数(除了 2 )作为候选除数,将所需部门的数量减半,这个试用部门很容易实现。



Eratosthenes的筛选不需要任何分割,只使用 O(n * log(log n))操作,这比以前快了很多:



primeSum.hs

  module Main(main)where 

import System.Environment(getArgs)
import Math.NumberTheory.Primes

main :: IO ()
main = do
args< - getArgs
let lim = case $ args of
(a:_) - >阅读
_ - > 1000000
印刷。 sum $ takeWhile(< = lim)素数

并且运行它的限制为1000万:

  $ ghc -O2 primeSum&& time ./primeSum 10000000 
[1的1]编译Main(primeSum.hs,primeSum.o)
链接primeSum ...
3203324994356

real 0m0。 085s
用户0m0.084s
sys 0m0.000s

我们让试用部门只运行到100万(固定类型为 Int ):

  $ ghc -O2 tdprimeSum&&时间./tdprimeSum 1000000 
[1的1]编译主(tdprimeSum.hs,tdprimeSum.o)
链接tdprimeSum ...
37550402023

real 0m0。 768s
用户0m0.765s
sys 0m0.002s

而特纳筛只有100000:

  $ ghc -O2 tuprimeSum&&时间./tuprimeSum 100000 
[1的1]编译主要(tuprimeSum.hs,tuprimeSum.o)
链接tuprimeSum ...
454396537

真实0m2。 712s
用户0m2.703s
sys 0m0.005s






(1)残酷的估计是:

  2000000 
Σ√k≈4/3 *√2* 10 ^ 9
k = 1

评估到两位有效数字。由于大多数数据是具有小素数因素的复合数据 - 一半的数字是偶数并且只用一个分数 - 这大大高估了所需的分部数量。

只需考虑素数即可获得所需的分割数:

pre code>Σ√p≈2/3 * N ^ 1.5 / log N
p< N
p prime

其中, N = 2000000 给出了大约1.3 * 10 8 。这是正确的数量级,但低估了一个不平凡的因素(对于 N 增长缓慢下降到1,并且对于 N>从不大于2 ; 10 )。



除了素数之外,素数的平方和两个紧密素数的乘积还需要试划分(几乎)√k,因此如果有足够多的工作,对整体工作做出显着贡献。

然而,处理半压缩所需的分部却是以

  N ^ 1.5 /(log N)^ 2 

因此对于非常大的 N ,它变得可以忽略不计治疗素数的成本。但在试行可行的范围内,它们仍然有很大贡献。

I'm very new to Haskell and I'm just trying to find the sum of the first 2 million primes. I'm trying to generate the primes using a sieve (I think the sieve of Eratosthenes?), but it's really really slow and I don't know why. Here is my code.

sieve (x:xs) = x:(sieve $ filter (\a -> a `mod` x /= 0) xs)
ans = sum $ takeWhile (<2000000) (sieve [2..])

Thanks in advance.

解决方案

It is very slow because the algorithm is a trial division that doesn't stop at the square root.

If you look closely what the algorithm does, you see that for each prime p, its multiples that have no smaller prime divisors are removed from the list of candidates (the multiples with smaller prime divisors were removed previously).

So each number is divided by all primes until either it is removed as a multiple of its smallest prime divisor or it appears at the head of the list of remaining candidates if it is a prime.

For the composite numbers, that isn't particularly bad, since most composite numbers have small prime divisors, and in the worst case, the smallest prime divisor of n doesn't exceed √n.

But the primes are divided by all smaller primes, so until the kth prime is found to be prime, it has been divided by all k-1 smaller primes. If there are m primes below the limit n, the work needed to find all of them prime is

(1-1) + (2-1) + (3-1) + ... + (m-1) = m*(m-1)/2

divisions. By the Prime number theorem, the number of primes below n is asymptotically n / log n (where log denotes the natural logarithm). The work to eliminate the composites can crudely be bounded by n * √n divisions, so for not too small n that is negligible in comparison to the work spent on the primes.

For the primes to two million, the Turner sieve needs roughly 1010 divisions. Additionally, it needs to deconstruct and reconstruct a lot of list cells.

A trial division that stops at the square root,

isPrime n = go 2
  where
    go d
      | d*d > n        = True
      | n `rem` d == 0 = False
      | otherwise      = go (d+1)

primes = filter isPrime [2 .. ]

would need fewer than 1.9*109 divisions (brutal estimate if every isPrime n check went to √n - actually, it takes only 179492732 because composites are generally cheap)(1) and much fewer list operations. Additionally, this trial division is easily improvable by skipping even numbers (except 2) as candidate divisors, which halves the number of required divisions.

A sieve of Eratosthenes doesn't need any divisions and uses only O(n * log (log n)) operations, that is quite a bit faster:

primeSum.hs:

module Main (main) where

import System.Environment (getArgs)
import Math.NumberTheory.Primes

main :: IO ()
main = do
    args <- getArgs
    let lim = case args of
                (a:_) -> read a
                _     -> 1000000
    print . sum $ takeWhile (<= lim) primes

And running it for a limit of 10 million:

$ ghc -O2 primeSum && time ./primeSum 10000000
[1 of 1] Compiling Main             ( primeSum.hs, primeSum.o )
Linking primeSum ...
3203324994356

real    0m0.085s
user    0m0.084s
sys     0m0.000s

We let the trial division run only to 1 million (fixing the type as Int):

$ ghc -O2 tdprimeSum && time ./tdprimeSum 1000000
[1 of 1] Compiling Main             ( tdprimeSum.hs, tdprimeSum.o )
Linking tdprimeSum ...
37550402023

real    0m0.768s
user    0m0.765s
sys     0m0.002s

And the Turner sieve only to 100000:

$ ghc -O2 tuprimeSum && time ./tuprimeSum 100000
[1 of 1] Compiling Main             ( tuprimeSum.hs, tuprimeSum.o )
Linking tuprimeSum ...
454396537

real    0m2.712s
user    0m2.703s
sys     0m0.005s


(1) The brutal estimate is

2000000
   ∑ √k ≈ 4/3*√2*10^9
 k = 1

evaluated to two significant digits. Since most numbers are composites with a small prime factor - half the numbers are even and take only one division - that vastly overestimates the number of divisions required.

A lower bound for the number of required divisions would be obtained by considering primes only:

   ∑ √p ≈ 2/3*N^1.5/log N
 p < N
p prime

which, for N = 2000000 gives roughly 1.3*108. That is the right order of magnitude, but underestimates by a nontrivial factor (decreasing slowly to 1 for growing N, and never greater than 2 for N > 10).

Besides the primes, also the squares of primes and the products of two close primes require the trial division to go up to (nearly) √k and hence contribute significantly to the overall work if there are sufficiently many.

The number of divisions needed to treat the semiprimes is however bounded by a constant multiple of

N^1.5/(log N)^2

so for very large N it becomes negligible relative to the cost of treating primes. But in the range where trial division is feasible at all, they still contribute significantly.

这篇关于Primesk在Haskell的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆