将优化的Eratosthenes筛从Python移植到C ++ [英] Porting optimized Sieve of Eratosthenes from Python to C++

查看:162
本文介绍了将优化的Eratosthenes筛从Python移植到C ++的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

前一段时间,我使用了(blazing fast)primesieve在python中,我发现这里:

Some time ago I used the (blazing fast) primesieve in python that I found here: Fastest way to list all primes below N in python

更确切地说,这个实现:

To be precise, this implementation:

def primes2(n):
    """ Input n>=6, Returns a list of primes, 2 <= p < n """
    n, correction = n-n%6+6, 2-(n%6>1)
    sieve = [True] * (n/3)
    for i in xrange(1,int(n**0.5)/3+1):
      if sieve[i]:
        k=3*i+1|1
        sieve[      k*k/3      ::2*k] = [False] * ((n/6-k*k/6-1)/k+1)
        sieve[k*(k-2*(i&1)+4)/3::2*k] = [False] * ((n/6-k*(k-2*(i&1)+4)/6-1)/k+1)
    return [2,3] + [3*i+1|1 for i in xrange(1,n/3-correction) if sieve[i]]

现在我可以通过自动跳过2,3等等的多次来略微理解优化的想法,但是当把这个算法移植到C ++时, (我对python有很好的理解,对C ++有一个合理的/不好的理解,但对于rock'n roll是足够好的)。

Now I can slightly grasp the idea of the optimizing by automaticly skipping multiples of 2, 3 and so on, but when it comes to porting this algorithm to C++ I get stuck (I have a good understanding of python and a reasonable/bad understanding of C++, but good enough for rock 'n roll).

isqrt()只是一个简单的整数平方根函数):

What I currently have rolled myself is this (isqrt() is just a simple integer square root function):

template <class T>
void primesbelow(T N, std::vector<T> &primes) {
    T sievemax = (N-3 + (1-(N % 2))) / 2;
    T i;
    T sievemaxroot = isqrt(sievemax) + 1;

    boost::dynamic_bitset<> sieve(sievemax);
    sieve.set();

    primes.push_back(2);

    for (i = 0; i <= sievemaxroot; i++) {
        if (sieve[i]) {
            primes.push_back(2*i+3);
            for (T j = 3*i+3; j <= sievemax; j += 2*i+3) sieve[j] = 0; // filter multiples
        }
    }

    for (; i <= sievemax; i++) {
        if (sieve[i]) primes.push_back(2*i+3);
    }
}

这个实现是体面的,会自动跳过2的倍数,但是如果我可以移植Python实现,我认为它可以更快(50%-30%左右)。

This implementation is decent and automatically skips multiples of 2, but if I could port the Python implementation I think it could be much faster (50%-30% or so).

要比较结果(希望这个问题将成功回答),当前执行时间与 N = 100000000 g ++ -O3 在Q6600 Ubuntu 10.10是1230ms。

To compare the results (in the hope this question will be successfully answered), the current execution time with N=100000000, g++ -O3 on a Q6600 Ubuntu 10.10 is 1230ms.

现在我会喜欢一些帮助,或者理解上面的Python实现,或者你将它移植给我(不是有用的)。

Now I would love some help with either understanding what the above Python implementation does or that you would port it for me (not as helpful though).

EDIT

有关我发现困难的额外资讯。

Some extra information about what I find difficult.

我对使用的技术(如校正变量)有一些麻烦,一般来说它是如何组合在一起的。一个链接到一个网站解释不同的Eratosthenes优化(除了简单的网站说,你只是跳过2,3和5的倍数,然后得到一个1000行C文件砰你)将是真棒。

I have trouble with the techniques used like the correction variable and in general how it comes together. A link to a site explaining different Eratosthenes optimizations (apart from the simple sites that say "well you just skip multiples of 2, 3 and 5" and then get slam you with a 1000 line C file) would be awesome.

我不认为我会有一个100%的直接和直接端口的问题,但因为毕竟这是为了学习,是完全没用的。

I don't think I would have issues with a 100% direct and literal port, but since after all this is for learning that would be utterly useless.

EDIT

查看原始numpy版本中的代码后,实际上很容易实现,有些思想不太难理解。这是我想出的C ++版本。我发布在这里的完整版本,以帮助更多的读者,如果他们需要一个非常有效的primesieve,不是两百万行代码。该primesieve在与上述相同的机器上在大约415ms内使所有原子低于100000000。这是一个3倍的加速,比我预期的好。

After looking at the code in the original numpy version, it actually is pretty easy to implement and with some thinking not too hard to understand. This is the C++ version I came up with. I'm posting it here in full version to help further readers in case they need a pretty efficient primesieve that is not two million lines of code. This primesieve does all primes under 100000000 in about 415 ms on the same machine as above. That's a 3x speedup, better then I expected!

#include <vector>
#include <boost/dynamic_bitset.hpp>

// http://vault.embedded.com/98/9802fe2.htm - integer square root
unsigned short isqrt(unsigned long a) {
    unsigned long rem = 0;
    unsigned long root = 0;

    for (short i = 0; i < 16; i++) {
        root <<= 1;
        rem = ((rem << 2) + (a >> 30));
        a <<= 2;
        root++;

        if (root <= rem) {
            rem -= root;
            root++;
        } else root--;

    }

    return static_cast<unsigned short> (root >> 1);
}

// http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
// http://stackoverflow.com/questions/5293238/porting-optimized-sieve-of-eratosthenes-from-python-to-c/5293492
template <class T>
void primesbelow(T N, std::vector<T> &primes) {
    T i, j, k, l, sievemax, sievemaxroot;

    sievemax = N/3;
    if ((N % 6) == 2) sievemax++;

    sievemaxroot = isqrt(N)/3;

    boost::dynamic_bitset<> sieve(sievemax);
    sieve.set();

    primes.push_back(2);
    primes.push_back(3);

    for (i = 1; i <= sievemaxroot; i++) {
        if (sieve[i]) {
            k = (3*i + 1) | 1;
            l = (4*k-2*k*(i&1)) / 3;

            for (j = k*k/3; j < sievemax; j += 2*k) {
                sieve[j] = 0;
                sieve[j+l] = 0;
            }

            primes.push_back(k);
        }
    }

    for (i = sievemaxroot + 1; i < sievemax; i++) {
        if (sieve[i]) primes.push_back((3*i+1)|1);
    }
}


推荐答案

I我会尽量解释尽可能多的。 sieve 数组有一个不寻常的索引方案;因此,数字 6 * k + 1 将存储在位置中, 2 * k k * 6 + 5 将存储在位置 2 * k + 1 3 * i + 1 | 1 操作是相反的:它使用 2 * n 并将它们转换为 6 * n + 1 ,并采用 2 * n + 1 ,并将其转换为 6 * n + 5 +1 | 1 1 3 5 )。主循环使用 5 (当 i时)对所有具有该属性的数字重复 k is 1); i 是对 k 。第一个片更新到 sieve 然后用 k * k / 3 + 2 * m * k m 一个自然数);那些索引的相应数字从 k ^ 2 开始,并在每个步骤增加 6 * k 。第二切片更新开始于索引 k *(k-2 *(i& 1)+4)/ 3 (number k *(k + 4)为 k congruent至 1 mod code>和 k *(k + 2)否则类似地增加 6 * k

I'll try to explain as much as I can. The sieve array has an unusual indexing scheme; it stores a bit for each number that is congruent to 1 or 5 mod 6. Thus, a number 6*k + 1 will be stored in position 2*k and k*6 + 5 will be stored in position 2*k + 1. The 3*i+1|1 operation is the inverse of that: it takes numbers of the form 2*n and converts them into 6*n + 1, and takes 2*n + 1 and converts it into 6*n + 5 (the +1|1 thing converts 0 to 1 and 3 to 5). The main loop iterates k through all numbers with that property, starting with 5 (when i is 1); i is the corresponding index into sieve for the number k. The first slice update to sieve then clears all bits in the sieve with indexes of the form k*k/3 + 2*m*k (for m a natural number); the corresponding numbers for those indexes start at k^2 and increase by 6*k at each step. The second slice update starts at index k*(k-2*(i&1)+4)/3 (number k * (k+4) for k congruent to 1 mod 6 and k * (k+2) otherwise) and similarly increases the number by 6*k at each step.

这是另一个尝试解释:let candidates 是所有数字的集合至少5个并且符合 1 5 mod 6 。如果将该集合中的两个元素相乘,则会得到集合中的另一个元素。让 succ(k)替代候选中的 k 候选中大于 k 的下一个元素(按数字顺序)。在这种情况下,筛子的内循环基本上是(使用 sieve 的正常索引):

Here's another attempt at an explanation: let candidates be the set of all numbers that are at least 5 and are congruent to either 1 or 5 mod 6. If you multiply two elements in that set, you get another element in the set. Let succ(k) for some k in candidates be the next element (in numerical order) in candidates that is larger than k. In that case, the inner loop of the sieve is basically (using normal indexing for sieve):

for k in candidates:
  for (l = k; ; l += 6) sieve[k * l] = False
  for (l = succ(k); ; l += 6) sieve[k * l] = False

存储在 sieve 中,与以下内容相同:

Because of the limitations on which elements are stored in sieve, that is the same as:

for k in candidates:
  for l in candidates where l >= k:
    sieve[k * l] = False

$ b b

这将删除候选中 k 的所有倍数(<$ c $在当前 k 被用作 l 更早或当它用作 k 现在)。

which will remove all multiples of k in candidates (other than k itself) from the sieve at some point (either when the current k was used as l earlier or when it is used as k now).

这篇关于将优化的Eratosthenes筛从Python移植到C ++的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆