计算序列中的互质数 [英] Counting coprimes in a sequence

查看:94
本文介绍了计算序列中的互质数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

具有n个<= 10 ^ 6个整数的序列,所有不超过m <= 3 * 10 ^ 6个整数,我想计算其中有多少个互质对。如果两个数字的最大公约数最大为1,则它们是互质的。

Having a sequence of n <= 10^6 integers, all not exceeding m <= 3*10^6, I'd like to count how many coprime pairs are in it. Two numbers are coprime if their greatest common divisor is 1.

这可以用O(n ^ 2 log n)轻松完成,但这显然是减慢速度的方法,作为极限表明更接近O(n log n)。无法快速完成的一件事是将所有数字都排除在外,并且在每个数字中抛弃多次出现相同质数的情况,但这并不会带来任何重大改进。我还想计算相反的数-具有共同除数的对。可以成组完成-首先计算所有对,它们的最小公质数为2,然后是3、5等,但在我看来,这就像另一个死胡同。

It can be done trivially in O(n^2 log n), but this is obviously way to slow, as the limit suggests something closer to O(n log n). One thing than can be done quickly is factoring out all the numbers, and also throwing out multiple occurences of the same prime in each, but that doesn't lead to any significant improvement. I also thought of counting the opposite - pairs that have a common divisor. It could be done in groups - firstly counting all the pairs that their smallest common prime divisor is 2, then 3, 5, and etc., but it seems to me like an other dead end.

推荐答案

根据您的回答,我提出了一个更快的替代方案。在我的工作PC上,我的C ++实现(底部)大约需要 350ms 来解决任何问题实例;在我的旧笔记本电脑上,耗时刚好超过1秒。该算法避免了所有除法和模运算,并且仅使用O(m)空间。

I've come up with a slightly faster alternative based on your answer. On my work PC my C++ implementation (bottom) takes about 350ms to solve any problem instance; on my old laptop, it takes just over 1s. This algorithm avoids all division and modulo operations, and uses only O(m) space.

与您的算法一样,基本思想是通过以下方法应用包含-排除原理:枚举每个不包含重复因子的2≤i≤m的每个数字,并精确计数一次,对于每个这样的i,计算输入中可被i整除的数目,然后从总数中减去或减去该总数。关键区别在于,我们可以简单地通过测试输入中是否出现i的每个可能倍数来愚蠢地计数部分,这仍然只需要O(m log m)时间。

As with your algorithm, the basic idea is to apply the Inclusion-Exclusion Principle by enumerating every number 2 <= i <= m that contains no repeated factors exactly once, and for each such i, counting the number of numbers in the input that are divisible by i and either adding or subtracting this from the total. The key difference is that we can do the counting part "stupidly", simply by testing whether each possible multiple of i appears in the input, and this still takes just O(m log m) time.

最里面的行 c + = v [j] .freq; countCoprimes()重复吗?对于每个不包含重复素数的数2≤i≤m,外循环主体执行一次。该迭代计数的上限值为m。内循环每次将i步进前进[2..m]范围,因此它在单个外循环迭代期间执行的操作数上限为m / i。因此,最内层线的迭代总数以从i = 2到m / i的m的和为上限。可以将m因子移到总和之外以获得上限

How many times does the innermost line c += v[j].freq; in countCoprimes() repeat? The body of the outer loop is executed once for each number 2 <= i <= m that contains no repeated prime factors; this iteration count is trivially upper-bounded by m. The inner loop advances i steps at a time through the range [2..m], so the number of operations it performs during a single outer loop iteration is upper-bounded by m / i. Therefore the total number of iterations of the innermost line is upper-bounded by the sum from i=2 to m of m/i. The m factor can be moved outside the sum to get an upper bound of

m * sum{i=2..m}(1/i)

该和是调和级数的部分和,并且它是log(m)的上限,因此最里面的循环迭代的总数是O(m log m)。

That sum is a partial sum in a harmonic series, and it is upper-bounded by log(m), so the total number of innermost loop iterations is O(m log m).

extendedEratosthenes()旨在通过避免所有除法和保持O(m)内存使用率。实际上,所有 countCoprimes()都需要知道2≤< = i≤m是(a)是否具有重复的素因,是否不存在。 ,(b)是否具有偶数或奇数个素数。为了计算(b),我们可以利用一个事实,即Eratosthenes筛网有效地击中了具有给定i且其独特质数按递增顺序排列的任何给定i,因此我们可以翻转一下( parity 字段 struct entry 中的字段),以跟踪我是偶数还是奇数个因子。每个数字均以等于1的 prod 字段开头;要记录(a),只需将其 prod 字段设置为0,即可简单地淘汰任何包含质数平方的数字。目的:如果 v [i] .prod == 0 ,则表明我被发现具有重复因素;否则,它包含到目前为止发现的(必要不同)因素的乘积。此工具的(相当次要)实用程序是,它允许我们在m的平方根处停止主筛网循环,而不是一直到m:到现在为止,对于任何没有重复因子的给定i, v [i] .prod == i ,在这种情况下,我们找到了所有i的因素,或者 v [i] .prod< i ,在这种情况下,我必须具有我们尚未考虑的正好一个因子> sqrt(3000000)。我们可以通过第二个非嵌套循环找到所有这些剩余的大因素。

extendedEratosthenes() is designed to reduce constant factors by avoiding all divisions and keeping to O(m) memory usage. All countCoprimes() actually needs to know for a number 2 <= i <= m is (a) whether it has repeated prime factors, and if it doesn't, (b) whether it has an even or odd number of prime factors. To calculate (b) we can make use of the fact that the Sieve of Eratosthenes effectively "hits" any given i with its distinct prime factors in increasing order, so we can just flip a bit (the parity field in struct entry) to keep track of whether i has an even or odd number of factors. Each number starts with a prod field equal to 1; to record (a) we simply "knock out" any number that contains the square of a prime number as a factor by setting its prod field to 0. This field serves a dual purpose: if v[i].prod == 0, it indicates that i was discovered to have repeated factors; otherwise it contains the product of the (necessarily distinct) factors discovered so far. The (fairly minor) utility of this is that it allows us to stop the main sieve loop at the square root of m, instead of going all the way up to m: by now, for any given i that has no repeated factors, either v[i].prod == i, in which case we have found all the factors for i, or v[i].prod < i, in which case i must have exactly one factor > sqrt(3000000) that we have not yet accounted for. We can find all such remaining "large factors" with a second, non-nested loop.

#include <iostream>
#include <vector>

using namespace std;

struct entry {
    int freq;       // Frequency that this number occurs in the input list
    int parity;     // 0 for even number of factors, 1 for odd number
    int prod;       // Product of distinct prime factors
};

const int m = 3000000;      // Maximum input value
int n = 0;                  // Will be number of input values
vector<entry> v;

void extendedEratosthenes() {
    int i;
    for (i = 2; i * i <= m; ++i) {
        if (v[i].prod == 1) {
            for (int j = i, k = i; j <= m; j += i) {
                if (--k) {
                    v[j].parity ^= 1;
                    v[j].prod *= i;
                } else {
                    // j has a repeated factor of i: knock it out.
                    v[j].prod = 0;
                    k = i;
                }
            }
        }
    }

    // Fix up numbers with a prime factor above their square root.
    for (; i <= m; ++i) {
        if (v[i].prod && v[i].prod != i) {
            v[i].parity ^= 1;
        }
    }
}

void readInput() {
    int i;
    while (cin >> i) {
        ++v[i].freq;
        ++n;
    }
}

void countCoprimes() {
    __int64 total = static_cast<__int64>(n) * (n - 1) / 2;
    for (int i = 2; i <= m; ++i) {
        if (v[i].prod) {
            // i must have no repeated factors.

            int c = 0;
            for (int j = i; j <= m; j += i) {
                c += v[j].freq;
            }

            total -= (v[i].parity * 2 - 1) * static_cast<__int64>(c) * (c - 1) / 2;
        }
    }

    cerr << "Total number of coprime pairs: " << total << "\n";
}

int main(int argc, char **argv) {
    cerr << "Initialising array...\n";
    entry initialElem = { 0, 0, 1 };
    v.assign(m + 1, initialElem);

    cerr << "Performing extended Sieve of Eratosthenes...\n";
    extendedEratosthenes();

    cerr << "Reading input...\n";
    readInput();

    cerr << "Counting coprimes...\n";
    countCoprimes();

    return 0;
}

这篇关于计算序列中的互质数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆