非重复可分红数(优化) [英] Number of non repeat divisible dividends (optimize)

查看:65
本文介绍了非重复可分红数(优化)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有没有一种方法可以改进我的算法以在计算时间上获得最佳算法?

Is there a way to improve my algorithm to get the optimal one in terms of computing time?

给出一定范围的非零分红数[A,B](1 <= A,B <= 10 ^ 18),以及除数D = {x:1 <= x< = 50 },我想在[A,B]中找到可以除以集合D中任何数字的非重复股息总数.

Given a range of non-zero dividend numbers [A,B] (1<=A,B<=10^18), and a set of divisors D = {x: 1 <= x <= 50}, I want to find the total number of non-repeating dividends in [A,B] that can be divided by any number in set D.

示例1(不需要太多时间)

股息数范围[1,10]和除数为{2,3,4}

  • 除数2可除的数字列表:2,4,6,8,10
  • 除数3可除的数字列表:3,6,9
  • 除数4可除的数字列表:4,8
  • list of numbers divisible for divisor 2: 2,4,6,8,10
  • list of numbers divisible for divisor 3: 3,6,9
  • list of numbers divisible for divisor 4: 4,8

所以[1,10] = 7

示例2(花费大量时间)

[A,B] = [79031836253224256, 403892407826392155]
D = {44, 46, 50, 8, 35,41,6,18, 24, 23, 7, 45, 39, 46, 4, 35, 34}
Output: 161802330522861274

Python中算法的第一版

def solution(A,B,D):                                                                                                                                                                         
    a = set()                                                                                                      
    for i in range(A,B+1):                                                                                         
            for j in D:                                                                                            
                    if i%j == 0:                                                                                   
                            a.add(i)                                                                               
    # return the count of a                                                                                        
    return len(a)   

if __name__ == "__main__":                                                                                             
        print(solution(1,10,[2,3,4]))                                                                                  

推荐答案

一个关键的发现是[ A B ]的范围非常大,因此难以接受从 A 迭代到 B .但是,除数列表的大小相对较小.

One key observation is that the range [A, B] is very large, making it prohibitive to iterate from A to B. However, the size of the divisor lists is relatively small.

我们可以在不显式构造各个集合的情况下计算所需的数字.

We can count the required number without explicitly constructing the respective set.

首先,请注意,如果 D 仅包含一个数字(我们将该数字表示为 d ),那么我们可以轻松地找到该数字中的倍数范围:将是地板(B/d)-地板((A-1)/d).问题是,当 D 具有多个元素时,简单地对每个元素的上述公式求和将导致多次计数一些数字.

First, notice that if D would consist of only a single number (let's denote that number as d), then we could easily find the number of multiples in that range: it would be floor(B / d) - floor((A - 1) / d). The problem is that when D has more than one elements, simply summing the above formulas for each element would lead to counting some numbers multiple times.

我们可以使用包含-排除原则来解决上述问题 a>.

We can tackle the aforementioned issue by using the inclusion-exclusion principle.

我将通过一个小的示例来说明这种技术: A = 2, B = 8, D = {2,3}.

I'll illustrate this technique using a small example: A = 2, B = 8, D = {2, 3}.

  • 计算有2个除数的数字.有 4 个这样的数字:{2、4、6、8}.

  • Count how many numbers have 2 as a divisor. There are 4 such numbers: {2, 4, 6, 8}.

计算有3个除数的数字.有 2 个这样的数字:{3,6}.

Count how many numbers have 3 as a divisor. There are 2 such numbers: {3, 6}.

计算将2和3都除数的数字.仅有 1 个这样的数字:{6}.

Count how many numbers have both 2 and 3 as a divisor. There is just 1 such number: {6}.

根据包含-排除原理,要计算给定集合中有多少个数字具有至少一个除数,我们需要将步骤1和2的结果相加,并将步骤3的结果相减(这是因为在步骤2中1和2,我们对数字6进行了两次计数).因此,结果为 4 + 2 - 1 = 5 .

According to the inclusion-exclusion principle, to count how many numbers have at least one divisor in the given set we need to add the results from steps 1 and 2, and subtract the result from step 3 (this is because during steps 1 and 2 we have counted the number 6 twice). Thus, the result is 4 + 2 - 1 = 5.

通常,我们需要考虑集合 D 的所有子集.对于每个子集,我们需要计算可被当前子集中的所有数字整除的范围为[ A B ].此数字等于floor(B/LCM)-floor((A-1)/LCM),其中LCM是当前子集中所有数字的最小公倍数.

In general, we need to consider all subsets of the set D. For each subset, we need to count how many numbers in the range [A, B] are divisible by all the numbers in the current subset. This number is equal to floor(B / LCM) - floor((A - 1) / LCM), where LCM is the least common multiple of all the numbers in the current subset.

预处理除数列表.

上述想法的直接实现导致产生一个代码,该代码调查O(2 M )个子集,其中 M 是最大数量集合 D 中的除数(在当前问题设置中, M = 50).但是,我们可以使用 A提到的观察结果. Bau 在注释中减少 D 并将子集数减少为O(2 M /2 ).

The direct implementation of the above ideas leads to a code that investigates O(2M) subsets, where M is the maximum number of divisors in the set D (in the current problem settings, M = 50). However, we can use the observation mentioned by A. Bau in the comments to reduce D and reduce the number of subsets to O(2M/2).

如果 D 中的数字 d 1 是另一个数字 d 2 ,从 D 开始,然后 d 1 对最终结果没有影响,我们可以消除它.

If a number d1 in D is a multiple of another number d2 from D, then d1 has no influence in the final result, and we can eliminate it.

在当前问题设置中, D 中的所有数字都位于{1、2 ... 50}中,此预处理可以保证我们最多保留25个数字.

In the current problem settings, where all numbers in D are in {1, 2, .. 50}, this pre-processing guarantees that we are left with at most 25 numbers.

我们最多将有25个数字这一事实很有趣,但我不会详细介绍(除非有人感兴趣)-简而言之,我们可以对集合{1、2, ... 50}分成25个不相交的子集,其中第i个 子集包含形式为S i = {(2∙i-1)∙2 k的数字}.选择25个以上的数字可确保我们在同一集合中拥有2个数字,并且其中较大的数字将除以较小的数字.此外,这个界限很严格,因为我们可以找到25个数字的集合,这些数字无法进一步减少,例如{26,27,..,50}.

The demonstration for the fact that we'll have at most 25 numbers is interesting, but I won't go into the details (unless someone is interested) -- in brief, we can partition the set {1, 2, ... 50} into 25 disjoint subsets, where ith subset contains numbers of the form Si = {(2 ∙ i - 1) ∙ 2k}. Choosing more than 25 numbers guarantees that we'll have 2 numbers in the same set, and the larger of them will divide the smaller one. Furthermore, this bound is tight, because we can find sets of 25 numbers that can't be reduced any further, e.g. {26, 27, .., 50}.

对于25个数字的最坏情况,我们需要研究2 25 = 33,554,432个子集,这是可行的,因为可以快速研究每个子集.

For the worst case of 25 numbers, there are 225 = 33,554,432 subsets that we need to look into, which is feasible, since each subset can be rapidly investigated.

以下代码实现了上述想法,对于大型示例,运行时间为0.002秒:

The following code implements the above ideas and runs in 0.002 seconds for the large example:

import itertools
import time


def gcd2(a, b):
    while b > 0:
        a, b = b, a % b

    return a


def lcm2(a, b):
    return int(a * b / gcd2(a, b))


def lcm(list):
    ans = list[0]

    for i in range(1, len(list)):
        ans = lcm2(ans, list[i])

    return ans


def preprocess(D):
    D2 = []
    D = sorted(D)

    for i in range(len(D)):
        include = True
        for j in range(i):
            if D[i] % D[j] == 0:
                include = False
                break

        if include:
            D2.append(D[i])

    return D2


def solution2(A, B, D):
    D = preprocess(D)

    ans = 0
    for num in range(1, len(D) + 1):
        for list in itertools.combinations(D, num):
            v = lcm(list)

            sign = 1 if len(list) % 2 == 1 else -1
            val = sign * ((B // v) - (A - 1) // v)
            ans += val

    return ans


if __name__ == '__main__':
    [A, B] = [79031836253224256, 403892407826392155]
    D = [44, 46, 50, 8, 35, 41, 6, 18, 24, 23, 7, 45, 39, 46, 4, 35, 34]

    t = time.time()
    print(solution2(A, B, D))
    print('processing took {} s. '.format(time.time() - t))

结果是:

161802330522861274
processing took 0.00200653076171875 s. 

讨论.

正如גלעדברקן在评论中指出的那样,在最坏的情况下这仍然需要花费很长时间,其中包括除数集为{26,27,...,50}.在我的计算机上,大约需要6分钟.

As גלעד ברקן pointed out in the comments, this still takes a long time in the worst situation, consisting of the divisor set being {26, 27, ..., 50}. On my computer it takes around 6 minutes.

在那种情况下,大约需要考虑3300万个子集,并且需要为每个此类子集计算最小公倍数(LCM).

In that case there are about 33 millions subset that need to be considered, and the least common multiple (LCM) needs to be calculated for each such subset.

现在,LCM计算是针对每个子集独立进行的,但是可以共享一些计算(因为 LCM(union(S1,S2))= LCM(LCM(S1),LCM(S2) )).

Right now, the LCM computation is made independently for each subset, but it is possible to share some computations (because LCM(union(S1, S2)) = LCM(LCM(S1), LCM(S2))).

尽管如此,这可能还不够,因为即使LCM是即时计算的,当前的方法在我的计算机上仍将花费约18秒的时间.

Nevertheless, that might still not be enough, because even if the LCM was computed instantaneously, the current approach would still take around 18 seconds on my computer.

最后,我进行了另一个实验,并测量了迭代3300万个条目并在每次迭代期间执行一次整数除法和加法的时间.在python中,这大约需要8.4秒,而在C ++中,则大约需要0.12秒.

Finally, I made another experiment, and I measured the time of iterating through 33 million entries, and performing a single integer division and an addition during each iteration. This takes around 8.4 seconds in python, and about 0.12 seconds in C++.

总而言之,对于给出的问题设置,此答案中描述的技术可能不是最佳方法,但是也可能仅以更快的编程语言来实现这些想法就可以满足要求.

In conclusion, it might be the case that the techniques described in this answer are not the best for the given problem settings, but it's also possible that simply implementing these ideas in a faster programming language would fit the requirements.

这篇关于非重复可分红数(优化)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆