How to calculate as quickly as possible the base-3 value of an integer given as a huge sequence of decimal digits (more than one million)?

Published: 2020/9/18 19:14:39 · Tags: python, performance, base


Problem description

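We got this task from our professor. Prerequisites are:

- Use Python 3 with built-in functions only (no numpy).
- Main task: find and store the result within 5 seconds.
- Secondary, nice-to-have task: find the value not only in base b = 3, but also in bases b = 3**k (k = 2, 3, 4).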
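For the secondary task, one option we have not benchmarked on the full input: once the base-3 digit list exists, the base 3**k digits need no further big-integer arithmetic, because every k base-3 digits (counted from the least significant end) form exactly one base 3**k digit. A minimal sketch; `group_digits` is a hypothetical helper, not part of the original code:

```python
def group_digits(digits3: list[int], k: int) -> list[int]:
    """Turn base-3 digits (most significant first) into base 3**k digits."""
    # Left-pad with zeros so the length is a multiple of k; the groups are
    # then aligned at the least significant end, as base conversion requires.
    pad = (-len(digits3)) % k
    d = [0] * pad + digits3
    out = []
    for i in range(0, len(d), k):
        v = 0
        for digit in d[i:i + k]:
            v = v * 3 + digit  # Horner evaluation of one k-digit group
        out.append(v)
    return out
```

For the 12-digit test value in the code header (777777777777 = 2202100120200002212221010 in base 3), k = 3 and k = 4 reproduce the lists given there: [2, 20, 9, 15, 18, 2, 23, 25, 3] and [2, 61, 5, 18, 8, 53, 30].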

Compared with our first straightforward solution, we achieved a speed-up by a factor of 96 (almost 100 times faster), but it still doesn't meet the 5-second limit (currently, we are at 25 seconds on an i7 laptop). [Our professor also has no solution in pure Python, so this is a bit of a research task.]

The complete code (including test calls) is below. Overall, it shows an improvement from originally 2400 s (= 40 min) to 25 s. However, we need a further speed-up by a factor of 5. Does anyone have ideas and can help?

# -*- coding: utf-8 -*-
#
# Convert a long random sequence of base-10 digits to integers base 3**k with k=1,2,3,4
# 
# Task for phdgroupA: length of sequence is 1.5*(10**6)
#                     time < 5 sec
#                     Use Python 3 (standard libraries only, no numpy) !
#
# Testcase with a very small sequence, made purely of the digit 7:
# (see sagemath or www.math.com/tables/general/base_conv.htm)
# numlen = 12  -->  777777777777_base10
#                =  2202100120200002212221010_base3
#                =  2670520085833_base9
#                =  2k9fi2np3_base27   ("digits": 0123456789ab...pq)
#                   [2, 20, 9, 15, 18, 2, 23, 25, 3]
#                =  2[61]5[18]8[53][30]_base81
#                   [2, 61, 5, 18, 8, 53, 30]
# 
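
import sys

# Recent Python versions (3.11+, plus some 3.7-3.10 security releases) limit
# int <-> str conversions to 4300 digits by default; step1() converts
# 90000-digit chunks, so lift the limit where the setting exists.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)
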
# Convert the integer n to a list of digits (most significant first), each an integer in the range 0 to b-1.
# With divmod, it's ca. 1/3 faster than using n%b and then n//=b.
def numberToBase(n, b):
    digits = []
    while n:
        n, rem = divmod(n, b)
        digits.append(rem)
    return digits[::-1]


# Step 0: Create string of nlen digits
def step0(nlen):
    rd = 7  # which digit to repeat
    string_val = "".join(str(rd) for i in range(nlen))
    return string_val  # end of step0()


# Step 1: Convert string to int (the string contains only decimal digits)
def step1(string_val, option_chunk=True):
    if option_chunk:
        string_val_len = len(string_val)
        Chunk_len = 90000
        Read_len = 0
        int_valChunk = 0
        int_valLocal = 0
        ii = 0
        while Read_len < string_val_len:
            string_val_ChunkRead = string_val[ii*Chunk_len:(ii+1)*Chunk_len]
            Chunk_lenRead = len(string_val_ChunkRead)
            int_valChunk = int(string_val_ChunkRead)
            ii += 1
            int_valLocal = int_valLocal * 10**Chunk_lenRead + int_valChunk
            Read_len += Chunk_lenRead
        int_val = int_valLocal
    else:
        int_val = int(string_val)
    return int_val  # end of step1()


# Step 2: Convert given integer to another base
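def step2(n, b, convsteps):
    nList = []
    if convsteps == 3:  # Here the conversion is done in 3 steps
        # expos[0] must be a multiple of expos[1] (10000/300 was not), so that
        # every intermediate digit expands to a fixed number of smaller digits.
        expos = 9900, 300
        base_a = b ** expos[0]
        base_b = b ** expos[1]

        def padded(digits, width):
            # Left-pad with zeros: inner digits must keep their leading zeros,
            # and an inner digit 0 must still contribute `width` zeros.
            return [0] * (width - len(digits)) + digits

        nList1 = numberToBase(n, base_a)  # That's the time killer in this part
        # Only the most significant digit at each level may stay unpadded.
        nList2 = [mm for i, ll in enumerate(nList1)
                  for mm in padded(numberToBase(ll, base_b), expos[0] // expos[1] if i else 0)]
        nList = [d for i, mm in enumerate(nList2)
                 for d in padded(numberToBase(mm, b), expos[1] if i else 0)]
    else:  # Do conversion in one bulk
        nList = numberToBase(n, b)
    return nList  # end of step2()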



if __name__ == '__main__':

    # Calculate the string of digits
    numlen = 1500000  # number of digits = length of sequence
    string_value = step0(numlen)

    # Calculate the integer value of the string_value
    int_value = step1(string_value, option_chunk=True)

    # Convert int_value to list of numbers of the given bases
    convsteps = 3  # value of '3' makes step2() 50-60 times faster than value '1'

    b = 3
    numList = step2(int_value, b, convsteps)
    print('3**1: numList begin:', numList[:10])  # Expect: [2, 0, 1, 0, 0, 1, 1, 0, 2, 1]

Possible ideas: could the chunk in step 1 have a different size? Could the two large bases for the intermediate conversions be better balanced? Or could the string of decimal digits be converted to a list of base-3 digits more directly?
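To answer the chunk-size question quantitatively, a small timing sweep may help. The sketch below re-implements the chunked loop of step1() under the hypothetical name `chunked_int` and times it with time.perf_counter() for several chunk sizes; the sizes and the 100,000-digit test string are arbitrary choices, not measured recommendations:

```python
import sys
import time

# Python 3.11+ refuses int <-> str conversions above 4300 digits by default;
# 0 disables that limit (the setting does not exist on older versions).
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)


def chunked_int(s: str, chunk_len: int) -> int:
    """Same left-to-right chunked scheme as step1(), with a variable chunk size."""
    value = 0
    for start in range(0, len(s), chunk_len):
        piece = s[start:start + chunk_len]
        value = value * 10 ** len(piece) + int(piece)
    return value


def sweep(s: str, chunk_sizes: list[int]) -> dict[int, float]:
    """Wall-clock seconds of chunked_int() for each chunk size."""
    timings = {}
    for size in chunk_sizes:
        t0 = time.perf_counter()
        chunked_int(s, size)
        timings[size] = time.perf_counter() - t0
    return timings


if __name__ == "__main__":
    digits = "7" * 100_000  # much shorter than the real 1.5 million digits
    for size, secs in sweep(digits, [1_000, 10_000, 100_000]).items():
        print(f"chunk_len={size:>7}: {secs:.3f} s")
```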

Description: The algorithm in the Python code above works in 3 steps:

Step 0: Get the data. For test purposes, we create a sequence of 1.5 million decimal digits; normally this will be a random value read from a file. The sequence is stored as a string.
Step 1: Convert that string to an integer (base 10).
Step 2: Convert that integer to base b = 3.

These three changes brought the largest improvements compared with the initial straightforward solution:

The helper function numberToBase(n, b), used in step 2, converts the integer n to base b. The result is a list of integers, each in the range 0 to b-1; read in order, the list gives the digits of n in base b. The improvement came from using the built-in function divmod instead of the two operations n%b and n//=b inside the while loop. This brought a speed-up by a factor of 2.
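A quick way to re-check the divmod claim on a given machine is to time both loop variants on the same input. A sketch with hypothetical function names; the input (20,000 sevens) and the base 3**300 are arbitrary choices and far smaller than the real task:

```python
import time


def digits_divmod(n: int, b: int) -> list[int]:
    # one divmod() call per digit (the variant used in numberToBase)
    out = []
    while n:
        n, rem = divmod(n, b)
        out.append(rem)
    return out[::-1]


def digits_mod_floordiv(n: int, b: int) -> list[int]:
    # the two-operation variant: n % b, then n //= b
    out = []
    while n:
        out.append(n % b)
        n //= b
    return out[::-1]


def time_it(fn, *args) -> float:
    """Wall-clock seconds for a single call fn(*args)."""
    t0 = time.perf_counter()
    fn(*args)
    return time.perf_counter() - t0


if __name__ == "__main__":
    n = 7 * (10 ** 20_000 - 1) // 9  # the integer 777...7 with 20,000 sevens
    for fn in (digits_divmod, digits_mod_floordiv):
        print(f"{fn.__name__}: {time_it(fn, n, 3 ** 300):.4f} s")
```

Building n arithmetically avoids a str-to-int conversion, so the timing covers only the digit loop.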

Function step2(n, b, convsteps) converts the given integer n to base b (with b = 3). Initially, we called the helper function numberToBase(n, b) once. Then we introduced intermediate steps in step2(), so n is not converted to the final base in one step but in 3 steps. The intermediate bases are much larger than the final base b. These intermediate conversions made step 2 about 60 times faster.
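The 3-step conversion is a fixed two-level instance of a general divide-and-conquer scheme. The recursive sketch below (hypothetical names `to_base_dc` and `leaf_digits`; not benchmarked on the 1.5-million-digit input) splits the number by b**(2**i) at every level and zero-pads each lower half, which is what keeps leading zeros inside intermediate digits. Whether it beats the hand-tuned 3-step version depends on how fast big-integer division is in the Python version used, so it would need to be timed:

```python
def to_base_dc(n: int, b: int, leaf_digits: int = 64) -> list[int]:
    """Digits of n >= 0 in base b, most significant first (leaf_digits >= 1)."""
    if n == 0:
        return [0]

    def naive(x: int, width: int) -> list[int]:
        # plain divmod loop, left-padded with zeros up to `width` digits
        out = []
        while x:
            x, r = divmod(x, b)
            out.append(r)
        out.extend([0] * (width - len(out)))
        out.reverse()
        return out

    # powers[i] == b ** (2 ** i); stop once the next square would exceed n
    powers = [b]
    while powers[-1] * powers[-1] <= n:
        powers.append(powers[-1] * powers[-1])

    digits: list[int] = []

    def rec(x: int, level: int, pad: bool) -> None:
        # Invariant: x < b ** (2 ** (level + 1)).
        # With pad=True exactly 2 ** (level + 1) digits are emitted.
        size = 2 ** (level + 1)
        if size <= leaf_digits:
            digits.extend(naive(x, size if pad else 0))
            return
        hi, lo = divmod(x, powers[level])
        if pad or hi:
            rec(hi, level - 1, pad)
            rec(lo, level - 1, True)  # lower half always keeps its zeros
        else:
            rec(lo, level - 1, False)  # skip leading zeros of the top half

    rec(n, len(powers) - 1, False)
    return digits
```

Appending to one shared list avoids building and concatenating nested lists at every level.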

Function step1() was made 4 times faster by reading the string in chunks and converting each chunk separately.
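The chunked loop in step1() builds the integer left to right, so each step multiplies the whole accumulated value again. A balanced alternative splits the string in halves recursively and combines the halves as left * 10**len(right) + right. A minimal sketch under that assumption (`str_to_int_dc` and `cutoff` are hypothetical names; we have not timed it against step1()):

```python
def str_to_int_dc(s: str, cutoff: int = 1000) -> int:
    """Convert a string of decimal digits to int by recursive halving.

    Every leaf passed to int() has at most `cutoff` digits, so with
    cutoff <= 4300 Python's default int/str conversion limit is never hit.
    """
    pow10: dict[int, int] = {}  # cache of 10**k for the split widths that occur

    def power(k: int) -> int:
        if k not in pow10:
            pow10[k] = 10 ** k
        return pow10[k]

    def rec(lo: int, hi: int) -> int:
        if hi - lo <= cutoff:
            return int(s[lo:hi])
        mid = (lo + hi) // 2
        # left half shifted by the number of digits in the right half
        return rec(lo, mid) * power(hi - mid) + rec(mid, hi)

    return rec(0, len(s)) if s else 0
```

Only a few distinct split widths occur at each recursion level, so the small cache avoids recomputing the same powers of 10.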

Any idea is welcome. Please test your ideas with time() so that you can also give a quantitative statement about the speed-up. Other answers we checked here either did not use such a long sequence of decimal digits (in a string) or did not focus on the performance of the base conversion.