How to calculate as quickly as possible the base-3 value of an integer given as a huge sequence of decimal digits (more than one million)?


Problem description

We got this task from our professor. The prerequisites are:

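  • Use Python 3 and built-in functions only (no numpy).
  • Main task: find and store the result within 5 seconds.
  • Secondary task, nice to have: find the value not only for base b = 3, but also for the bases b = 3**k (k = 2, 3, 4).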

Compared to our first straightforward solution, we achieved an improvement by a factor of 96 (almost 100 times faster), but it still does not meet the 5-second limit (we are currently at 25 seconds on an i7 laptop). [Our professor has no pure-Python solution either, so this is a bit of a research task.]

The complete code (including test calls) is given below. Overall, it shows an improvement from the original 2400 seconds (= 40 minutes) to 25 seconds. However, we need another performance improvement by a factor of 5. Does anyone have ideas that could help?

# -*- coding: utf-8 -*-
#
# Convert a long random sequence of base-10 digits to integers base 3**k with k=1,2,3,4
# 
# Task for phdgroupA: length of sequence is 1.5*(10**6)
#                     time < 5 sec
#                     Use Python 3 (standard libraries only, no numpy) !
#
# Testcase with a very small sequence, made purely of the digit 7:
# (see sagemath or www.math.com/tables/general/base_conv.htm)
# numlen = 12  -->  777777777777_base10
#                =  2202100120200002212221010_base3
#                =  2670520085833_base9
#                =  2k9fi2np3_base27   ("digits": 0123456789ab...pq)
#                   [2, 20, 9, 15, 18, 2, 23, 25, 3]
#                =  2[61]5[18]8[53][30]_base81
#                   [2, 61, 5, 18, 8, 53, 30]
# 


# Convert decimal number n to a sequence of list elements with integer values in the range 0 to base-1.
# With divmod, it's ca. 1/3 faster than using n%b and then n//=b.
def numberToBase(n, b):
    digits = []
    while n:
        n, rem = divmod(n, b)
        digits.append(rem)
    return digits[::-1]


# Step 0: Create string of nlen digits
def step0(nlen):
    rd = 7  # which digit to repeat
    string_val = str(rd) * nlen  # repeat the digit nlen times
    return string_val  # end of step0()


# Step 1: Convert string to int (the string contains only decimal digits)
def step1(string_val, option_chunk=True):
    if option_chunk:
        string_val_len = len(string_val)
        Chunk_len = 90000
        Read_len = 0
        int_valChunk = 0
        int_valLocal = 0
        ii = 0
        while Read_len < string_val_len:
            string_val_ChunkRead = string_val[ii*Chunk_len:(ii+1)*Chunk_len]
            Chunk_lenRead = len(string_val_ChunkRead)
            int_valChunk = int(string_val_ChunkRead)
            ii += 1
            int_valLocal = int_valLocal * 10**Chunk_lenRead + int_valChunk
            Read_len += Chunk_lenRead
        int_val = int_valLocal
    else:
        int_val = int(string_val)
    return int_val  # end of step1()


# Step 2: Convert given integer to another base
def step2(n, b, convsteps):
    nList = []
    if convsteps == 3:  # Here the conversion is done in 3 steps
        expos = 10000, 300
        base_a = b ** expos[0]
        base_b = b ** expos[1]
        nList1 = numberToBase(n, base_a)  # That's the time killer in this part
        nList2 = [numberToBase(ll, base_b) for ll in nList1]
        nList3 = [numberToBase(mm, b) for ll in nList2 for mm in ll]
        nList = [mm for ll in nList3 for mm in ll]
    else: # Do conversion in one bulk
        nList = numberToBase(n, b)
    return nList  # end of step2()



if __name__ == '__main__':

    # Calculate the string of digits
    numlen = 1500000  # number of digits = length of sequence
    string_value = step0(numlen)

    # Calculate the integer value of the string_value
    int_value = step1(string_value, option_chunk=True)

    # Convert int_value to list of numbers of the given bases
    convsteps = 3  # value of '3' makes step2() 50-60 times faster than value '1'

    b = 3
    numList = step2(int_value, b, convsteps)
    print('3**1: numList begin:', numList[:10])  # Expect: [2, 0, 1, 0, 0, 1, 1, 0, 2, 1]

Some ideas: could the chunk in step 1 have a different size? Could the two big bases for the intermediate conversions be better balanced? Or could the conversion from a string of decimal digits to a list of base-3 digits be done more directly?

Description: The algorithm in the Python code above works in 3 steps:

  • Step 0: Get the data. Here, for test purposes, we create a sequence of 1.5 million decimal digits. Normally this would be a random value read from a file. The sequence is then stored as a string.
  • Step 1: Convert that string to an integer (the default is base 10).
  • Step 2: Convert that integer to its representation in base b = 3.

These three changes brought the biggest improvements (compared to the initial straightforward solution):

  1. The helper function numberToBase(n, b), which is used in step 2, converts the integer n to base b. The result is a list of integers, each in the range 0 to b-1. Read in order, the list is the resulting number in base b. The improvement was achieved by using the built-in function divmod instead of the two operations n % b and n //= b inside the while loop. This brought a performance boost by a factor of 2.

  2. The function step2(n, b, convsteps) converts the given integer n to base b (with b = 3). Initially, we called the helper function numberToBase(n, b) once. Then we introduced intermediate steps in step2(), so that n is not converted to the final base in one step but in 3 steps. The intermediate bases are much bigger than the final base b. These intermediate base conversions made step 2 much quicker: 60 times.

  3. The function step1() was made 4 times faster by reading the string in chunks and converting each chunk separately.
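The intermediate-base idea used in step2() can be pushed further by splitting recursively instead of using two fixed intermediate bases. The following is a minimal sketch of that divide-and-conquer variant, not the code from the question. The names `to_base_digits` and `leaf_digits` are hypothetical, and the leaf size of 64 is an untuned assumption. No timing is claimed here; it would have to be measured.

```python
def to_base_digits(n, b, leaf_digits=64):
    """Digits of the non-negative int n in base b, most significant first.

    Divide-and-conquer sketch: split n with divmod by b**(leaf_digits * 2**i),
    so both halves shrink quickly and only small numbers reach the digit loop.
    """
    if n == 0:
        return [0]

    # powers[i] == b ** (leaf_digits * 2**i); stop once powers[-1]**2 > n.
    powers = [b ** leaf_digits]
    while powers[-1] * powers[-1] <= n:
        powers.append(powers[-1] * powers[-1])

    def leaf(m, width):
        # Plain divmod loop, zero-padded on the left to exactly `width` digits.
        out = [0] * width
        for i in range(width - 1, -1, -1):
            m, out[i] = divmod(m, b)
        return out

    def rec(m, level):
        # Requires m < powers[level]**2; returns leaf_digits * 2**(level+1) digits.
        if level < 0:
            return leaf(m, leaf_digits)
        hi, lo = divmod(m, powers[level])
        return rec(hi, level - 1) + rec(lo, level - 1)

    digits = rec(n, len(powers) - 1)
    # Strip the zero padding; n > 0 guarantees at least one non-zero digit.
    first = next(i for i, d in enumerate(digits) if d)
    return digits[first:]


# Small test case from the comments in the question's code.
print(to_base_digits(777777777777, 27))  # [2, 20, 9, 15, 18, 2, 23, 25, 3]
```

The recursion depth grows only with the logarithm of the digit count, so Python's recursion limit is not a concern for inputs of this size.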

Any idea is welcome. Please test your ideas with time() so that you can also give a quantitative statement about their advantage. The other answers we checked here either did not use such a long sequence of decimal digits (in the string) or did not focus on the performance of the base conversion.
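For the timing itself, one possible harness is sketched below. It uses `time.perf_counter()` rather than `time.time()` and reports the best of a few runs; the helper name `timed` and the repeat count are assumptions for illustration. Note also that Python 3.11 and later (and some earlier security releases) limit decimal string/int conversions to 4300 digits by default, so code like step1() needs the limit lifted before it will run on a 1.5-million-digit string.

```python
import sys
import time

# Python 3.11+ caps int <-> decimal str conversions at 4300 digits by default.
# Passing 0 disables the cap; the attribute is absent on older versions.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)


def timed(func, *args, repeat=3):
    """Call func(*args) `repeat` times; return (best_seconds, last_result)."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


# Example: time the conversion of a 20000-digit decimal string to an int.
seconds, value = timed(int, "7" * 20000)
print(f"int() on 20000 digits: {seconds:.4f} s")
```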

Recommended answer

OK, I think this is the solution:

base3to9={
   "00":"0",
   "01":"1",
   "02":"2",
   "10":"3",
   "11":"4",
   "12":"5",
   "20":"6",
   "21":"7",
   "22":"8",   
}
def convert_base3_to_base9(s):
    s = '0'*(len(s)%2) + s # ensure that the string is the right length
    return "".join(base3to9[s[i:i+2]] for i in range(0,len(s),2))

print(convert_base3_to_base9("12012120121010"))
# 5176533

Then you can extrapolate:

base3to27 = {
    "000":"0",
    "001":"1",
    ...
    "222":"Q"
}
def convert_base3_to_base27(s):
    s = '0' * (-len(s) % 3) + s  # left-pad with zeros to a multiple of 3 digits
    return "".join(base3to27[s[i:i+3]] for i in range(0,len(s),3))
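Rather than typing out all 27 (or 81) table entries by hand, the lookup table can be generated. The sketch below is one way to do that for any group size k. The names `make_group_table` and `regroup_base3` are hypothetical, and it returns a list of integers (as in the question's expected output) instead of single characters, which is an assumption about the desired format. It presupposes that the base-3 digit string already exists.

```python
from itertools import product


def make_group_table(k):
    """Map every k-character base-3 string to its value in the range 0..3**k - 1."""
    return {"".join(t): int("".join(t), 3) for t in product("012", repeat=k)}


def regroup_base3(s, k):
    """Convert a base-3 digit string into a list of base-3**k digits."""
    s = "0" * (-len(s) % k) + s  # left-pad so the length is a multiple of k
    table = make_group_table(k)
    return [table[s[i:i + k]] for i in range(0, len(s), k)]


# Test case from the question: 777777777777 in base 3.
base3 = "2202100120200002212221010"
print(regroup_base3(base3, 3))  # [2, 20, 9, 15, 18, 2, 23, 25, 3]
print(regroup_base3(base3, 4))  # [2, 61, 5, 18, 8, 53, 30]
```

The padding uses `-len(s) % k`, which is 0 when the length is already a multiple of k and otherwise the number of digits missing to the next multiple.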

Basically there is no math to do at all, just O(1) dict lookups, so it should be really quite fast.
