How to calculate as quickly as possible the base 3 value of an integer which is given as a huge sequence of decimal digits (more than one million)?
Problem description
We got this task from our professor. Prerequisites are:
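- Use Python 3 and only built-in functions (no numpy).
- Main task: find and store the result within 5 seconds.
- Secondary task, nice to have: find the value not only for base b = 3 but also for the bases b = 3 ** k (k = 2, 3, 4).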
Compared to our first straightforward solution, we achieved an improvement by a factor of 96 (almost 100 times faster), but it still doesn't meet the 5 sec limit (currently, we are at 25 sec on an i7 laptop). [Our prof also has no solution in pure Python, so it's a bit of a research task.]
The complete code (including test calls) is here. Overall, it shows an improvement from originally 2400 sec (= 40 min) to 25 sec. However, we need another performance improvement by a factor of 5. Does anyone have ideas and can help?
# -*- coding: utf-8 -*-
#
# Convert a long random sequence of base-10 digits to integers base 3**k with k=1,2,3,4
#
# Task for phdgroupA: length of sequence is 1.5*(10**6)
# time < 5 sec
# Use Python 3 (standard libraries only, no numpy) !
#
# Testcase with a very small sequence, made purely of the digit 7:
# (see sagemath or www.math.com/tables/general/base_conv.htm)
# numlen = 12 --> 777777777777_base10
# = 2202100120200002212221010_base3
# = 2670520085833_base9
# = 2k9fi2np3_base27 ("digits": 0123456789ab...pq)
# [2, 20, 9, 15, 18, 2, 23, 25, 3]
# = 2[61]5[18]8[53][30]_base81
# [2, 61, 5, 18, 8, 53, 30]
#
# Convert decimal number n to a sequence of list elements with integer values in the range 0 to base-1.
# With divmod, it's ca. 1/3 faster than using n%b and then n//=b.
def numberToBase(n, b):
    digits = []
    while n:
        n, rem = divmod(n, b)
        digits.append(rem)
    return digits[::-1]
# Step 0: Create string of nlen digits
def step0(nlen):
    rd = 7  # which digit to repeat
    string_val = "".join(str(rd) for i in range(nlen))
    return string_val  # end of step0()
# Step 1: Convert string to int (the string contains only decimal digits)
def step1(string_val, option_chunk=True):
    if option_chunk:
        string_val_len = len(string_val)
        Chunk_len = 90000
        Read_len = 0
        int_valChunk = 0
        int_valLocal = 0
        ii = 0
        while Read_len < string_val_len:
            string_val_ChunkRead = string_val[ii*Chunk_len:(ii+1)*Chunk_len]
            Chunk_lenRead = len(string_val_ChunkRead)
            int_valChunk = int(string_val_ChunkRead)
            ii += 1
            int_valLocal = int_valLocal * 10**Chunk_lenRead + int_valChunk
            Read_len += Chunk_lenRead
        int_val = int_valLocal
    else:
        int_val = int(string_val)
    return int_val  # end of step1()
# Step 2: Convert given integer to another base
def step2(n, b, convsteps):
    nList = []
    if convsteps == 3:  # Here the conversion is done in 3 steps
        expos = 10000, 300
        base_a = b ** expos[0]
        base_b = b ** expos[1]
        nList1 = numberToBase(n, base_a)  # That's the time killer in this part
        for i, ll in enumerate(nList1):
            chunk = []
            for j, mm in enumerate(numberToBase(ll, base_b)):
                low = numberToBase(mm, b)
                if j > 0:  # inner pieces must keep their leading zeros
                    low = [0] * (expos[1] - len(low)) + low
                chunk.extend(low)
            if i > 0:  # same for every big chunk after the first one
                chunk = [0] * (expos[0] - len(chunk)) + chunk
            nList.extend(chunk)
    else:  # Do conversion in one bulk
        nList = numberToBase(n, b)
    return nList  # end of step2()
if __name__ == '__main__':
    # Calculate the string of digits
    numlen = 1500000  # number of digits = length of sequence
    string_value = step0(numlen)
    # Calculate the integer value of the string_value
    int_value = step1(string_value, option_chunk=True)
    # Convert int_value to list of numbers of the given bases
    convsteps = 3  # value of '3' makes step2() 50-60 times faster than value '1'
    b = 3
    numList = step2(int_value, b, convsteps)
    print('3**1: numList begin:', numList[:10])  # Expect: [2, 0, 1, 0, 0, 1, 1, 0, 2, 1]
Some ideas: could the chunk in step 1 have a different size? Could the two big bases for the intermediate conversions be better balanced? Or could the conversion from a string of decimal digits to a list of base 3 digits be made more directly?
Description: The algorithm in the Python code above works in 3 steps:
- step 0: Get data. Here we create, for test purposes, a sequence of decimal digits with a length of 1.5 million digits. Normally this is a random value that we read from a file. The sequence is then stored as a string.
- step 1: Convert that string to an integer (default is base 10).
- step 2: Convert that integer to a list of digits in base b=3.
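A practical note on step 1, as a hedged aside: Python 3.11 and later (and some earlier security releases) limit int/str conversions to 4300 digits by default, so `int()` on a string of this size raises `ValueError` unless the limit is lifted. The sketch below shows one way to do that; `decimal_string_to_int` is a name made up for illustration and is not part of the code above.

```python
import sys

# Python 3.11+ caps int <-> str conversions at 4300 digits by default.
# Passing 0 removes the cap. Older interpreters lack this function,
# hence the hasattr() guard.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)


def decimal_string_to_int(digits):
    """Convert a string of decimal digits to an int (hypothetical helper)."""
    return int(digits)


if __name__ == "__main__":
    value = decimal_string_to_int("7" * 5000)
    print(value.bit_length())
```

The chunked variant in step1() hits the same limit, because each 90000-digit chunk is also passed to `int()`.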
These three changes brought the biggest improvements (compared to the initial straightforward solution):
- The helper function numberToBase(n, b), which is used in step 2, converts the integer n to base b. The result is a list of integers, each in the range 0 to b-1. Read as a sequence, the list is the resulting number in base b. The improvement was achieved by using the built-in function 'divmod' instead of the two commands n%b and n//=b within the while loop. This brought a performance boost of factor 2.
- Function step2(n, b, convsteps) converts the given integer n to base b (with b=3). Initially, we called the helper function numberToBase(n, b) once. Then we introduced intermediate steps in step2(), so n is not converted to the final base in one step, but in 3 steps. The intermediate bases are much bigger than the final base b. These intermediate base conversions made step 2 much quicker: 60 times.
- Function step1() was made 4 times faster by reading the string in chunks and by doing the conversion for each chunk separately.
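The intermediate-base idea in step2() can be pushed further by splitting recursively: divide by a power of b that is roughly half the size of the number, then convert both halves the same way. The following is only a sketch of that generalization, not the code from the question; `to_base_digits` and `leaf_width` are names assumed here, and it has not been timed against the 5 sec target. Note that every piece except the leading one must be padded with zeros to its full width, otherwise zero digits inside the number are lost.

```python
def to_base_digits(n, b, leaf_width=64):
    """Digits of n >= 0 in base b, most significant first (sketch)."""
    if n == 0:
        return [0]

    # powers[i] == b ** (leaf_width * 2**i); stop once powers[-1]**2 > n.
    powers = [b ** leaf_width]
    while powers[-1] * powers[-1] <= n:
        powers.append(powers[-1] * powers[-1])

    out = []

    def leaf(value):
        # Emit exactly leaf_width digits, zero padded on the left.
        chunk = [0] * leaf_width
        i = leaf_width - 1
        while value:
            value, rem = divmod(value, b)
            chunk[i] = rem
            i -= 1
        out.extend(chunk)

    def split(value, level):
        # Invariant: value < b ** (leaf_width * 2**level).
        if level == 0:
            leaf(value)
            return
        high, low = divmod(value, powers[level - 1])
        split(high, level - 1)
        split(low, level - 1)

    split(n, len(powers))

    # Only the zeros in front of the whole number are dropped.
    first = next(i for i, d in enumerate(out) if d)
    return out[first:]


if __name__ == "__main__":
    print("".join(map(str, to_base_digits(777777777777, 3))))
    # 2202100120200002212221010
```

For the small test case from the code header, 777777777777, this returns the digits 2202100120200002212221010. The value of `leaf_width` is a tuning knob; a good setting would have to be found by measurement.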
Any idea is welcome. Please test your ideas with time() so you can also give a quantitative statement about the advantage. Other answers we checked here did not use such a long sequence of decimal digits (in the string) or did not focus on the performance of the base conversion.
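For such measurements, a small helper along the following lines may be convenient. It is only a sketch: `timed` is a hypothetical name, and it uses `time.perf_counter()` rather than `time.time()` because the former is intended for measuring short durations.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed seconds). Hypothetical helper."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Example: time the construction of a large power of ten.
    value, seconds = timed(pow, 10, 100000)
    print("bit length:", value.bit_length(), "elapsed:", round(seconds, 4), "sec")
```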
Recommended answer
OK, I think this is the solution:
base3to9 = {
    "00": "0",
    "01": "1",
    "02": "2",
    "10": "3",
    "11": "4",
    "12": "5",
    "20": "6",
    "21": "7",
    "22": "8",
}
def convert_base3_to_base9(s):
    s = '0'*(len(s)%2) + s # ensure that the string is the right length
    return "".join(base3to9[s[i:i+2]] for i in range(0,len(s),2))
print(convert_base3_to_base9("12012120121010"))
# 5176533
Then you can extrapolate:
base3to27 = {
    "000": "0",
    "001": "1",
    ...
    "222": "Q"
}
def convert_base3_to_base27(s):
    s = '0'*(-len(s) % 3) + s # pad on the left to a multiple of 3 digits
    return "".join(base3to27[s[i:i+3]] for i in range(0,len(s),3))
Basically there is no math to do at all, just O(1) dict lookups, so it should be really quite fast.
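The same grouping also works on a list of integer digits, like the lists the question's code produces, and for any k without writing out a lookup table by hand. A minimal sketch, assuming the base 3 digits are given most significant first; `group_base3_digits` is a name made up for this example:

```python
def group_base3_digits(digits, k):
    """Combine base 3 digits (most significant first) into base 3**k digits."""
    pad = -len(digits) % k               # zeros needed to reach a multiple of k
    padded = [0] * pad + list(digits)
    grouped = []
    for i in range(0, len(padded), k):
        value = 0
        for d in padded[i:i + k]:        # Horner evaluation of one group
            value = value * 3 + d
        grouped.append(value)
    return grouped


if __name__ == "__main__":
    base3 = [int(c) for c in "2202100120200002212221010"]
    print(group_base3_digits(base3, 3))  # [2, 20, 9, 15, 18, 2, 23, 25, 3]
    print(group_base3_digits(base3, 4))  # [2, 61, 5, 18, 8, 53, 30]
```

The two printed lists match the base 27 and base 81 values given for 777777777777 in the question's test case.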