Writing huge strings in Python
Question
I have a very long string, almost a megabyte long, that I need to write to a text file. The regular
file = open("file.txt","w")
file.write(string)
file.close()
works, but it is too slow. Is there a way to write it faster?
I am trying to write a number with several million digits to a text file; the number is on the order of math.factorial(67867957).
This is what profiling shows:
203 function calls (198 primitive calls) in 0.001 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 re.py:217(compile)
1 0.000 0.000 0.000 0.000 re.py:273(_compile)
1 0.000 0.000 0.000 0.000 sre_compile.py:172(_compile_charset)
1 0.000 0.000 0.000 0.000 sre_compile.py:201(_optimize_charset)
4 0.000 0.000 0.000 0.000 sre_compile.py:25(_identityfunction)
3/1 0.000 0.000 0.000 0.000 sre_compile.py:33(_compile)
1 0.000 0.000 0.000 0.000 sre_compile.py:341(_compile_info)
2 0.000 0.000 0.000 0.000 sre_compile.py:442(isstring)
1 0.000 0.000 0.000 0.000 sre_compile.py:445(_code)
1 0.000 0.000 0.000 0.000 sre_compile.py:460(compile)
5 0.000 0.000 0.000 0.000 sre_parse.py:126(__len__)
12 0.000 0.000 0.000 0.000 sre_parse.py:130(__getitem__)
7 0.000 0.000 0.000 0.000 sre_parse.py:138(append)
3/1 0.000 0.000 0.000 0.000 sre_parse.py:140(getwidth)
1 0.000 0.000 0.000 0.000 sre_parse.py:178(__init__)
10 0.000 0.000 0.000 0.000 sre_parse.py:183(__next)
2 0.000 0.000 0.000 0.000 sre_parse.py:202(match)
8 0.000 0.000 0.000 0.000 sre_parse.py:208(get)
1 0.000 0.000 0.000 0.000 sre_parse.py:351(_parse_sub)
2 0.000 0.000 0.000 0.000 sre_parse.py:429(_parse)
1 0.000 0.000 0.000 0.000 sre_parse.py:67(__init__)
1 0.000 0.000 0.000 0.000 sre_parse.py:726(fix_flags)
1 0.000 0.000 0.000 0.000 sre_parse.py:738(parse)
3 0.000 0.000 0.000 0.000 sre_parse.py:90(__init__)
1 0.000 0.000 0.000 0.000 {built-in method compile}
1 0.001 0.001 0.001 0.001 {built-in method exec}
17 0.000 0.000 0.000 0.000 {built-in method isinstance}
39/38 0.000 0.000 0.000 0.000 {built-in method len}
2 0.000 0.000 0.000 0.000 {built-in method max}
8 0.000 0.000 0.000 0.000 {built-in method min}
6 0.000 0.000 0.000 0.000 {built-in method ord}
48 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
5 0.000 0.000 0.000 0.000 {method 'find' of 'bytearray' objects}
1 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects}
Answer
Your issue is that str() is very slow for large integers (millions of digits) in Python. The conversion is quadratic in the number of digits, i.e., for ~1e8 digits it may require ~1e16 operations to convert the integer to a decimal string.
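A minimal timing sketch, assuming CPython, for seeing the scaling yourself: it times str() on integers whose digit count doubles at each step. If the conversion is quadratic, each doubling should roughly quadruple the time; the exact figures depend on the machine and the interpreter version. The helper name time_str_conversion is hypothetical. Python 3.11 and later (and some earlier security releases) refuse int-to-str conversions above 4300 digits by default, so the sketch lifts that cap with sys.set_int_max_str_digits(0).

```python
import sys
import time

# Python 3.11+ limits int -> str conversion to 4300 digits by default;
# 0 removes the limit. Older versions have neither the limit nor this function.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)


def time_str_conversion(digit_counts):
    """Return a list of (digits, seconds) pairs, timing str() for each size."""
    results = []
    for digits in digit_counts:
        n = 10 ** digits - 1  # an integer consisting of exactly `digits` nines
        start = time.perf_counter()
        text = str(n)
        elapsed = time.perf_counter() - start
        assert len(text) == digits  # sanity check on the conversion
        results.append((digits, elapsed))
    return results


if __name__ == "__main__":
    previous = None
    for digits, seconds in time_str_conversion([25_000, 50_000, 100_000, 200_000]):
        # Show how much slower each step is than the one before it.
        ratio = f" ({seconds / previous:.1f}x previous)" if previous else ""
        print(f"{digits:>8} digits: {seconds:.4f} s{ratio}")
        previous = seconds
```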
Writing a 500 MB file should not take hours, e.g.:
$ python3 -c 'open("file", "w").write("a"*500*1000000)'
returns almost immediately, and ls -l file confirms that the file is created and has the expected size.
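The same check can be done from Python by timing the two steps of the question's code separately: the decimal conversion and the write. This is a minimal sketch; the helper name time_convert_and_write is hypothetical, and math.factorial(20_000) (about 77,000 digits) is only a small stand-in for the real number. As the number grows, the conversion time should dominate.

```python
import math
import os
import sys
import tempfile
import time

# Lift the 4300-digit int -> str limit on Python 3.11+ (skipped on older versions).
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)


def time_convert_and_write(n, path):
    """Time str(n) and the file write separately; return (convert_s, write_s)."""
    start = time.perf_counter()
    text = str(n)  # decimal conversion: the expensive part for huge ints
    convert_s = time.perf_counter() - start

    start = time.perf_counter()
    with open(path, "w") as f:
        f.write(text)  # plain I/O of an already-built string
    write_s = time.perf_counter() - start
    return convert_s, write_s


if __name__ == "__main__":
    n = math.factorial(20_000)  # stand-in with about 77,000 digits
    path = os.path.join(tempfile.gettempdir(), "factorial.txt")
    convert_s, write_s = time_convert_and_write(n, path)
    print(f"str(): {convert_s:.4f} s   write(): {write_s:.4f} s")
```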
Calculating math.factorial(67867957) (the result has ~500M digits) may take several hours, but saving it using pickle is instantaneous:
import math
import pickle
n = math.factorial(67867957) # takes a long time
with open("file.pickle", "wb") as file:
pickle.dump(n, file) # very fast (comparatively)
Loading it back with n = pickle.load(open('file.pickle', 'rb')) takes less than a second.
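A variant of the pickle approach that uses with blocks, so each file is closed deterministically instead of relying on the interpreter to close the file opened inline in the one-liner above. With pickle protocol 2 and later, an int is stored as raw bytes rather than as decimal text, which is why this sidesteps the slow conversion. The helper names save_int and load_int are hypothetical, and math.factorial(100_000) is only a small stand-in.

```python
import math
import os
import pickle
import tempfile


def save_int(n, path):
    """Pickle an int; protocols 2+ store it as binary, not decimal digits."""
    with open(path, "wb") as f:
        pickle.dump(n, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_int(path):
    """Load an int written by save_int()."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    n = math.factorial(100_000)  # small stand-in for math.factorial(67867957)
    path = os.path.join(tempfile.gettempdir(), "factorial.pickle")
    save_int(n, path)
    assert load_int(path) == n  # the round trip preserves the exact value
    print(f"round trip OK, {os.path.getsize(path)} bytes on disk")
```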
str(n)
is still running (after 50 hours) on my machine.
To get the decimal representation fast, you could use gmpy2:
$ python -c'import gmpy2;open("file.gmpy2", "w").write(str(gmpy2.fac(67867957)))'
It takes less than 10 minutes on my machine.
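If the number already exists as a Python int (for example, after loading it back from the pickle), it can be handed to gmpy2 for the decimal conversion instead of calling the built-in str() on it. A minimal sketch, assuming gmpy2 may or may not be installed: to_decimal and write_decimal are hypothetical helpers that use str(gmpy2.mpz(n)) when gmpy2 imports and otherwise fall back to the built-in str().

```python
import sys

try:
    import gmpy2  # optional third-party package: pip install gmpy2
except ImportError:
    gmpy2 = None


def to_decimal(n):
    """Return the decimal string of int n, via GMP when gmpy2 is available."""
    if gmpy2 is not None:
        return str(gmpy2.mpz(n))  # GMP's conversion is much faster for huge ints
    # Fallback: built-in str(), very slow for numbers with millions of digits.
    # Python 3.11+ caps it at 4300 digits unless the limit is lifted (0 = no limit).
    if hasattr(sys, "set_int_max_str_digits"):
        sys.set_int_max_str_digits(0)
    return str(n)


def write_decimal(n, path):
    """Write the decimal digits of int n to a text file."""
    with open(path, "w") as f:
        f.write(to_decimal(n))
```

When the number is a factorial, computing it directly with gmpy2.fac, as in the one-liner above, also avoids the slow math.factorial step.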