Writing huge strings in Python

Published: 2017/11/4 20:53:42 · Tags: python, performance, python-3.x, file-io


Problem description

I have a very long string, almost a megabyte long, that I need to write to a text file. The regular approach

```python
with open("file.txt", "w") as file:
    file.write(string)  # `string` holds the text to write
```

works but is too slow. Is there a way I can write faster?

I am trying to write a number with several million digits to a text file; the number is on the order of `math.factorial(67867957)`.

This is what profiling shows:
```text
         203 function calls (198 primitive calls) in 0.001 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 re.py:217(compile)
        1    0.000    0.000    0.000    0.000 re.py:273(_compile)
        1    0.000    0.000    0.000    0.000 sre_compile.py:172(_compile_charset)
        1    0.000    0.000    0.000    0.000 sre_compile.py:201(_optimize_charset)
        4    0.000    0.000    0.000    0.000 sre_compile.py:25(_identityfunction)
      3/1    0.000    0.000    0.000    0.000 sre_compile.py:33(_compile)
        1    0.000    0.000    0.000    0.000 sre_compile.py:341(_compile_info)
        2    0.000    0.000    0.000    0.000 sre_compile.py:442(isstring)
        1    0.000    0.000    0.000    0.000 sre_compile.py:445(_code)
        1    0.000    0.000    0.000    0.000 sre_compile.py:460(compile)
        5    0.000    0.000    0.000    0.000 sre_parse.py:126(__len__)
       12    0.000    0.000    0.000    0.000 sre_parse.py:130(__getitem__)
        7    0.000    0.000    0.000    0.000 sre_parse.py:138(append)
      3/1    0.000    0.000    0.000    0.000 sre_parse.py:140(getwidth)
        1    0.000    0.000    0.000    0.000 sre_parse.py:178(__init__)
       10    0.000    0.000    0.000    0.000 sre_parse.py:183(__next)
        2    0.000    0.000    0.000    0.000 sre_parse.py:202(match)
        8    0.000    0.000    0.000    0.000 sre_parse.py:208(get)
        1    0.000    0.000    0.000    0.000 sre_parse.py:351(_parse_sub)
        2    0.000    0.000    0.000    0.000 sre_parse.py:429(_parse)
        1    0.000    0.000    0.000    0.000 sre_parse.py:67(__init__)
        1    0.000    0.000    0.000    0.000 sre_parse.py:726(fix_flags)
        1    0.000    0.000    0.000    0.000 sre_parse.py:738(parse)
        3    0.000    0.000    0.000    0.000 sre_parse.py:90(__init__)
        1    0.000    0.000    0.000    0.000 {built-in method compile}
        1    0.001    0.001    0.001    0.001 {built-in method exec}
       17    0.000    0.000    0.000    0.000 {built-in method isinstance}
    39/38    0.000    0.000    0.000    0.000 {built-in method len}
        2    0.000    0.000    0.000    0.000 {built-in method max}
        8    0.000    0.000    0.000    0.000 {built-in method min}
        6    0.000    0.000    0.000    0.000 {built-in method ord}
       48    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        5    0.000    0.000    0.000    0.000 {method 'find' of 'bytearray' objects}
        1    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
```
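The profile above covers only 0.001 seconds and lists nothing but regular-expression compilation, so it appears not to have measured the conversion or the write at all. A minimal sketch of profiling the two steps separately with the standard-library `cProfile` and `pstats` modules follows. The helper names `to_text`, `write_text` and `profile_steps`, and the much smaller factorial argument, are hypothetical choices for illustration.

```python
import cProfile
import math
import os
import pstats
import sys
import tempfile

# Recent CPython releases cap int -> str at 4300 digits by default; lift it.
# Older releases have no such setting, hence the hasattr check.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)


def to_text(n: int) -> str:
    """Step 1: decimal conversion."""
    return str(n)


def write_text(path: str, text: str) -> None:
    """Step 2: writing the finished string."""
    with open(path, "w") as f:
        f.write(text)


def profile_steps(k: int = 20_000) -> pstats.Stats:
    """Profile converting and writing math.factorial(k); k is a small stand-in."""
    n = math.factorial(k)  # computed outside the profiled region
    profiler = cProfile.Profile()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file.txt")
        profiler.enable()
        write_text(path, to_text(n))
        profiler.disable()
    return pstats.Stats(profiler)


if __name__ == "__main__":
    # Sort by cumulative time and show the top 10 entries.
    profile_steps().sort_stats("cumulative").print_stats(10)
```

With the two steps in separate functions, the `cumtime` column should make it clear which one dominates.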

Solution
Your issue is that `str(long)` (`str()` applied to an `int` in Python 3) is very slow for large integers (millions of digits) in Python. It is a quadratic operation in the number of digits, i.e., for ~1e8 digits it may require ~1e16 operations to convert the integer to a decimal string.
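One way to observe the quadratic behaviour is to time `str()` on integers of growing size: if the cost is quadratic, doubling the digit count should roughly quadruple the time. This is a rough sketch rather than a benchmark; the sizes are arbitrary and `time_str` is a hypothetical helper. It assumes CPython, where recent releases (3.11 and later, plus some security backports) refuse int-to-str conversions above 4300 digits by default, so the limit is lifted with `sys.set_int_max_str_digits(0)` where that function exists.

```python
import sys
import time

# Recent CPython releases refuse int -> str conversions above 4300 digits by
# default; 0 removes the cap. Older releases have no such setting.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)


def time_str(digits: int) -> float:
    """Seconds spent converting a `digits`-digit integer to a decimal string."""
    n = 10 ** (digits - 1) + 7  # exactly `digits` digits (for digits >= 1)
    start = time.perf_counter()
    s = str(n)
    elapsed = time.perf_counter() - start
    assert len(s) == digits  # sanity check on the constructed number
    return elapsed


if __name__ == "__main__":
    # If the conversion is quadratic, each doubling of the digit count should
    # roughly quadruple the time (newer CPython releases may scale better).
    for d in (50_000, 100_000, 200_000):
        print(f"{d:>7} digits: {time_str(d):.3f} s")
```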

Writing 500 MB to a file should not take hours. For example,

```shell
$ python3 -c 'open("file", "w").write("a"*500*1000000)'
```

returns almost immediately, and `ls -l file` confirms that the file has been created with the expected size.
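The same check can be done from Python by timing only the write of an already-built string. In this sketch, `time_write` is a hypothetical helper; it writes into a temporary directory and verifies the file size the way `ls -l` would. The demo defaults to 50 MB to stay light, so scale `size` up to `500 * 1_000_000` to match the command above.

```python
import os
import tempfile
import time


def time_write(size: int) -> float:
    """Seconds to write (and close) a file holding a `size`-character ASCII string."""
    data = "a" * size  # built before timing starts
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file")
        start = time.perf_counter()
        with open(path, "w") as f:
            f.write(data)
        elapsed = time.perf_counter() - start  # measured after close()
        # Equivalent of the `ls -l file` check: one byte per ASCII character.
        assert os.path.getsize(path) == size
    return elapsed


if __name__ == "__main__":
    size = 50 * 1_000_000  # 50 MB; use 500 * 1_000_000 to match the command
    print(f"{time_write(size):.2f} s for {size // 1_000_000} MB")
```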

Calculating `math.factorial(67867957)` (the result has ~500M digits) may take several hours, but saving it using `pickle` is instantaneous:
```python
import math
import pickle

n = math.factorial(67867957)  # takes a long time
with open("file.pickle", "wb") as file:
    pickle.dump(n, file)  # very fast (comparatively)
```
Loading it back with `n = pickle.load(open('file.pickle', 'rb'))` takes less than a second.
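With its default (binary) protocols, `pickle` is fast here because it stores the integer as bytes rather than as base-10 digits. A similar effect can be had without `pickle` by writing the raw bytes with `int.to_bytes` and reading them back with `int.from_bytes`. This is an alternative to the answer's approach, not part of it; the helpers `save_int` and `load_int` are hypothetical and handle non-negative integers only, and the demo uses a much smaller factorial as a stand-in.

```python
import math
import os
import tempfile


def save_int(path: str, n: int) -> None:
    """Write a non-negative int as raw big-endian bytes (no base-10 step)."""
    length = (n.bit_length() + 7) // 8  # minimal byte count; 0 for n == 0
    with open(path, "wb") as f:
        f.write(n.to_bytes(length, "big"))


def load_int(path: str) -> int:
    """Read back an int written by save_int."""
    with open(path, "rb") as f:
        return int.from_bytes(f.read(), "big")


if __name__ == "__main__":
    n = math.factorial(100_000)  # stand-in for math.factorial(67867957)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file.bin")
        save_int(path, n)
        assert load_int(path) == n
        print(os.path.getsize(path), "bytes")
```

The file is not human-readable, so this only helps when the goal is to store the number and reload it in Python later, as with the pickle approach.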

str(n) is still running (after 50 hours) on my machine.

To get the decimal representation fast, you could use gmpy2:
```shell
$ python -c 'import gmpy2; open("file.gmpy2", "w").write(str(gmpy2.fac(67867957)))'
```
It takes less than 10 minutes on my machine.
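If installing `gmpy2` is not an option, one standard-library alternative (a swapped-in technique, not from the answer) is to split the integer in binary, which costs only cheap bit shifts, and recombine the pieces in `decimal.Decimal` arithmetic, whose C backend (libmpdec) uses asymptotically fast multiplication for very large operands. This sketch assumes the C-accelerated `decimal` module that CPython ships by default; `int_to_str_fast` is a hypothetical name, and no timing for `math.factorial(67867957)` is claimed here.

```python
import decimal


def int_to_str_fast(n: int) -> str:
    """Decimal string of an int via binary splitting and Decimal arithmetic.

    n = hi * 2**w2 + lo: the split uses cheap bit shifts, and the
    recombination uses exact Decimal multiplication and addition.
    """
    D = decimal.Decimal
    cutoff_bits = 128  # pieces at most this wide are converted directly
    pow_cache = {}  # memoised Decimal(2) ** w, reused across recursion levels

    def pow2(w: int) -> decimal.Decimal:
        if w not in pow_cache:
            if w <= cutoff_bits:
                pow_cache[w] = D(2) ** w
            else:
                half = w >> 1
                pow_cache[w] = pow2(half) * pow2(w - half)
        return pow_cache[w]

    def inner(x: int, w: int) -> decimal.Decimal:
        # Invariant: 0 <= x < 2**w
        if w <= cutoff_bits:
            return D(x)
        w2 = w >> 1
        hi = x >> w2
        lo = x - (hi << w2)
        return inner(lo, w2) + inner(hi, w - w2) * pow2(w2)

    with decimal.localcontext() as ctx:
        ctx.prec = decimal.MAX_PREC  # enough digits that nothing is rounded
        ctx.Emax = decimal.MAX_EMAX  # default Emax (999999) overflows past ~1e6 digits
        ctx.traps[decimal.Inexact] = True  # raise rather than round silently
        result = inner(abs(n), n.bit_length())
    return ("-" if n < 0 else "") + str(result)
```

In the setting of the question, `int_to_str_fast(n)` would take the place of `str(n)` before an ordinary `write()` call.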
