Python 3 - Faster Print & I/O
Problem description
I'm currently working on a Python project that handles massive amounts of data, and I have to print much of it to files. Each output is a single line, but some of them consist of millions of digits.
The actual mathematical operations in Python take only seconds, minutes at most. Printing the results to a file takes up to several hours, which I don't always have.
Is there any way of speeding up the I/O?
As far as I can tell, the number is stored in RAM (or at least I assume so; it's the only thing that would take up 11GB of RAM), but Python does not write it to a text file immediately. Is there a way to dump that information (if it is the number) straight to a file? I've tried Task Manager's dump, which gave me a 22GB dump file (yes, you read that right), and it doesn't seem to contain what I was looking for, although it was hard to tell.
If it makes a difference, I have Python 3.5.1 (Anaconda and Spyder), Windows 8.1 x64 and 16GB RAM.
By the way, I do run garbage collection (the gc module) inside the script and delete variables that are no longer needed, so those 11GB aren't just junk.
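For the narrower question of dumping the number itself: assuming the result is a single non-negative Python `int`, one minimal sketch is to write its raw binary form with `int.to_bytes` and read it back with `int.from_bytes`. This skips the decimal conversion that `print()`/`str()` perform, which in CPython scales roughly quadratically with the number of digits and may well be a large part of the hours-long writes. The function names and file name below are hypothetical, and the resulting file is binary, not readable text.

```python
import os
import tempfile


def dump_int(n: int, path: str) -> None:
    """Write a non-negative int to `path` as raw big-endian bytes."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative ints")
    # Minimal number of bytes for n; max(..., 1) lets 0 be stored as one byte.
    length = max((n.bit_length() + 7) // 8, 1)
    with open(path, "wb") as f:
        f.write(n.to_bytes(length, "big"))


def load_int(path: str) -> int:
    """Rebuild the int written by dump_int()."""
    with open(path, "rb") as f:
        return int.from_bytes(f.read(), "big")


if __name__ == "__main__":
    # Hypothetical stand-in for the real result: a number with ~950,000 decimal digits.
    big = 3 ** 2000000
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "result.bin")
        dump_int(big, path)
        assert load_int(path) == big
        print(os.path.getsize(path), "bytes written")
```

Both functions are linear in the size of the number, and the binary file is roughly 2.4 times smaller than the equivalent decimal text. If you later need the decimal digits, the conversion cost still has to be paid once, just not on the critical path.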
Use the HDF5 file format
The problem is that you have to write a lot of data.
HDF5 is a format that is very space-efficient and can be read by a variety of tools.
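A minimal sketch of the HDF5 route, assuming the third-party `h5py` and `numpy` packages are installed (h5py is only one of several Python HDF5 packages). To tie it back to the question, it stores one huge non-negative integer as its raw bytes in a gzip-compressed `uint8` dataset; the function names, dataset name and file name are hypothetical. The raw bytes of a single large number may not compress much; HDF5 and compression pay off most for arrays of structured numeric data.

```python
import os
import tempfile

import h5py
import numpy as np


def save_int_hdf5(n: int, path: str, name: str = "result") -> None:
    """Store a non-negative int as a gzip-compressed uint8 HDF5 dataset."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative ints")
    length = max((n.bit_length() + 7) // 8, 1)
    # View the int's big-endian bytes as a 1-D uint8 array without copying.
    raw = np.frombuffer(n.to_bytes(length, "big"), dtype=np.uint8)
    with h5py.File(path, "w") as f:
        # "gzip" is HDF5's built-in DEFLATE filter; compression_opts is the level (0-9).
        f.create_dataset(name, data=raw, compression="gzip", compression_opts=4)


def load_int_hdf5(path: str, name: str = "result") -> int:
    """Read the dataset back and rebuild the int."""
    with h5py.File(path, "r") as f:
        raw = f[name][()]  # whole dataset as a NumPy uint8 array
    return int.from_bytes(raw.tobytes(), "big")


if __name__ == "__main__":
    big = 3 ** 2000000  # hypothetical stand-in for the real result
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "result.h5")
        save_int_hdf5(big, path)
        assert load_int_hdf5(path) == big
```

The same `create_dataset(..., compression="gzip")` call works for any NumPy array, which is the more typical use of HDF5.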
Be prepared for a few challenges:
- there are multiple Python packages for HDF5 (h5py and PyTables, for example); you will have to find the one that fits your needs
- installation is not always simple (but Windows installation binaries may be available)
- expect a bit of study to understand how the data structures should be stored.
- it will occasionally need noticeable CPU time: typically you write a lot of data quickly, and at some point it has to be flushed to disk. At that moment it compresses the data (if compression is enabled), which can take a few seconds. See "GIL for IO bounded thread in C extension (HDF5)".
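The write-then-flush behavior described in the last point can be sketched with a growable dataset that is appended to in batches. This again assumes `h5py` and `numpy`; `append_batches`, `chunk_len` and the dataset name are hypothetical, and calling `f.flush()` after every batch is a choice of this sketch, not a requirement.

```python
import os
import tempfile

import h5py
import numpy as np


def append_batches(path, batches, name="values", chunk_len=65536):
    """Append 1-D float64 batches to a growable, chunked, gzip-compressed dataset."""
    with h5py.File(path, "w") as f:
        dset = f.create_dataset(
            name,
            shape=(0,),
            maxshape=(None,),      # unlimited length, so the dataset can grow
            dtype="f8",
            chunks=(chunk_len,),   # chunked layout is required for resizing and compression
            compression="gzip",
        )
        for batch in batches:
            batch = np.asarray(batch, dtype="f8").ravel()
            if batch.size == 0:
                continue  # nothing to append
            start = dset.shape[0]
            dset.resize((start + batch.size,))
            dset[start:] = batch
            # Write cached chunks out now (compressing them as part of that),
            # so the pause happens at a point this loop chooses.
            f.flush()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "values.h5")
        # Hypothetical data: ten batches of 100,000 random floats each.
        append_batches(path, (np.random.rand(100000) for _ in range(10)))
        with h5py.File(path, "r") as f:
            print(f["values"].shape)
```

Flushing per batch makes the pause predictable but costs throughput; without it, remaining buffered data is written out when the file is closed.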
Anyway, I think it is very likely you will manage, and apart from faster writes you will also get smaller files, which are easier to handle.