Python 3 - Faster Print & I/O


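Problem description

I'm currently involved in a Python project that handles massive amounts of data, and I have to print massive amounts of it to files. Each output is always a single line, but it sometimes consists of millions of digits.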



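The actual mathematical operations in Python take only seconds, minutes at most. Printing the results to a file takes up to several hours, which I don't always have.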

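Is there any way of speeding up the I/O?

From what I can tell, the number is stored in RAM (or at least I assume so, since it's the only thing that would take up 11GB of RAM), but Python does not print it to a text file immediately. Is there a way to dump that information, if it is the number, to a file? I've tried Task Manager's dump, which gave me a 22GB dump file (yes, you read that right), and it doesn't look like it contains what I was looking for, although that wasn't very clear.

If it makes a difference, I have Python 3.5.1 (Anaconda and Spyder), Windows 8.1 x64 and 16GB RAM.

By the way, I do run garbage collection (the gc module) inside the script, and I delete variables that are not needed, so those 11GB aren't just junk.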

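Solution

Use the HDF5 file format

The problem is that you have to write a lot of data.

HDF5 is a format that is very efficient in size and can be accessed by a variety of tools.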



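Be prepared for a few challenges:

  • There are multiple Python packages for HDF5; you have to find the one that fits your needs.
  • Installation is not always simple (but there may be a Windows installation binary).
  • It takes a bit of study to understand how to structure the data you store.
  • It will occasionally need some CPU cycles. Typically you write a lot of data quickly, and at some moment it has to be flushed to disk. At that moment it starts compressing the data, which can take a few seconds. See: GIL for IO-bounded thread in C extension (HDF5)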
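As a rough illustration of what such a write can look like, here is a minimal sketch assuming the h5py and NumPy packages are installed. The answer does not name a package, so h5py is only one possible choice, and the function names, the dataset name `number`, and the idea of storing the integer as raw bytes are all assumptions made for this example.

```python
import h5py
import numpy as np


def write_int_hdf5(n: int, path: str, name: str = "number") -> None:
    """Store a non-negative integer as a gzip-compressed byte dataset.

    Hypothetical helper: the layout (one uint8 dataset holding the
    integer's little-endian bytes) is an assumption for this sketch.
    """
    size = max(1, (n.bit_length() + 7) // 8)  # at least one byte, so 0 works
    raw = np.frombuffer(n.to_bytes(size, "little"), dtype=np.uint8)
    with h5py.File(path, "w") as f:
        # Compression happens when the data is written out, which is
        # where the occasional extra CPU time mentioned above comes from.
        f.create_dataset(name, data=raw, compression="gzip")


def read_int_hdf5(path: str, name: str = "number") -> int:
    """Read back an integer stored by write_int_hdf5."""
    with h5py.File(path, "r") as f:
        return int.from_bytes(f[name][:].tobytes(), "little")
```

Storing the bytes rather than the decimal text avoids converting the number to a string at all; whether that suits the tools that later read the file is something to check for your own case.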



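Anyway, I think it is very likely that you will manage, and apart from faster writes you will also gain smaller files, which are simpler to handle.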
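On the narrower question of dumping the number itself to a file, the following standard-library sketch may help. It is not HDF5 and not part of the original answer; it simply writes the integer's raw bytes instead of its decimal digits. The function names and file name are hypothetical, and it assumes the value is a non-negative Python `int`.

```python
import os
import tempfile


def dump_int(n: int, path: str) -> int:
    """Write a non-negative integer to `path` as raw little-endian bytes.

    Returns the number of bytes written.
    """
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    size = max(1, (n.bit_length() + 7) // 8)  # at least one byte, so 0 round-trips
    with open(path, "wb") as f:
        return f.write(n.to_bytes(size, "little"))


def load_int(path: str) -> int:
    """Read back an integer written by dump_int."""
    with open(path, "rb") as f:
        return int.from_bytes(f.read(), "little")


if __name__ == "__main__":
    n = 3 ** 100_000  # stand-in for a very large result
    path = os.path.join(tempfile.gettempdir(), "big_number.bin")
    written = dump_int(n, path)
    assert load_int(path) == n
    print(f"wrote {written} bytes")
```

The resulting file is binary, so it is only useful if whatever consumes it can read raw bytes; if a human-readable decimal line is required, this does not replace the text output.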


