Fastest way to store large files in Python

Problem description

I recently asked a question regarding how to save large Python objects to file. I had previously run into problems converting massive Python dictionaries into strings and writing them to a file via write(). Now I am using pickle. Although it works, the files are incredibly large (> 5 GB). I have little experience with files this large. I wanted to know if it would be faster, or even possible, to zip this pickle file prior to storing it to memory.
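
For concreteness, the kind of thing being asked about looks roughly like the sketch below, which pickles straight into a gzip stream instead of compressing the finished file afterwards; the file name and sample data are made up for illustration.

    import gzip
    import pickle  # on Python 2 this would be cPickle

    # Made-up stand-in for the large dictionary described above.
    data = {i: str(i) for i in range(1000000)}

    # Pickle straight into a gzip stream so the compressed file is written in one pass.
    with gzip.open('data.pkl.gz', 'wb') as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

    # Read it back the same way.
    with gzip.open('data.pkl.gz', 'rb') as f:
        restored = pickle.load(f)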

Recommended answer

Python code would be extremely slow when it comes to implementing data serialization. If you try to create an equivalent of Pickle in pure Python, you'll see that it is super slow. Fortunately, the built-in modules that do this are quite good.
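
As a point of reference, here is a minimal sketch of leaning on the built-in C-based pickler (cPickle on Python 2; on Python 3 the plain pickle module uses its C accelerator automatically) with the highest protocol, which is far more compact and faster than Python 2's old text-based default; the file name and sample data are made up.

    try:
        import cPickle as pickle   # Python 2: the C implementation
    except ImportError:
        import pickle              # Python 3: the C accelerator is used automatically

    # Made-up sample data standing in for the large dictionary.
    data = {i: [i, str(i)] for i in range(1000000)}

    # Binary protocols are much smaller and faster than Python 2's default protocol 0.
    with open('data.pkl', 'wb') as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

    with open('data.pkl', 'rb') as f:
        restored = pickle.load(f)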

Apart from cPickle, you will find the marshal module, which is a lot faster. But it needs a real file handle (not a file-like object). You can import marshal as Pickle and see the difference. I don't think you can make a custom serializer that is much faster than this...
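
A minimal sketch of the marshal alternative, with the caveat that marshal only handles built-in types and its on-disk format is not guaranteed to stay compatible across Python versions; the file name and data are made up.

    import marshal

    # Made-up sample data; marshal only supports built-in types (dict, list, str, int, ...).
    data = {i: str(i) for i in range(1000000)}

    # marshal wants a real binary file object, not an arbitrary file-like wrapper.
    with open('data.marshal', 'wb') as f:
        marshal.dump(data, f)

    with open('data.marshal', 'rb') as f:
        restored = marshal.load(f)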

Here's an actual (not so old) serious benchmark of Python serializers: http://kbyanc.blogspot.com/2007/07/python-serializer-benchmarks.html
