Organizing files in a tar.bz2 file with Python


Question

I have about 200,000 text files packed into a single bz2 file. When I scan the bz2 file to extract the data I need, it is extremely slow: it has to read through the entire bz2 file to find the single text file I am looking for. Is there any way to speed this up?

I also thought about organizing the files inside the tar.bz2 so that the lookup knows where to look. Is there any way to organize the files that are put into a bz2 file?

More info / edit: I need to query the compressed file for each text file. Is there a better compression method that supports such a large number of files and compresses about as thoroughly?

Recommended answer

Do you have to use bzip2? From its documentation, it is quite clear that it is not designed to support random access. Perhaps you should use a compression format that more closely matches your requirements. The good old Zip format supports random access, though it might compress worse.
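As a rough illustration of the Zip suggestion, here is a minimal sketch that repacks a tar.bz2 into a zip archive once and then reads individual members by name. It uses only the standard-library `tarfile` and `zipfile` modules. The function names `tar_bz2_to_zip` and `read_member` and the file paths are hypothetical, and the sketch assumes each text file is small enough to hold in memory.

```python
import tarfile
import zipfile


def tar_bz2_to_zip(tar_path, zip_path):
    """Copy every regular file from a tar.bz2 archive into a zip archive.

    Hypothetical helper: one sequential pass over the tar stream.
    """
    with tarfile.open(tar_path, "r:bz2") as tar, \
            zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and other non-file entries
            with tar.extractfile(member) as src:
                # Assumes each member fits comfortably in memory.
                zf.writestr(member.name, src.read())


def read_member(zip_path, name):
    """Return the bytes of one member, located through the zip central directory."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.read(name)
```

The conversion still has to decompress the whole bz2 stream, but only once. For many lookups it is probably better to keep a single `zipfile.ZipFile` object open and call `read` on it repeatedly, since opening the archive loads the directory of all entries each time. Zip compresses each member separately, which is why the result may be larger than the tar.bz2.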
