Organizing files in a tar.bz2 file with Python
Question
I have about 200,000 text files that are placed in a bz2 file. The issue I have is that when I scan the bz2 file to extract the data I need, it goes extremely slowly. It has to look through the entire bz2 file to find the single file I am looking for. Is there any way to speed this up?
Also, I thought about possibly organizing the files in the tar.bz2 so that it knows where to look instead. Is there any way to organize the files that are put into a bz2?
More info (edit): I need to query the compressed file for each text file. Is there a better compression method that supports such a large number of files and compresses just as thoroughly?
Recommended answer
Do you have to use bzip2? Reading its documentation, it's quite clear that it's not designed to support random access. Perhaps you should use a compression format that more closely matches your requirements. The good old Zip format supports random access, but might compress worse, of course.
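As a rough illustration of the Zip suggestion, here is a minimal sketch using Python's standard `tarfile` and `zipfile` modules. The function names `tar_bz2_to_zip` and `read_member` are hypothetical, chosen for this example only. It assumes Python 3.3 or later so that `zipfile.ZIP_BZIP2` is available, and it assumes each individual text file fits in memory. The tar.bz2 archive is read sequentially once during conversion; after that, single files can be fetched through the Zip central directory without scanning the rest of the archive.

```python
import tarfile
import zipfile


def tar_bz2_to_zip(tar_path, zip_path):
    """Copy every regular file from a .tar.bz2 archive into a Zip archive.

    Hypothetical helper: this is the one-time, sequential pass over the
    tar.bz2 file.
    """
    with tarfile.open(tar_path, "r:bz2") as tar, \
            zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_BZIP2) as zf:
        for member in tar:
            # Skip directories, links and other non-file entries.
            if not member.isfile():
                continue
            data = tar.extractfile(member).read()
            # Each file is compressed on its own, which is what makes
            # random access possible later.
            zf.writestr(member.name, data)


def read_member(zip_path, name):
    """Return the bytes of one file, located via the Zip central directory."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.read(name)
```

Because every member is compressed independently, the resulting Zip file will probably be larger than the original tar.bz2, which compresses all files as one stream. If many lookups are needed, keeping a single `zipfile.ZipFile` object open and calling its `read` method repeatedly avoids re-reading the directory for each query.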