Read sequential file - Compressed file vs Uncompressed


Question

I am looking for the fastest way to read a sequential file from disk. I read in some posts that if I compressed the file using, for example, lz4, I could get better performance than reading the flat file, because I would minimize the I/O operations.

But when I try this approach, scanning an lz4-compressed file gives me worse performance than scanning the flat file. I didn't try lz4demo itself, but from looking at it, my code is very similar.

I have found these benchmarks:
http://skipperkongen.dk/2012/02/28/uncompressed-versus-compressed-read/
http://code.google.com/p/lz4/source/browse/trunk/lz4demo.c?r=75

Is it really possible to improve performance by reading a compressed sequential file instead of an uncompressed one? What am I doing wrong?

Recommended answer

Yes, it is possible to improve disk read performance by using compression.

This effect is most likely to happen if you use a multi-threaded reader: while one thread reads compressed data from disk, the other one decodes the previous compressed block in memory.

Considering the speed of LZ4, the decoding operation is likely to finish before the other thread completes reading the next block. This way, you'll achieve a bandwidth improvement proportional to the compression ratio of the tested file.
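
The two-thread arrangement described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from lz4demo: it uses `zlib` from the Python standard library as a stand-in for LZ4 (zlib decodes much more slowly than LZ4, so the timings will not match), and the length-prefixed block layout, `BLOCK_SIZE`, `write_blocks` and `read_pipelined` are all hypothetical names invented for the example.

```python
import io
import queue
import struct
import threading
import zlib  # stand-in for LZ4 (assumption): stdlib only, noticeably slower to decode

BLOCK_SIZE = 64 * 1024  # hypothetical uncompressed block size


def write_blocks(data, out):
    """Write data as independently compressed blocks, each prefixed by its length."""
    for i in range(0, len(data), BLOCK_SIZE):
        block = zlib.compress(data[i:i + BLOCK_SIZE], 1)
        out.write(struct.pack("<I", len(block)))  # 4-byte little-endian size
        out.write(block)


def read_pipelined(src, depth=4):
    """Yield decompressed blocks while a helper thread reads ahead from src."""
    # Bounded queue: the reader stays at most `depth` blocks ahead of the decoder.
    blocks = queue.Queue(maxsize=depth)

    def reader():
        try:
            while True:
                header = src.read(4)
                if len(header) < 4:  # end of file
                    break
                (size,) = struct.unpack("<I", header)
                blocks.put(src.read(size))
        finally:
            blocks.put(None)  # sentinel: no more blocks

    threading.Thread(target=reader, daemon=True).start()

    # The calling thread decodes block N while the reader fetches block N+1.
    while True:
        block = blocks.get()
        if block is None:
            return
        yield zlib.decompress(block)


if __name__ == "__main__":
    payload = b"sequential record\n" * 50_000
    buf = io.BytesIO()  # in-memory file, used here instead of a real disk file
    write_blocks(payload, buf)
    print("compressed bytes:", len(buf.getvalue()), "of", len(payload))
    buf.seek(0)
    restored = b"".join(read_pipelined(buf))
    print("round trip ok:", restored == payload)
```

The bounded queue is what lets reading and decoding overlap. A single-threaded loop that reads a block and then decodes it pays for both steps in sequence, which may be one reason a compressed scan turns out slower than a flat one.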

Obviously, there are other effects to consider when benchmarking. For example, HDD seek times are several orders of magnitude larger than those of an SSD, and under bad circumstances they can become the dominant part of the timing, reducing any bandwidth advantage to zero.
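
To get a feel for how these effects interact, here is a deliberately simplified back-of-the-envelope model. Everything in it is an assumption made for illustration: the function names, the idea that reading and decoding overlap perfectly, and the sample numbers, which are not measurements.

```python
def plain_read_seconds(size_mb, disk_mb_s, seeks=0, seek_ms=0.0):
    """Time to read an uncompressed file: transfer time plus seek time."""
    return size_mb / disk_mb_s + seeks * seek_ms / 1000.0


def compressed_read_seconds(size_mb, ratio, disk_mb_s, decode_mb_s,
                            seeks=0, seek_ms=0.0, overlapped=True):
    """Time to read and decode a compressed file.

    size_mb is the uncompressed size; ratio is uncompressed / compressed size.
    With overlapped=True, reading and decoding run in parallel, so the slower
    stage sets the pace; otherwise the two stages simply add up.
    """
    read_time = (size_mb / ratio) / disk_mb_s
    decode_time = size_mb / decode_mb_s  # decode speed measured on output bytes
    busy = max(read_time, decode_time) if overlapped else read_time + decode_time
    return busy + seeks * seek_ms / 1000.0


if __name__ == "__main__":
    # Hypothetical numbers: 1000 MB file, 2:1 ratio, 100 MB/s disk, 1000 MB/s decoder.
    print(plain_read_seconds(1000, 100))
    print(compressed_read_seconds(1000, 2, 100, 1000))
    # Same file, but with 2000 seeks of 10 ms each added to both cases.
    print(plain_read_seconds(1000, 100, seeks=2000, seek_ms=10.0))
    print(compressed_read_seconds(1000, 2, 100, 1000, seeks=2000, seek_ms=10.0))
```

In this model the gain approaches the compression ratio only when transfer time dominates. As the seek term grows, both totals are dominated by the same fixed cost and the relative advantage shrinks.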
