Get the file size of a very large .gz file on a 64-bit platform


Question

According to the gzip specification, the uncompressed file size is stored in the last 4 bytes of a .gz file.

I created 2 files:

dd if=/dev/urandom of=500M bs=1024 count=500000
dd if=/dev/urandom of=5G bs=1024 count=5000000

I compressed them:

gzip 500M 5G

I checked the last 4 bytes with:

tail -c4 500M.gz | od -I      (prints 512000000, as expected)
tail -c4 5G.gz | od -I        (prints 825032704, not as expected)
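The same check can be sketched in Python. This is only an illustration, assuming a single-member .gz file whose trailer ends with the 4-byte little-endian ISIZE field; the function name `read_isize` is made up for this example.

```python
import os
import struct


def read_isize(path):
    """Return the ISIZE trailer field of a .gz file.

    ISIZE is the last 4 bytes of the file, stored as an unsigned
    little-endian 32-bit integer. This reads the trailer only; it does
    not decompress anything, and for a multi-member file it only
    reflects the last member.
    """
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)  # position 4 bytes before the end of the file
        (isize,) = struct.unpack("<I", f.read(4))
    return isize
```

Calling `read_isize("500M.gz")` on the file created above should give the same number that `od` prints.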

It seems that hitting the invisible 32-bit barrier makes the value written into ISIZE complete nonsense, which is more annoying than if they had used an error bit instead.

Does anyone know of a way to get the uncompressed file size from a .gz file without extracting it?

Thanks

Specification: http://www.gzip.org/zlib/rfc-gzip.html

Edit: if anyone wants to try this out, you can use /dev/zero instead of /dev/urandom.

Recommended answer

No.

The only way to get the exact size of a compressed stream is to actually decompress it (even if you write everything to /dev/null and just count the bytes).
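A minimal sketch of that approach in Python, assuming the standard-library `gzip` module is acceptable; the function name `uncompressed_size` and the 1 MiB chunk size are arbitrary choices for this example.

```python
import gzip


def uncompressed_size(path, chunk_size=1 << 20):
    """Count the uncompressed bytes in a .gz file by streaming through it.

    The data is decompressed chunk by chunk and discarded immediately,
    so nothing is written to disk and memory use stays bounded.
    """
    total = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of stream
                break
            total += len(chunk)
    return total
```

This still pays the full decompression cost; it only avoids storing the output. Because Python integers are not limited to 32 bits, the count stays correct for files larger than 4 GiB.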

It's worth noting that ISIZE is defined as

ISIZE (Input SIZE)
This contains the size of the original (uncompressed) input
data modulo 2^32.

in the gzip RFC, so it isn't actually breaking at the 32-bit barrier; what you're seeing is the expected behavior.
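The numbers from the question can be checked against that definition. The short Python sketch below recomputes the two file sizes from the `dd` parameters (block size 1024, counts 500000 and 5000000) and reduces them modulo 2^32; the variable names are chosen only for this illustration.

```python
MODULUS = 2 ** 32  # ISIZE holds the size modulo 2^32

small = 500_000 * 1024    # bytes written by the first dd command
large = 5_000_000 * 1024  # bytes written by the second dd command

print(small % MODULUS)  # 512000000, smaller than 2^32, so unchanged
print(large % MODULUS)  # 825032704, i.e. 5120000000 - 4294967296
```

Both results match the values read from the file trailers, which suggests the "unexpected" 825032704 is just the true size wrapped around once.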
