What is a sparse file and why do we need it?

Question

What is a sparse file and why do we need it? All I have been able to find out is that it is a very large file (gigabytes in size) and that it is efficient. In what way is it efficient?

Recommended answer

Say you have a file containing long runs of null bytes (\x00). Storing those bytes is simply not efficient: we know the file is full of them, so why write them to the storage device? Instead, the filesystem can store metadata describing where the zeros are. The runs of zeros that are left unstored in this way are called holes. When a process reads the file, the zero-filled blocks are generated dynamically rather than being read from physical storage (see the schematic in the Wikipedia article on sparse files).

This is why a sparse file is efficient: it does not store the zeros on disk. It holds only enough metadata to describe the zeros that will be generated when the file is read.

Note: for a sparse file, the logical file size is greater than the physical size on disk, because the zeros are not physically stored on the storage device.
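
As a rough illustration of the logical versus physical size, the following sketch creates a small file that consists of one big hole and prints both numbers. It assumes GNU coreutils (truncate, and stat with the -c option) and a filesystem that supports sparse files. The scratch file from mktemp and the 1 MiB size are arbitrary choices for the example, not part of the original answer.

```shell
# Hypothetical scratch file; mktemp picks a unique name.
f=$(mktemp)

# Extend the empty file to 1 MiB without writing any data.
truncate -s 1M "$f"

logical=$(stat -c %s "$f")       # logical size in bytes
blocks=$(stat -c %b "$f")        # number of blocks allocated on disk
block_size=$(stat -c %B "$f")    # size in bytes of each block counted by %b
physical=$((blocks * block_size))

echo "logical:  $logical bytes"
echo "physical: $physical bytes"

rm -f "$f"
```

On a filesystem with sparse-file support the physical figure should come out far smaller than the logical one, often zero.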

Edit:

When you run:

$ dd if=/dev/zero of=output bs=1G count=4

The command here copies four 1 GiB blocks of null bytes (4 GiB in total) from /dev/zero to output. To see the result:

$ stat output
  File: output
  Size: 4294967296      Blocks: 8388616    IO Block: 4096   regular file
--omitted--

You can see that this file has 8388616 blocks allocated to it (stat on Linux normally counts these in 512-byte units, so that is about 4 GiB). These blocks store nothing but null bytes copied from /dev/zero, and they do occupy physical disk space: at this point the zeros are stored on disk as ordinary data, so the file is not sparse yet. dd did what you asked for, copying blocks of data from one file to another.

Now, run this command to detect the runs of zeros and make the file sparse in place:

$ fallocate -d output
$ stat output
  File: output
  Size: 4294967296      Blocks: 0          IO Block: 4096   regular file
--omitted--

Do you notice something? The number of blocks is now 0, because the blocks that stored only null bytes were deallocated. Remember, output's blocks stored nothing but zeros. fallocate -d (--dig-holes) detects blocks that contain only zeros and deallocates them; since every block of this file contained zeros, they were all deallocated.

Also notice that the size stayed the same. This is the logical (virtual) size of the file, not its size on disk. It is crucial to understand that output no longer occupies physical storage: it has 0 blocks allocated to it and thus does not really use disk space. The logical size is preserved after running fallocate -d, so when you later read from the file, you get null bytes generated for you by the filesystem at runtime. The physical size of output, however, is zero; it uses no data blocks.

Remember, when you read the output file, the null bytes are generated dynamically by the filesystem at runtime; they are not physically stored on disk. The file size reported by stat is the logical size, while the physical size of output is zero. In this case the filesystem has to generate 4 GiB of null bytes when a process reads the whole file.
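
To see that reading a hole really does return zeros, here is a small sketch that reads a freshly created hole back and counts the bytes. It assumes GNU coreutils; the scratch file and the 1 MiB size are made up for the example.

```shell
# Hypothetical scratch file consisting of a single 1 MiB hole.
f=$(mktemp)
truncate -s 1M "$f"

# cat reads the whole file; wc counts the bytes that came through the pipe.
total=$(cat "$f" | wc -c)

# Delete every NUL byte and count whatever is left over.
nonzero=$(tr -d '\000' < "$f" | wc -c)

echo "read $total bytes, $nonzero of them non-zero"

rm -f "$f"
```

The reader receives the full logical size, and every byte is a null byte, even though nothing was ever written to the file.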

You can also create a sparse file directly with dd:

$ dd if=/dev/zero of=output2 bs=1G seek=4 count=0
$ stat output2
  File: output2
  Size: 4294967296      Blocks: 0          IO Block: 4096   regular file

GNU dd uses lseek and ftruncate internally, so check the truncate(2) and lseek(2) man pages.
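
The same seek-then-write idea can be sketched from the shell by asking dd to skip ahead in the output before writing a single byte, which leaves a hole in front of the data. This is only an illustration under assumptions: it relies on GNU dd and GNU stat, and the scratch file, the offset, and the byte 'X' are arbitrary.

```shell
# Hypothetical scratch file; mktemp picks a unique name.
f=$(mktemp)

# With bs=1 and seek=1048575, dd positions the output at byte offset
# 1048575 and then writes the single byte read from stdin.
printf 'X' | dd of="$f" bs=1 seek=1048575 2>/dev/null

size=$(stat -c %s "$f")      # logical size: offset + 1 byte written
last=$(tail -c 1 "$f")       # the one byte that was actually written
blocks=$(stat -c %b "$f")    # blocks allocated on disk

echo "size: $size bytes, last byte: $last, blocks: $blocks"

rm -f "$f"
```

Everything before the final byte was never written, so on a filesystem that supports sparse files it should need little or no allocated space.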
