为什么Windows使用FAT结构而不是传统的链接列表,并为每个数据块的下一个指针指向一个文件? [英] Why did Windows use the FAT structure instead of a conventional linked list with a next pointer for each data block of a file?

查看:282
本文介绍了为什么Windows使用FAT结构而不是传统的链接列表,并为每个数据块的下一个指针指向一个文件?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

为什么不能像传统的链表一样存储对表中的下一个节点的引用呢?也就是说,使用下一个指针?

解决方案

这是由于对齐。 FAT (以及任何其他文件系统)将文件数据存储在一个或多个整个扇区底层存储。因为底层存储只能读取和写入整个扇区,所以这种分配可以高效地访问文件的内容。

交错问题



当一个程序想要在文件中存储某个东西的时候,它提供了一个缓冲区,比如1​​MB的数据来存储。现在,如果文件的数据扇区还必须保持 next 指向下一个扇区的指针,则这个指针信息将需要与实际用户数据交错。因此,文件系统需要建立另一个缓冲区(略多于提供的1MB),为每个输出扇区复制一些用户数据和相应的 next 指针并给出这个新的缓冲区存储。这会有点低效。除非文件系统总是将文件数据存储到新的扇区(通常不会),否则重写 next 指针也是多余的。



更大的问题是在文件上尝试读操作时。现在,文件将像磁带设备一样工作:只有文件的主要元数据中已知第一个扇区的位置,为了到达扇区1000,文件系统将需要先读取所有扇区:读扇区0,找到从加载的下一个指针指向扇区1的地址,读取扇区1等。对于每个随机I / O(假设硬盘驱动器)大约10ms的寻道时间,到达扇区1000将需要约10秒。即使扇区是按顺序排列的,而文件系统驱动程序处理扇区N的数据时,磁头将在下一个扇区上飞行,并且当发出扇区N + 1的读取时,可能为时太晚,需要磁盘整个旋转(7200 RPM驱动器为8.3ms),然后才能再次读取下一个扇区。写入单个扇区通常是原子操作(取决于硬件):断电后读回扇区返回其中的任何一个扇区旧的内容或没有中间状态的新内容。数据库应用程序通常需要知道哪些写入是原子的。如果文件系统在同一扇区中交织文件数据和元数据,则需要向应用程序报告小于实际扇区大小。例如,不是说512字节,它可能需要报告504.但它不能这样做,因为扇区大小通常被应用程序假设为2的幂。此外,存储在这样的文件系统上的文件如果被复制到另一个文件系统将很可能不可用文件系统与不同报告的扇区大小。

更好的方法

FAT格式更好,因为所有 next 指针存储在相邻的扇区中。对于FAT12,FAT16和不是很大的FAT32卷,整个表都足够小,以适应内存。 FAT仍然将链接列表中的文件块记录下来,因此为了实现高效的随机访问,实现需要缓存每个文件的链。在足够大的卷上(可以运行足够大的文件),这样的缓存可能不再适合内存。 / wiki / Ext3rel =nofollow> ext3 uses 直接和间接块。这种简单的格式避免了FAT需要的预处理的需要,并且在需要间接块时,每I / O只需要最少量的附加读取。这些额外的读取操作系统缓存,使其开销往往可以忽略不计。



其他变种也是可能的,并使用各种文件系统。
$ b

随机笔记



为了完整起见,一些硬盘驱动器可以格式化为稍大的扇区大小(比如520字节)文件系统可以在同一个扇区中打包512字节的文件数据和几个字节的元数据。然而由于上述原因,我不认为有人使用这种格式来存储文件下一个扇区的地址。这些额外的字节可以更好地使用:额外的校验和和时间戳记。我相信时间戳是用来改善一些RAID系统的性能。仍然这样的用法很少见,大多数软件根本无法使用它们。

有些文件系统可以直接将文件元数据中足够小的文件内容保存起来占领不同的部门。 ReiserFS 有争议尾部打包。这在这里并不重要:大文件仍然可以通过正确映射到存储扇区中获益。

Instead of storing references to next nodes in a table, why couldn't it be just stored like a conventional linked list, that is, with a next pointer?

解决方案

This is due to alignment. FAT (and just about any other file system) stores file data in one or more whole sectors of the underlying storage. Because the underlying storage can only read and write whole sectors such allocation allows efficient access to the contents of a file.

Issues with interleaving

When a program wants to store something in a file it provides a buffer, say 1MB of data to store. Now if the file's data sectors have to also keep next pointers to their next sector, this pointer information will need to be interleaved with the actual user data. So the file system would need to build another buffer (of slightly more than the provided 1MB), for each output sector copy some of the user data and the corresponding next pointer and give this new buffer to the storage. This would be somewhat inefficient. Unless the file system always stores file data to new sectors (and most usually don't), rewriting these next pointers will also be redundant.

The bigger problem would be when read operation is attempted on the file. Files will now work like tape devices: with only the location of the first sector known in the file's primary metadata, in order to reach sector 1000, the file system will need to read all sectors before it in order: read sector 0, find the address of sector 1 from the loaded next pointer, read sector 1, etc. With typical seek times of around 10 ms per random I/O (assuming a hard disk drive), reaching sector 1000 will take about 10 seconds. Even if sectors are sequentially ordered, while the file system driver processes sector N's data, the disk head will be flying over the next sector and when the read for sector N+1 is issued it may be too late, requiring the disk to rotate entire revolution (8.3ms for 7200 RPM drive) before being able to read the next sector again. On-disk cache can and will help with that though.

Writing single sector is usually atomic operation (depends on hardware): reading back the sector after power failure returns either its old content or the new one without intermediate states. Database applications usually need to know which writes would be atomic. If the file system interleaves file data and metadata in the same sectors, it will need to report smaller than the actual sector size to the application. For example instead of say 512 bytes it may need to report 504. But it can't do it because sector size is usually assumed by applications to be power of 2. Furthermore file stored on such filesystem would very likely be unusable if copied to another file system with different reported sector size.

Better approaches

The FAT format is better because all next pointers are stored in adjacent sectors. For FAT12, FAT16 and not very large FAT32 volumes the entire table is small enough to fit in memory. FAT still records the blocks of a file in a linked list, so to have efficient random access, an implementation needs to cache the chain per file. On large enough volumes (that can sport large enough file) such cache may no longer fit in memory.

ext3 uses direct and indirect blocks. This simple format avoids the need for preprocessing that FAT requires and goes by with only minimal amount of additional reads per I/O when indirect blocks are needed. These additional reads are cached by the operating system so that their overhead is often negligible.

Other variants are also possible and used by various file systems.

Random notes

For the sake of completeness, some hard disk drives can be formatted with slightly larger sector sizes (say 520 bytes) so that the file system can pack 512 bytes of file data with several bytes of metadata in the same sector. Yet because of the above, I don't believe anyone has used such formats for storing the address of the file's next sector. These additional bytes can be put to better use: additional checksums and timestamping come to mind. The timestamping I believe is used to improve the performance of some RAID systems. Still such usage is rare, and most software can't work with them at all.

Some file systems can save the content of small enough files in the file metadata directly without occupying distinct sectors. ReiserFS has the controversial tail packing. This is not important here: large files still benefit from having proper mapping to storage sectors.

这篇关于为什么Windows使用FAT结构而不是传统的链接列表,并为每个数据块的下一个指针指向一个文件?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆