Understanding undefined behavior for a binary stream using fseek(file, 0, SEEK_END) with a file

Question

The C spec has an interesting footnote (#268, C11dr §7.21.3 ¶9):

"Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state."

Does this apply to binary streams reading from a file (on a physical device)?

IMO, a binary file on a disk is just a sea of bytes. It seems to me that a binary file could not have state-dependent encoding, precisely because it is a binary file. I'm fuzzy on the concept of "binary wide-oriented streams" and whether that could even apply to disk I/O.

I see that calling fseek(file, 0, SEEK_END) on a serial stream like a com port, or maybe stdin, may not get to the true end, as the end is yet to be determined. Hence the narrowing of the question to physical files.

[edit] Answer: a concern with older systems (maybe up to the late 1980s). Presently, in 2014, on Windows, POSIX-specific and other non-exotic systems: not a problem.

@Shafik Yaghmour provides a good reference in "Using fseek and ftell to determine the size of a file has a vulnerability?". There, @Jerry Coffin discusses CP/M, where binary files did not always have a precise length (128-byte records, per wiki).

Thanks to @Keith Thompson for the meat of the answer.

Together, this explains the spec's "(because of possible trailing null characters)" comment.

Recommended answer

Binary files are going to be sequences of 8-bit bytes, with an exact specified size, on any system you're likely to use. But not all systems store files that way, and the C standard is carefully designed to allow portability to systems with unusual characteristics.

For example, a conforming C implementation might run on an operating system that stores files as sequences of 512-byte blocks, with no indication of how many bytes of the final block are significant. On such a system, when a binary file is created, the OS might pad the remainder of the final block with zero bytes. When you read from such a file, the padding bytes might either appear in the input (even though they were never explicitly written to the file), or they might be ignored (even though the program that created the file might have written them explicitly).
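
To make the padding concrete, here is a minimal sketch of the arithmetic such a system implies. The helper name padded_size is hypothetical and not part of any standard library; it only models the assumption that the last block is always stored in full and that overflow of size_t is not a concern for the sizes involved.

```c
#include <stddef.h>

/* Hypothetical helper (not a standard function): the size a reader might
   observe on a system that stores files in fixed-size blocks and does not
   record how many bytes of the final block are significant. */
size_t padded_size(size_t written, size_t block)
{
    size_t rem;

    if (block == 0)
        return written;            /* no blocking: the size is exact */

    rem = written % block;         /* bytes used in the final block */
    return rem == 0 ? written : written + (block - rem);
}
```

Under these assumptions, a program that wrote 300 bytes would see 512 bytes with 512-byte blocks, or 384 bytes with 128-byte records as on CP/M, with the extra bytes reading back as zeros.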

If you're reading from a non-seekable stream (for example, keyboard input), then fseek(file, 0, SEEK_END) won't just give you a bad result; it will indicate failure by returning a non-zero result. (On POSIX-compliant systems, it returns -1 and sets errno; ISO C doesn't require that.)
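
A minimal sketch of checking that failure indication follows. The wrapper name seek_to_end is an assumption made for illustration; it tests only for a non-zero return, since that is all ISO C promises, and it does not inspect errno.

```c
#include <stdio.h>

/* Hypothetical wrapper: returns 1 if the stream could be positioned at
   its end, 0 if fseek reported failure (or the stream pointer is NULL). */
int seek_to_end(FILE *fp)
{
    if (fp == NULL)
        return 0;

    /* ISO C only guarantees a non-zero return on failure, so compare
       against 0 rather than against -1. */
    return fseek(fp, 0L, SEEK_END) == 0;
}
```

A caller would fall back to some other strategy, such as reading until end-of-file, whenever seek_to_end returns 0.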

On most systems, fseek(file, 0, SEEK_END) on a binary file will either seek to the actual end of the file (a position determined by exactly how many bytes were written to the file), or return a clear failure indication. If you're using POSIX-specific features anyway, you can safely assume this behavior; you can probably make the same assumption for Windows and a number of other systems. If you want your code to be 100% portable to exotic systems, you shouldn't assume that binary files won't be padded with extra zero bytes.
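
For the common case described above, the usual fseek/ftell idiom might be sketched as follows. The function name stream_size is hypothetical, and the sketch assumes a system where seeking to the end of a binary stream is meaningful (POSIX, Windows and similar) and where the size fits in a long; on an exotic system the result could include padding bytes.

```c
#include <stdio.h>

/* Hypothetical helper: returns the size in bytes of an open binary
   stream, or -1L on failure. The original position is restored. */
long stream_size(FILE *fp)
{
    long start, end;

    if (fp == NULL)
        return -1L;

    start = ftell(fp);                    /* remember where we were */
    if (start == -1L)
        return -1L;

    if (fseek(fp, 0L, SEEK_END) != 0)     /* non-seekable stream, etc. */
        return -1L;

    end = ftell(fp);                      /* -1L here is passed through */

    if (fseek(fp, start, SEEK_SET) != 0)  /* go back to the saved position */
        return -1L;

    return end;
}
```

Every call is checked because each of them can fail independently; a caller that needs to be fully portable would instead count the bytes it actually reads.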
