使用fseek的(文件,0,SEEK_END)与文件理解未定义行为二进制流 [英] Understanding undefined behavior for a binary stream using fseek(file, 0, SEEK_END) with a file

查看:2166
本文介绍了使用fseek的(文件,0,SEEK_END)与文件理解未定义行为二进制流的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

的C规格有一个有趣的注脚(#268 C11dr§7.21.39)

The C spec has an interesting footnote (#268 C11dr §7.21.3 9)

设置文件位置指示器到档案结尾,与,已经未定义二进制流的行为(因为 fseek的(文件,0,SEEK_END)可能的结尾的空字符),或与国家有关的编码数据流的任何不确实在初始移位状态结束。

"Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state."

这是否永远适用于二进制流读取一个文件?(从物理设备)

IMO,磁盘上的二进制文件仅仅是一个字节的海洋。在我看来,一个二进制文件不能有状态相关的编码,因为它是一个的二进制的文件。我对二元面向宽流的概念模糊,如果这甚至可以应用到磁盘I / O。

IMO, a binary file on a disk is just a sea of bytes. It seems to me that a binary file could not have state-dependent encoding as it is a binary file. I'm fuzzy on the concept of "binary wide-oriented streams" and if that even could apply to disk I/O.

我看到,呼吁像一个COM端口或者一个串行流 fseek的(文件,0,SEEK_END) 标准输入可能无法得到真正的结束作为的结束的是尚未确定。因此,这个问题到物理文件狭窄。

I see that calling fseek(file, 0, SEEK_END) on a serial stream like a com port or maybe stdin may not get to the true end as the end is yet to be determined. Thus the narrowing of the question to physical files.

答:与旧关注的一个问题(也许高达80年代后期)。 presently在2014年时,Windows,具体POSIT和非异国情调的人:没有问题的。

[edit] Answer: A concern with older (maybe up to late 1980s). Presently in 2014, Windows, POSIT-specific and non-exotic others: not a problem.

@Shafik Yaghmour提供<一个很好的参考href=\"http://stackoverflow.com/questions/5957845/using-fseek-and-ftell-to-determine-the-size-of-a-file-has-a-vulnerability/5958175#5958175\">Using fseek的和FTELL确定文件的大小有漏洞?。有@Jerry棺材讨论 CP / M 为二进制文件并不总是有一个precise长度。 (每维基128字节记录)。

@Shafik Yaghmour provides a good reference in Using fseek and ftell to determine the size of a file has a vulnerability?. There @Jerry Coffin discusses CP/M as binary files not always having a precise length. (128-byte records per wiki).

由于@Keith汤普森回答的答案的肉。

Thanks to @Keith Thompson answer for the meat of the answer.

这一起解释了规范的(因为可能结尾的空字符)的评论。

Together this explains the specs's "(because of possible trailing null characters)" comment.

推荐答案

二进制文件将是8位字节序列,具有精确指定的大小,你可能会使用任何系统上。但不是的所有的系统中存储​​文件的方式,和C标准是经过精心设计,让便携性与不寻常的特性的系统。

Binary files are going to be sequences of 8-bit bytes, with an exact specified size, on any system you're likely to use. But not all systems store files that way, and the C standard is carefully designed to allow portability to systems with unusual characteristics.

例如,标准的C实现可能的操作系统,存储文件为512字节的块序列上运行,没有指示的最终块有多少字节显著。在这样的系统中,当被创建二进制文件,操作系统威力垫与零字节的最后块的剩余部分。当你从这样的文件中读取,填充字节可能要么出现在输入(即使他们从来没有明确地写入文件),否则可能会被忽略(即使创建该文件的程序可能已明确写入它们)

For example, a conforming C implementation might run on an operating system that stores files as sequences of 512-byte blocks, with no indication of how many bytes of the final block are significant. On such a system, when a binary file is created, the OS might pad the remainder of the final block with zero bytes. When you read from such a file, the padding bytes might either appear in the input (even though they were never explicitly written to the file), or they might be ignored (even though the program that created the file might have written them explicitly).

如果你从非可查找流中读取(例如键盘输入),那么 fseek的(文件,0,SEEK_END)将不只是给你一个坏的结果,它会通过返回一个非零结果指示失败。 (在POSIX兼容的系统,它返回-1并设置错误号; ISO C不要求)

If you're reading from a non-seekable stream (for example keyboard input), then fseek(file, 0, SEEK_END) won't just give you a bad result, it will indicate failure by returning a non-zero result. (On POSIX-compliant systems, it returns -1 and sets errno; ISO C doesn't require that.)

在大多数系统中, fseek的(文件,0,SEEK_END)上一个二进制文件要么寻求到文件的实际结束(位置由到底有多少确定字节都被写入到文件),或返回一个明确的故障指示。如果您在使用特定的POSIX的功能,无论如何,你可以放心地假设此行为;你或许可以为Windows同样的假设和其他一些系统。如果你希望你的code至100%移植到异国情调的系统,你不应该假定二进制文件不会有额外的零字节填充。

On most systems, fseek(file, 0, SEEK_END) on a binary file will either seek to the actual end of the file (a position determined by exactly how many bytes were written to the file), or return a clear failure indication. If you're using POSIX-specific features anyway, you can safely assume this behavior; you can probably make the same assumption for Windows and a number of other systems. If you want your code to be 100% portable to exotic systems, you shouldn't assume that binary files won't be padded with extra zero bytes.

这篇关于使用fseek的(文件,0,SEEK_END)与文件理解未定义行为二进制流的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆