Use ftell to find the file size

Question

  fseek(f, 0, SEEK_END); 
  size = ftell(f);

If ftell(f) tells us the current file position, the size here should be the offset from the end of the file to the beginning. Why is the size not ftell(f)+1? Shouldn't ftell(f) only give us the position of the end of the file?

Recommended answer

File positions are like the cursor in a text entry widget: they sit in between the bytes of the file. This is maybe easiest to understand if I draw a picture:

    +---+---+---+---+---+
    | a | b | c | d |xxx|
    +---+---+---+---+---+
    0   1   2   3   4

This is a hypothetical file. It contains four characters: a, b, c, and d. Each character gets a little box to itself, which we call a "byte". (This file is ASCII.) The fifth box has been crossed out because it's not part of the file yet, but if you appended a fifth character to the file it would spring into existence.

The valid file positions in this file are 0, 1, 2, 3, and 4. There are five of them, not four; they correspond to the vertical lines before, after, and in between the boxes. When you open the file (assuming you don't use "a" mode), you start out at position 0, the line before the first byte in the file. When you seek to the end, you arrive at position 4, the line after the last byte in the file. Because we start counting from zero, this is also the number of bytes in the file. (This is one of several reasons why we start counting from zero rather than one.)
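The four-byte example can be checked directly. The following is a minimal sketch, assuming a platform where tmpfile() succeeds; the helper name position_at_end_of_abcd is made up for illustration. It writes "abcd" to a temporary binary stream, seeks to the end, and returns what ftell reports:

```c
#include <stdio.h>

/* Hypothetical helper (not from the original answer): writes the four
   bytes "abcd" to a temporary file, seeks to the end, and returns the
   position ftell reports there, or -1 if any step fails. */
long position_at_end_of_abcd(void)
{
    FILE *f = tmpfile();                 /* binary update mode ("wb+") */
    if (f == NULL)
        return -1L;

    long pos = -1L;
    if (fputs("abcd", f) != EOF          /* file now holds 4 bytes */
        && fseek(f, 0L, SEEK_SET) == 0   /* back to position 0 */
        && fseek(f, 0L, SEEK_END) == 0)  /* the line after the last byte */
        pos = ftell(f);

    fclose(f);                           /* the temporary file is removed */
    return pos;
}
```

On success the function returns 4, the position after the last byte, which is also the byte count; no +1 is needed.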

I am obliged to warn you that there are several reasons why

  fseek(fp, 0, SEEK_END);
  long int nbytes = ftell(fp);

might not give you the number you actually want, depending on what you mean by "file size" and on the contents of the file. In no particular order:

  • On Windows, if you open a file in text mode, the numbers you get from ftell on that file are not byte offsets from the beginning of the file; they are more like fgetpos cookies that can only be used in a subsequent call to fseek. If you need to seek around in a text file on Windows, you may be better off opening the file in binary mode and dealing with both DOS and Unix line endings yourself. This is actually my recommendation for production code in general, because it's perfectly possible to have a file with DOS line endings on a Unix system, or vice versa.

  • On systems where long int is 32 bits, files can easily be bigger than that, in which case ftell will fail, returning −1 and setting errno to EOVERFLOW. POSIX.1-2001-compliant systems provide a function called ftello that returns an off_t quantity that can represent larger file sizes, provided you put #define _FILE_OFFSET_BITS 64 at the very top of all your source files (before any #includes). I don't know what the Windows equivalent is.

  • If your file contains characters that are beyond ASCII, then the number of bytes in the file is very likely to be different from the number of characters in the file. (For instance, if the file is encoded in UTF-8, the character will take up three bytes, Ä will take up either two or three bytes depending on whether it's "composed", and జ్ఞా will take up twelve bytes because, despite being a single grapheme, it's a string of four Unicode code points.) ftell (or ftello) will still tell you the correct number to pass to malloc, if your goal is to read the entire file into memory, but iterating over "characters" will not be as simple as for (i = 0; i < len; i++).

  • If you are using C's "wide streams" and "wide characters", then, just like text streams on Windows, the numbers you get from ftell on that file are not byte offsets and may not be useful for anything other than subsequent calls to fseek. But wide streams and characters are a bad design anyway; you're actually more likely to be able to handle all the world's languages correctly if you stick to processing UTF-8 by hand in narrow streams and characters.
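Taking the first two caveats together, a size query might look like the sketch below. It assumes a POSIX system; the helper name file_size_bytes is invented for illustration, and the _POSIX_C_SOURCE definition is an extra assumption added so that fseeko and ftello are declared under strict compiler modes. It opens the file in binary mode so offsets are byte counts, and uses ftello so the result is an off_t rather than a long:

```c
#define _FILE_OFFSET_BITS 64      /* must come before any #include */
#define _POSIX_C_SOURCE 200809L   /* assumption: exposes fseeko/ftello */
#include <stdio.h>
#include <sys/types.h>            /* off_t */

/* Hypothetical helper (not from the original answer): returns the size
   in bytes of the file at `path`, or -1 if it cannot be opened or the
   seek fails. */
off_t file_size_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");  /* binary mode: positions are byte offsets */
    if (fp == NULL)
        return -1;

    off_t size = -1;
    if (fseeko(fp, 0, SEEK_END) == 0)
        size = ftello(fp);         /* ftello itself returns -1 on error */

    fclose(fp);
    return size;
}
```

The result is a byte count. As the third caveat notes, that is the right number to pass to malloc, but it is not a count of characters when the contents go beyond ASCII.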
