C file size discrepancy

Views: 101 | Published: 2020/11/6 3:45:44 | Tags: c, filesize, file-pointer


Problem description

I am trying to learn C and am currently working on a toy program. Right now, it simply opens a text file, reads it char by char, and prints it to the command line.

I looked up how to get the size of a file (using fseek() and then ftell()), but the result doesn't match the number I get from counting the characters in a while loop as I iterate through the file.

I'm wondering whether the discrepancy is due to Windows using \r\n rather than just \n, since the difference seems to be the number of newlines + 1.

Below is the program I am working on:

#include <stdio.h>
#include <stdlib.h>

int main()
{
        FILE * fp = fopen("test.txt", "r");

        fseek(fp, 0, SEEK_END);
        char * stringOfFile = malloc(ftell(fp));
        printf("allocated %d characters for file\n", ftell(fp));
        fseek(fp,0,SEEK_SET);//reset pointer

        char tmp = getc(fp); //current letter in file
        int i=0;
        while (tmp != EOF) //End-Of-File (defined in stdio.h)
        {
                *(stringOfFile+i) = tmp;
                tmp = getc(fp);
                i++;
        }
        fclose(fp);
        printf("Turns out we had %d characters to store.\nThe file was as follows:\n", i);
        printf("%s", stringOfFile);
}

And the output I get (with a simple test file, whose contents you can see in the output) is:

allocated 67 characters for file
Turns out we had 60 characters to store.
The file was as follows:
line1
line2
line3
line4
line5
(last)line6

lmnopqrstuvw▬$YL Æ

The tail of the printed output seems to be garbage from allocating too much memory for the string.

Thanks in advance for any help or answers you can provide!

Recommended answer

If you're running Windows:

FILE * fp = fopen("test.txt", "r");

opens the file in text mode, which implies converting \r\n to \n.

So if your file has 7 lines, the conversion removes 7 chars (that is, if the file uses Windows-style line termination).

The fix is to open it in binary mode:

FILE * fp = fopen("test.txt", "rb");

so that ftell and reading chars one by one should match.
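As a minimal sketch of that claim, the helpers below compare the size reported by fseek()/ftell() with the number of characters getc() actually delivers when the file is opened with "rb". The function names are hypothetical and chosen only for illustration; they are not part of the original program.

```c
#include <stdio.h>

/* Hypothetical helper: size as reported by fseek()/ftell(), or -1 on error.
   The stream is rewound afterwards. */
static long size_by_ftell(FILE *fp)
{
    if (fseek(fp, 0, SEEK_END) != 0)
        return -1;
    long size = ftell(fp);
    rewind(fp);
    return size;
}

/* Hypothetical helper: number of characters getc() delivers before EOF.
   The stream is rewound afterwards. */
static long size_by_counting(FILE *fp)
{
    long count = 0;
    int c;                          /* int, so EOF stays distinguishable */
    while ((c = getc(fp)) != EOF)
        count++;
    rewind(fp);
    return count;
}

/* Returns 1 if both sizes agree for a file opened in binary mode,
   0 if they differ, -1 if the file cannot be opened. */
static int sizes_match_in_binary_mode(const char *path)
{
    FILE *fp = fopen(path, "rb");   /* "rb": no \r\n translation */
    if (fp == NULL)
        return -1;
    long reported = size_by_ftell(fp);
    long counted = size_by_counting(fp);
    fclose(fp);
    return reported >= 0 && reported == counted;
}
```

Calling sizes_match_in_binary_mode("test.txt") should return 1 regardless of which line endings the file uses, because no characters are dropped in binary mode.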

Of course, that wastes space, and having \r chars in your text is not very convenient, so you could instead allocate the way you're doing now and, at the end, call realloc to shrink the allocated memory to the actual number of chars (since the new size is smaller, that's OK):

stringOfFile = realloc(stringOfFile,i+1);

Note that I've added 1 to the number of chars to account for the nul terminator, so if there aren't any \r chars in the file, the realloc could increase the size of the block by 1.

So, as I was hinting at, don't forget to nul-terminate your string, or printf won't stop properly:

stringOfFile[i] = '\0';

(unless you don't care about creating a C string, since storing the string size and displaying it char by char is also correct)
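Putting the pieces together, here is one possible sketch of the text-mode variant: allocate from ftell() plus one byte, read with an int, nul-terminate, then shrink with realloc. The function name read_text_file and its length_out parameter are hypothetical, and the code assumes the ftell() value is at least as large as the number of characters read (the loop guards against that assumption failing).

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a text file into a nul-terminated string.
   Returns NULL on failure; the caller must free() the result. */
static char *read_text_file(const char *path, size_t *length_out)
{
    FILE *fp = fopen(path, "r");            /* text mode, as in the question */
    if (fp == NULL)
        return NULL;

    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long size = ftell(fp);                  /* assumed to be an upper bound */
    if (size < 0) { fclose(fp); return NULL; }
    rewind(fp);

    char *text = malloc((size_t)size + 1);  /* +1 for the nul terminator */
    if (text == NULL) { fclose(fp); return NULL; }

    size_t i = 0;
    int c;                                  /* int, not char: see getc() */
    while (i < (size_t)size && (c = getc(fp)) != EOF)
        text[i++] = (char)c;
    fclose(fp);

    text[i] = '\0';                         /* so printf("%s") stops here */

    /* Shrink to the number of chars actually stored, plus the terminator. */
    char *shrunk = realloc(text, i + 1);
    if (shrunk != NULL)
        text = shrunk;

    if (length_out != NULL)
        *length_out = i;
    return text;
}
```

The result can be printed with printf("%s", text) and released with free(text).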

We've seen that the ftell method is tricky. In some cases, for instance when the stream is the output of a command (popen returns a FILE * but you cannot fseek it) or a socket, this principle cannot be applied at all, since we don't know the size of the data in advance.

In the general case, it would be better to:

allocate a small buffer
read char by char and store each char
if the buffer is full, call realloc to increase its size by some step (not at every char, or performance would be bad)
at the end, call realloc again to adjust the size more precisely

(That solves the binary/text issue transparently as well.)
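The steps above could be sketched roughly as follows. The function name read_stream, the initial capacity of 64, and the doubling growth step are all illustrative choices rather than anything prescribed by the answer; any step that avoids calling realloc for every char would do.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything from fp into a nul-terminated
   buffer without knowing the size in advance. Returns NULL on allocation
   failure; the caller must free() the result. */
static char *read_stream(FILE *fp, size_t *length_out)
{
    size_t capacity = 64;               /* small initial buffer (arbitrary) */
    size_t length = 0;
    char *buf = malloc(capacity);
    if (buf == NULL)
        return NULL;

    int c;
    while ((c = getc(fp)) != EOF) {
        /* Grow before storing, always keeping one byte for '\0'. */
        if (length + 1 >= capacity) {
            size_t new_capacity = capacity * 2;
            char *grown = realloc(buf, new_capacity);
            if (grown == NULL) {
                free(buf);
                return NULL;
            }
            buf = grown;
            capacity = new_capacity;
        }
        buf[length++] = (char)c;
    }
    buf[length] = '\0';

    /* Final realloc to fit the size actually used. */
    char *trimmed = realloc(buf, length + 1);
    if (trimmed != NULL)
        buf = trimmed;

    if (length_out != NULL)
        *length_out = length;
    return buf;
}
```

Because the function never calls fseek or ftell, it works the same way for regular files, text or binary mode, and for streams that cannot be seeked.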

Note that if you're working with large files (>4GB), you have to use 64-bit unsigned integers for positions and the fopen64 flavours of the I/O functions (and all offset variables like i should be unsigned or match the return type of ftell, or you'll start having problems at 2GB). Well, I suppose it doesn't matter much when processing moderately small text files.

Also, check David's answer. With text files, putting the result of getc in a char should work, but not in the general case with binary files.
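To illustrate why, here is a small sketch (the helper names are hypothetical). getc returns an int holding either an unsigned char value or EOF. Once the result is narrowed to a plain char, a 0xFF byte can compare equal to EOF where char is signed and EOF is -1, while no value compares equal to EOF where char is unsigned, so the loop in the question would either stop early or never stop on binary data.

```c
#include <stdio.h>

/* Counts bytes using an int to hold getc()'s result, which keeps
   EOF distinct from every possible byte value. */
static long count_bytes(FILE *fp)
{
    long n = 0;
    int c;
    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}

/* Returns 1 if this byte, once stored in a plain char, would compare
   equal to EOF on the current platform (typically true for 0xFF where
   char is signed), and 0 otherwise. */
static int byte_looks_like_eof_in_char(unsigned char byte)
{
    char narrowed = (char)byte;
    return narrowed == EOF;
}
```

For a file containing the three bytes 0x41 0xFF 0x42 opened with "rb", count_bytes returns 3, whereas a loop using char tmp = getc(fp) may stop after the first byte.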
