C file size discrepancy

Question

I am trying to learn C, and am currently working on a toy script. Right now, it simply opens a text file, reads it char by char, and spits it out onto the command line.

I looked up how to see the size of a file (using fseek() and then ftell()), but the result it returns doesn't match up with the number I get from counting the characters in a while loop as I iterate through the file.

I'm wondering if the discrepancy is due to Windows using \r\n and not just \n, since the discrepancy seems to be #newlines+1.

Below is the script I am working on:

#include <stdio.h>
#include <stdlib.h>

int main()
{
        FILE * fp = fopen("test.txt", "r");

        fseek(fp, 0, SEEK_END);
        char * stringOfFile = malloc(ftell(fp));
        printf("allocated %d characters for file\n", ftell(fp));
        fseek(fp,0,SEEK_SET);//reset pointer

        char tmp = getc(fp); //current letter in file
        int i=0;
        while (tmp != EOF) //End-Of-File (defined in stdio.h)
        {
                *(stringOfFile+i) = tmp;
                tmp = getc(fp);
                i++;
        }
        fclose(fp);
        printf("Turns out we had %d characters to store.\nThe file was as follows:\n", i);
        printf("%s", stringOfFile);
}

And the output I get (with a simple test file you can see from the output) is:

allocated 67 characters for file
Turns out we had 60 characters to store.
The file was as follows:
line1
line2
line3
line4
line5
(last)line6

lmnopqrstuvw▬$YL Æ

where the tail bits of the printing seem to be garbage from allocating too much memory to the string.

Thanks in advance for any help/answer you can provide!

Recommended answer

If you're running Windows:

FILE * fp = fopen("test.txt", "r");

opens the file in text mode, which implies conversion of \r\n to \n.

So if your file has 7 lines, the conversion removes 7 chars (that is, if the file was using Windows-style line termination)

The fix is to open it in binary mode

FILE * fp = fopen("test.txt", "rb");

so ftell and reading chars one by one should match.
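
To check this on your own machine, here is a minimal sketch that measures a file both ways in binary mode. The helper names `size_by_ftell` and `size_by_counting` are made up for this illustration. One caveat: the C standard does not strictly require binary streams to support SEEK_END, although common desktop platforms do.

```c
#include <stdio.h>

/* Hypothetical helper: size as reported by fseek()/ftell() in binary mode.
   Returns -1 if the file cannot be opened or measured. */
long size_by_ftell(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long size = -1;
    if (fseek(fp, 0, SEEK_END) == 0)
        size = ftell(fp);

    fclose(fp);
    return size;
}

/* Hypothetical helper: size obtained by reading one char at a time
   in binary mode. Returns -1 if the file cannot be opened. */
long size_by_counting(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long count = 0;
    int c; /* int, not char, so EOF stays distinguishable from data */
    while ((c = getc(fp)) != EOF)
        count++;

    fclose(fp);
    return count;
}
```

Calling both helpers on the same file should give the same number in binary mode, \r characters included. Changing "rb" to "r" in `size_by_counting` on Windows should bring back the gap described in the question.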

Of course, keeping the \r chars in your text wastes space and isn't very convenient, so you could instead allocate as you're doing now and, at the end, call realloc to shrink the allocation to the actual number of chars (the new size is smaller, so that's fine):

stringOfFile = realloc(stringOfFile,i+1);

Note that since I've taken the need to add the nul-terminator into account, I've added 1 to the number of chars, so if there aren't any \r chars in the file, the realloc could increase the size of the block by 1.

So, as I was hinting at, don't forget to nul-terminate your string or printf doesn't stop properly:

stringOfFile[i] = '\0';

(unless you don't care about creating a C-string, since storing the string size + display char-by-char is also correct)
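
Putting those pieces together, a sketch of the text-mode variant could look like the following. The function name `read_text_file` and its `out_len` parameter are invented for this example. It assumes, as the question's code does, that ftell at the end of a text stream gives an upper bound on the number of chars that will be read. That holds on Windows and on POSIX systems, but the C standard does not promise it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a whole text file into a nul-terminated,
   heap-allocated string. Stores the number of chars read in *out_len
   (if non-NULL). Returns NULL on failure; the caller frees the result. */
char *read_text_file(const char *path, size_t *out_len)
{
    FILE *fp = fopen(path, "r"); /* text mode: \r\n becomes \n on Windows */
    if (fp == NULL)
        return NULL;

    /* Use the size on disk as an upper bound for the allocation. */
    if (fseek(fp, 0, SEEK_END) != 0) {
        fclose(fp);
        return NULL;
    }
    long size = ftell(fp);
    if (size < 0) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);

    char *buf = malloc((size_t)size + 1); /* +1 for the nul terminator */
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    size_t i = 0;
    int c; /* int so that EOF is never confused with a data byte */
    while (i < (size_t)size && (c = getc(fp)) != EOF)
        buf[i++] = (char)c;
    fclose(fp);

    buf[i] = '\0'; /* nul-terminate so printf("%s") stops correctly */

    /* Shrink to what was actually read; keep the old block if this fails. */
    char *shrunk = realloc(buf, i + 1);
    if (shrunk != NULL)
        buf = shrunk;

    if (out_len != NULL)
        *out_len = i;
    return buf;
}
```

The bound check in the loop (`i < (size_t)size`) is a safety net in case the stream yields more chars than ftell reported, so the buffer is never overrun.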

We've seen that the ftell method is tricky. In some cases it cannot be applied at all, because the size of the data isn't known in advance: for instance, when the stream is the output of a command (popen returns a FILE * but you cannot fseek it) or a socket.

In the general case, it would be better to:

  • allocate a small buffer
  • read char by char and store
  • if buffer is full, call realloc to increase the size by some step (not at every char, performance would be bad)
  • in the end, call realloc again to adjust the size more precisely

(that solves the binary/text issue transparently as well)
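
The steps listed above can be sketched roughly as follows. This is one possible shape, not the only one: the name `read_stream`, the starting capacity of 64 and the choice to double the capacity each time are all assumptions made for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything from an already-open stream into a
   nul-terminated, heap-allocated string, growing the buffer as needed.
   Works on streams that cannot be seeked. Stores the length in *out_len
   (if non-NULL). Returns NULL on allocation failure; the caller frees it. */
char *read_stream(FILE *fp, size_t *out_len)
{
    size_t cap = 64; /* small initial buffer (arbitrary choice) */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int c; /* int so that EOF is never confused with a data byte */
    while ((c = getc(fp)) != EOF) {
        /* Always keep one slot free for the nul terminator. */
        if (len + 1 >= cap) {
            size_t new_cap = cap * 2; /* grow in steps, not per char */
            char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }
        buf[len++] = (char)c;
    }
    buf[len] = '\0';

    /* Final realloc to trim the block to the exact size. */
    char *exact = realloc(buf, len + 1);
    if (exact != NULL)
        buf = exact;

    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```

Because the function never asks for the stream's size, it behaves the same whether the FILE * was opened in text or binary mode, and it can be handed stdin or a popen stream as well as a regular file.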

Note that if you're working with large files (>4GB) you have to use 64-bit unsigned integers for positions and fopen64 flavours of I/O functions (and all offset variables like i should be unsigned / conform to return type of ftell or you'll start having problems at 2GB). Well, I suppose it doesn't matter much when processing moderately small text files.

Also, check David's answer. With text files, putting the result of getc in a char should work, but not in the general case with binary files.
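
To illustrate that last point: getc returns an int so that EOF (a negative value) can be told apart from every possible byte. If the result is stored in a char, a 0xFF byte may compare equal to EOF where char is signed, and EOF may never be detected where char is unsigned. A small sketch, with `count_bytes` being a name invented for this example:

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes remaining in a stream.
   The result of getc() is kept in an int, so a data byte such as 0xFF
   is not mistaken for EOF. */
long count_bytes(FILE *fp)
{
    long count = 0;
    int c;
    while ((c = getc(fp)) != EOF)
        count++;
    return count;
}
```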
