File read using POSIX APIs

Question

Consider the following code for reading the contents of a file into a buffer:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#define BLOCK_SIZE 4096

int main()
{
   int fd=-1;
   ssize_t bytes_read=-1;
   int i=0;
   char buff[50];
   //Arbitrary size for the buffer?? How to optimise.
   //Dynamic allocation is a choice but what is the
   //right way to relate the file size to buffer size.

   fd=open("./file-to-buff.txt",O_RDONLY);
   if(-1 == fd)
   {
      perror("Open Failed");
      return 1;
   }

   while((bytes_read=read(fd,buff,BLOCK_SIZE))>0)
   {
      printf("bytes_read=%zd\n",bytes_read);
   }

   //Test the characters read from the file into the buffer. The file contains "Hello"
   while(buff[i]!='\0')
   {
      printf("buff[%d]=%c\n",i,buff[i]);
      i++;
      //buff[5]=\n-How?
   }
   //buff[6]=`\0`-How?
   close(fd);
   return 0;
}

About the code:

  • The input file contains the string "Hello".
  • This content needs to be copied into the buffer.
  • The objective is achieved with the open and read POSIX APIs.
  • The read API takes a pointer to a buffer of an arbitrary size to copy the data into.

Questions:

  • Dynamic allocation is the method that must be used to optimize the size of the buffer. What is the right procedure to relate/derive the buffer size from the input file size?
  • At the end of the read operation, I see that read has copied a newline character and a NULL character in addition to the characters "Hello". Please elaborate on this behavior of read.

Sample output

bytes_read=6
buff[0]=H
buff[1]=e
buff[2]=l
buff[3]=l
buff[4]=o
buff[5]=

PS: The input file was created by hand, not by a program (using the write API). I mention this in case it makes any difference.

Recommended answer

Since you want to read the whole file, the best way is to make the buffer as big as the file size. There's no point in resizing the buffer as you go. That just hurts performance without good reason.

You can get the file size in several ways. The quick-and-dirty way is to lseek() to the end of the file:

// Get size.
off_t size = lseek(fd, 0, SEEK_END); // You should check for an error return in real code
// Seek back to the beginning.
lseek(fd, 0, SEEK_SET);
// Allocate enough to hold the whole contents plus a '\0' char.
char *buff = malloc(size + 1);
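
The snippet above leaves out the error checks it mentions. A minimal sketch of the same lseek() technique with those checks added might look like the following; the helper name file_size_lseek is hypothetical, and unlike the snippet it restores the offset the descriptor had before the call rather than always seeking to the start.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of the original answer): returns the size
   of the open file in bytes, or -1 on error. The file offset is restored
   to where it was before the call. */
off_t file_size_lseek(int fd)
{
    off_t saved = lseek(fd, 0, SEEK_CUR);   /* remember the current offset */
    if (saved == (off_t)-1)
        return -1;

    off_t size = lseek(fd, 0, SEEK_END);    /* offset of the end == size */
    if (size == (off_t)-1)
        return -1;

    if (lseek(fd, saved, SEEK_SET) == (off_t)-1)   /* seek back */
        return -1;

    return size;
}
```

lseek() fails on descriptors that cannot seek, such as pipes and sockets, so a -1 return here can also mean that the size simply cannot be determined this way.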

The other way is to get the information using fstat():

struct stat fileStat;
fstat(fd, &fileStat); // Don't forget to check for an error return in real code
// Allocate enough to hold the whole contents plus a '\0' char.
char *buff = malloc(fileStat.st_size + 1);
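
As a sketch of the fstat() variant with the error check filled in, something like the following could be used. The helper name file_size_fstat is hypothetical, and rejecting anything that is not a regular file is an assumption made here, since st_size is not a usable length for pipes or terminals.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper (not part of the original answer): returns the size
   in bytes of a regular file, or -1 if fstat() fails or fd does not refer
   to a regular file. */
off_t file_size_fstat(int fd)
{
    struct stat st;

    if (fstat(fd, &st) == -1)
        return -1;

    /* Assumption: only regular files are accepted, because st_size does
       not describe the amount of readable data for pipes, terminals, etc. */
    if (!S_ISREG(st.st_mode))
        return -1;

    return st.st_size;
}
```

Unlike the lseek() approach, this does not touch the file offset, so no seek back is needed before reading.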

To get all the needed types and function prototypes, make sure you include the needed headers:

#include <stdlib.h>   // For malloc() and free()
#include <sys/stat.h> // For fstat()
#include <unistd.h>   // For lseek()

Note that read() does not automatically terminate the data with \0. You need to do that manually, which is why we allocate an extra character (size+1) for the buffer. The reason why there's already a \0 character there in your case is pure random chance.
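
Putting the pieces together, a minimal sketch of reading a whole file and terminating it by hand could look like this. The helper name read_whole_file and its len_out parameter are hypothetical. The loop is there because read() may return fewer bytes than requested. As for the newline in the question: bytes_read=6 means the file itself holds six bytes, most likely because the editor used to create it appended a trailing newline after "Hello".

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of the original answer): reads a whole
   regular file into a freshly allocated, '\0'-terminated buffer.
   Returns NULL on error; the caller must free() the result. */
char *read_whole_file(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) == -1 || !S_ISREG(st.st_mode)) {
        close(fd);
        return NULL;
    }

    size_t size = (size_t)st.st_size;
    char *buff = malloc(size + 1);          /* +1 for the terminator */
    if (buff == NULL) {
        close(fd);
        return NULL;
    }

    size_t total = 0;
    while (total < size) {
        ssize_t n = read(fd, buff + total, size - total);
        if (n == -1) {
            if (errno == EINTR)
                continue;                   /* interrupted, try again */
            free(buff);
            close(fd);
            return NULL;
        }
        if (n == 0)
            break;                          /* EOF earlier than expected */
        total += (size_t)n;
    }
    close(fd);

    buff[total] = '\0';                     /* read() never adds this */
    if (len_out != NULL)
        *len_out = total;
    return buff;
}
```

For the file in the question this would yield a length of 6, with buff[5] holding '\n' from the file and buff[6] holding the '\0' written explicitly after the loop.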

Of course, since buff is now a dynamically allocated array, don't forget to free it when you no longer need it:

free(buff);

Be aware though, that allocating a buffer that's as large as the file you want to read into it can be dangerous. Imagine if (by mistake or on purpose, doesn't matter) the file is several GB big. For cases like this, it's good to have a maximum allowable size in place. If you don't want any such limitations, however, then you should switch to another method of reading from files: mmap(). With mmap(), you can map parts of a file to memory. That way, it doesn't matter how big the file is, since you can work only on parts of it at a time, keeping memory usage under control.
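
To illustrate working on parts of a file, here is a rough sketch that maps a file one window at a time and counts the newline bytes in it. The helper name count_newlines_mmap and the window size of 256 pages are assumptions chosen for the example. The one hard requirement is that each mapping offset is a multiple of the page size.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of the original answer): counts '\n'
   bytes in a file by mapping it one window at a time.
   Returns -1 on error. */
long count_newlines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    long page = sysconf(_SC_PAGESIZE);
    if (fstat(fd, &st) == -1 || page <= 0) {
        close(fd);
        return -1;
    }

    /* Assumed window size: 256 pages. Offsets passed to mmap() must be
       page-aligned, which stepping by a multiple of the page size ensures. */
    off_t window = (off_t)page * 256;
    long count = 0;

    for (off_t off = 0; off < st.st_size; off += window) {
        off_t left = st.st_size - off;
        size_t len = (size_t)(left < window ? left : window);

        char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        for (size_t i = 0; i < len; i++)
            if (p[i] == '\n')
                count++;
        munmap(p, len);                     /* release before next window */
    }

    close(fd);
    return count;
}
```

Only one window is mapped at any moment, so the address space used stays bounded by the window size however large the file is. An empty file never enters the loop, which also avoids calling mmap() with a length of zero.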
