Tried and true simple file copying code in C?

Question

This looks like a simple question, but I didn't find anything similar here.

Since there is no file copy function in C, we have to implement file copying ourselves, but I don't like reinventing the wheel even for trivial stuff like that, so I'd like to ask the cloud:

  • What code would you recommend for file copying using fopen()/fread()/fwrite()?
  • What code would you recommend for file copying using open()/read()/write()?

This code should be portable (windows/mac/linux/bsd/qnx/younameit), stable, time tested, fast, memory efficient, etc. Getting into a specific system's internals to squeeze out some more performance is welcome (like getting the filesystem cluster size).

This seems like a trivial question but, for example, the source code for the cp command isn't 10 lines of C code.

Recommended answer

As far as the actual I/O goes, the code I've written a million times in various guises for copying data from one stream to another goes something like this. It returns 0 on success, or -1 with errno set on error (in which case any number of bytes might have been copied).

Note that for copying regular files, you can skip the EAGAIN stuff, since regular files are always blocking I/O. But inevitably if you write this code, someone will use it on other types of file descriptors, so consider it a freebie.

There's a file-specific optimisation that GNU cp does, which I haven't bothered with here, that for long blocks of 0 bytes instead of writing you just extend the output file by seeking off the end.

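#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

// Wait until fd is ready for the given event (POLLIN or POLLOUT).
void block(int fd, int event) {
    struct pollfd topoll;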
    topoll.fd = fd;
    topoll.events = event;
    poll(&topoll, 1, -1);
    // no need to check errors - if the stream is bust then the
    // next read/write will tell us
}

int copy_data_buffer(int fdin, int fdout, void *buf, size_t bufsize) {
    for(;;) {
       char *pos; // char *, so the pointer arithmetic below is standard C
       // read data to buffer
       ssize_t bytestowrite = read(fdin, buf, bufsize);
       if (bytestowrite == 0) break; // end of input
       if (bytestowrite == -1) {
           if (errno == EINTR) continue; // signal handled
           if (errno == EAGAIN) {
               block(fdin, POLLIN);
               continue;
           }
           return -1; // error
       }

       // write data from buffer
       pos = buf;
       while (bytestowrite > 0) {
           ssize_t bytes_written = write(fdout, pos, bytestowrite);
           if (bytes_written == -1) {
               if (errno == EINTR) continue; // signal handled
               if (errno == EAGAIN) {
                   block(fdout, POLLOUT);
                   continue;
               }
               return -1; // error
           }
           bytestowrite -= bytes_written;
           pos += bytes_written;
       }
    }
    return 0; // success
}

// Default value. I think it will get close to maximum speed on most
// systems, short of using mmap etc. But porters / integrators
// might want to set it smaller, if the system is very memory
// constrained and they don't want this routine to starve
// concurrent ops of memory. And they might want to set it larger
// if I'm completely wrong and larger buffers improve performance.
// It's worth trying several MB at least once, although with huge
// allocations you have to watch for the linux 
// "crash on access instead of returning 0" behaviour for failed malloc.
#ifndef FILECOPY_BUFFER_SIZE
    #define FILECOPY_BUFFER_SIZE (64*1024)
#endif

int copy_data(int fdin, int fdout) {
    // optional exercise for reader: take the file size as a parameter,
    // and don't use a buffer any bigger than that. This prevents 
    // memory-hogging if FILECOPY_BUFFER_SIZE is very large and the file
    // is small.
    for (size_t bufsize = FILECOPY_BUFFER_SIZE; bufsize >= 256; bufsize /= 2) {
        void *buffer = malloc(bufsize);
        if (buffer != NULL) {
            int result = copy_data_buffer(fdin, fdout, buffer, bufsize);
            free(buffer);
            return result;
        }
    }
    // could use a stack buffer here instead of failing, if desired.
    // 128 bytes ought to fit on any stack worth having, but again
    // this could be made configurable.
    return -1; // errno is ENOMEM
}
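
The zero-block optimisation mentioned above (seeking instead of writing long runs of 0 bytes) could be sketched roughly as follows. This is not GNU cp's implementation, only a minimal illustration. The names `all_zero` and `copy_data_sparse` are hypothetical, and the sketch assumes a POSIX system, a blocking and seekable output descriptor that starts out empty, and no O_APPEND on the output.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

// Returns 1 if the n bytes at p are all zero, 0 otherwise.
static int all_zero(const char *p, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (p[i] != 0) return 0;
    return 1;
}

// Hypothetical variant of copy_data_buffer: buffers that contain only
// zeros are skipped with lseek() instead of being written.
int copy_data_sparse(int fdin, int fdout, char *buf, size_t bufsize) {
    off_t pending_hole = 0; // zero bytes skipped since the last real write
    for (;;) {
        ssize_t n = read(fdin, buf, bufsize);
        if (n == 0) break; // end of input
        if (n == -1) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (all_zero(buf, (size_t)n)) {
            if (lseek(fdout, n, SEEK_CUR) == (off_t)-1) return -1;
            pending_hole += n;
            continue;
        }
        pending_hole = 0;
        char *pos = buf;
        while (n > 0) {
            ssize_t w = write(fdout, pos, (size_t)n);
            if (w == -1) {
                if (errno == EINTR) continue;
                return -1;
            }
            n -= w;
            pos += w;
        }
    }
    // Seeking past the end does not extend the file by itself, so a
    // trailing hole needs the final length to be set explicitly.
    if (pending_hole > 0) {
        off_t end = lseek(fdout, 0, SEEK_CUR);
        if (end == (off_t)-1 || ftruncate(fdout, end) == -1) return -1;
    }
    return 0;
}
```

Whether the skipped ranges actually become holes on disk depends on the filesystem. The copied file reads back identically either way.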

Opening the input file:

// O_BINARY matters on Windows; POSIX systems do not define it, so define it as 0 there
int fdin = open(infile, O_RDONLY|O_BINARY, 0);
if (fdin == -1) return -1;

Opening the output file is tricksy. As a basis, you want:

int fdout = open(outfile, O_WRONLY|O_BINARY|O_CREAT|O_TRUNC, 0x1ff); // 0x1ff == 0777, masked by umask
if (fdout == -1) {
    close(fdin);
    return -1;
}
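
Putting the two open calls and the copy together might look something like the sketch below. It assumes POSIX headers. `copy_file` and `copy_fd` are hypothetical names: `copy_fd` is a cut-down, blocking-only stand-in for `copy_data` so that the block compiles on its own. The mode 0666 is a choice made for this sketch rather than the 0x1ff used above.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#ifndef O_BINARY
#define O_BINARY 0 // not defined on POSIX systems, which have no text mode
#endif

// Minimal stand-in for copy_data(); handles blocking descriptors only.
static int copy_fd(int fdin, int fdout) {
    char buf[4096];
    for (;;) {
        ssize_t n = read(fdin, buf, sizeof buf);
        if (n == 0) return 0; // end of input
        if (n == -1) {
            if (errno == EINTR) continue;
            return -1;
        }
        for (char *pos = buf; n > 0; ) {
            ssize_t w = write(fdout, pos, (size_t)n);
            if (w == -1) {
                if (errno == EINTR) continue;
                return -1;
            }
            n -= w;
            pos += w;
        }
    }
}

// Hypothetical wrapper: returns 0 on success, -1 with errno set on failure.
int copy_file(const char *infile, const char *outfile) {
    int fdin = open(infile, O_RDONLY | O_BINARY, 0);
    if (fdin == -1) return -1;
    int fdout = open(outfile, O_WRONLY | O_BINARY | O_CREAT | O_TRUNC, 0666);
    if (fdout == -1) {
        int saved = errno; // close() may overwrite errno
        close(fdin);
        errno = saved;
        return -1;
    }
    int result = copy_fd(fdin, fdout);
    int saved = errno;
    close(fdin);
    // close() on the output can report a deferred write error
    if (close(fdout) == -1 && result == 0) {
        result = -1;
        saved = errno;
    }
    if (result == -1) errno = saved;
    return result;
}
```

The return value of close() on the output is checked because some filesystems only report a failed write at that point.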

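But there are confounding factors:

  • You need to special-case when the files are the same, and I can't remember how to do that portably.
  • If the output filename is a directory, you might want to copy the file into the directory.
  • If the output file already exists (open with O_EXCL to determine this and check for EEXIST on error), you might want to do something different, as cp -i does.
  • You might want the permissions of the output file to reflect those of the input file.
  • You might want other platform-specific metadata to be copied.
  • You may or may not wish to unlink the output file on error.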
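
A few of these points can be sketched on POSIX systems as below. This is not portable to Windows and is only an illustration. The helper names `same_file`, `open_new_output` and `copy_permissions` are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Returns 1 if both descriptors refer to the same file, 0 if not, and
// -1 on error. POSIX only: compares device and inode numbers.
// Run this before truncating the output: open it without O_TRUNC,
// check, and only then call ftruncate(fdout, 0).
int same_file(int fd1, int fd2) {
    struct stat s1, s2;
    if (fstat(fd1, &s1) == -1 || fstat(fd2, &s2) == -1) return -1;
    return s1.st_dev == s2.st_dev && s1.st_ino == s2.st_ino;
}

// Opens outfile for writing only if it does not exist yet. Returns the
// descriptor, or -1 with errno == EEXIST if the file is already there,
// at which point the caller can prompt as cp -i does.
int open_new_output(const char *outfile, mode_t mode) {
    return open(outfile, O_WRONLY | O_CREAT | O_EXCL, mode);
}

// Copies the permission bits of fdin onto fdout.
int copy_permissions(int fdin, int fdout) {
    struct stat st;
    if (fstat(fdin, &st) == -1) return -1;
    return fchmod(fdout, st.st_mode & 07777);
}
```

Working on descriptors rather than path names avoids a race in which a file is replaced between the check and the open.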

Obviously the answers to all these questions could be "do the same as cp". In which case the answer to the original question is "ignore everything I or anyone else has said, and use the source of cp".

Btw, getting the filesystem's cluster size is next to useless. You'll almost always see speed increasing with buffer size long after you've passed the size of a disk block.
