Copy a file in a sane, safe and efficient way


Question

I am looking for a good way to copy a file (binary or text). I have written several samples, and all of them work, but I would like to hear the opinion of seasoned programmers.

I am missing good examples and am looking for a way that works in C++.

ANSI-C-WAY

#include <iostream>
#include <cstdio>    // fopen, fclose, fread, fwrite, BUFSIZ
#include <ctime>
using namespace std;

int main() {
    clock_t start, end;
    start = clock();

    // BUFSIZ defaults to 8192 bytes
    // a buffer size of 1 means one character at a time
    // good values should match the block size, like 1024 or 4096
    // higher values reduce the number of system calls
    // size_t BUFFER_SIZE = 4096;

    char buf[BUFSIZ];
    size_t size;

    FILE* source = fopen("from.ogv", "rb");
    FILE* dest = fopen("to.ogv", "wb");

    // fread() returns the number of bytes read; 0 means end of file or error
    while ((size = fread(buf, 1, BUFSIZ, source)) > 0) {
        fwrite(buf, 1, size, dest);
    }

    fclose(source);
    fclose(dest);

    end = clock();

    cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
    cout << "CPU-TIME START " << start << "\n";
    cout << "CPU-TIME END " << end << "\n";
    cout << "CPU-TIME END - START " << end - start << "\n";
    cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";

    return 0;
}

POSIX-WAY (K&R use this in "The C Programming Language", more low-level)

#include <iostream>
#include <fcntl.h>   // open
#include <unistd.h>  // read, write, close
#include <cstdio>    // BUFSIZ
#include <ctime>
using namespace std;

int main() {
    clock_t start, end;
    start = clock();

    // BUFSIZ defaults to 8192 bytes
    // a buffer size of 1 means one character at a time
    // good values should match the block size, like 1024 or 4096
    // higher values reduce the number of system calls
    // size_t BUFFER_SIZE = 4096;

    char buf[BUFSIZ];
    ssize_t size;  // read() returns ssize_t; -1 signals an error

    int source = open("from.ogv", O_RDONLY, 0);
    int dest = open("to.ogv", O_WRONLY | O_CREAT /*| O_TRUNC/**/, 0644);

    while ((size = read(source, buf, BUFSIZ)) > 0) {
        write(dest, buf, size);
    }

    close(source);
    close(dest);

    end = clock();

    cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
    cout << "CPU-TIME START " << start << "\n";
    cout << "CPU-TIME END " << end << "\n";
    cout << "CPU-TIME END - START " << end - start << "\n";
    cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";

    return 0;
}

KISS-C++-Streambuffer-WAY

#include <iostream>
#include <fstream>
#include <ctime>
using namespace std;

int main() {
    clock_t start, end;
    start = clock();

    ifstream source("from.ogv", ios::binary);
    ofstream dest("to.ogv", ios::binary);

    dest << source.rdbuf();

    source.close();
    dest.close();

    end = clock();

    cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
    cout << "CPU-TIME START " << start << "\n";
    cout << "CPU-TIME END " << end << "\n";
    cout << "CPU-TIME END - START " << end - start << "\n";
    cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";

    return 0;
}

COPY-ALGORITHM-C++-WAY

#include <iostream>
#include <fstream>
#include <ctime>
#include <algorithm>
#include <iterator>
using namespace std;

int main() {
    clock_t start, end;
    start = clock();

    ifstream source("from.ogv", ios::binary);
    ofstream dest("to.ogv", ios::binary);

    istreambuf_iterator<char> begin_source(source);
    istreambuf_iterator<char> end_source;
    ostreambuf_iterator<char> begin_dest(dest); 
    copy(begin_source, end_source, begin_dest);

    source.close();
    dest.close();

    end = clock();

    cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
    cout << "CPU-TIME START " << start << "\n";
    cout << "CPU-TIME END " << end << "\n";
    cout << "CPU-TIME END - START " << end - start << "\n";
    cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";

    return 0;
}

OWN-BUFFER-C++-WAY

#include <iostream>
#include <fstream>
#include <ctime>
using namespace std;

int main() {
    clock_t start, end;
    start = clock();

    ifstream source("from.ogv", ios::binary);
    ofstream dest("to.ogv", ios::binary);

    // file size
    source.seekg(0, ios::end);
    ifstream::pos_type size = source.tellg();
    source.seekg(0);
    // allocate memory for buffer
    char* buffer = new char[size];

    // copy file    
    source.read(buffer, size);
    dest.write(buffer, size);

    // clean up
    delete[] buffer;
    source.close();
    dest.close();

    end = clock();

    cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
    cout << "CPU-TIME START " << start << "\n";
    cout << "CPU-TIME END " << end << "\n";
    cout << "CPU-TIME END - START " << end - start << "\n";
    cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";

    return 0;
}

LINUX-WAY // requires kernel >= 2.6.33

#include <iostream>
#include <sys/sendfile.h>  // sendfile
#include <fcntl.h>         // open
#include <unistd.h>        // close
#include <sys/stat.h>      // fstat
#include <sys/types.h>     // fstat
#include <ctime>
using namespace std;

int main() {
    clock_t start, end;
    start = clock();

    int source = open("from.ogv", O_RDONLY, 0);
    int dest = open("to.ogv", O_WRONLY | O_CREAT /*| O_TRUNC/**/, 0644);

    // struct required, rationale: function stat() exists also
    struct stat stat_source;
    fstat(source, &stat_source);

    sendfile(dest, source, 0, stat_source.st_size);

    close(source);
    close(dest);

    end = clock();

    cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
    cout << "CPU-TIME START " << start << "\n";
    cout << "CPU-TIME END " << end << "\n";
    cout << "CPU-TIME END - START " << end - start << "\n";
    cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";

    return 0;
}

Environment

  • GNU/Linux (Arch Linux)
  • Kernel 3.3
  • glibc 2.15, libstdc++ 4.7 (gcc-libs), GCC 4.7, Coreutils 8.16
  • Runlevel 3 (multi-user, network, terminal, no GUI)
  • Intel SSD (Postville), 80 GB, filled to 50%
  • Copying a 270 MB Ogg video file

Steps to reproduce

 1. $ rm to.ogv
 2. $ reboot                           # kernel and filesystem buffers are in a regular state
 3. $ (time ./program) &>> report.txt  # runs the program and appends its output to the file
 4. $ sha256sum *.ogv                  # checksum
 5. $ rm to.ogv                        # remove the copy, but no sync; kernel and filesystem buffers are used
 6. $ (time ./program) &>> report.txt  # runs the program and appends its output to the file

Results (CPU time used, in clock ticks)

Program  Description                 UNBUFFERED|BUFFERED
ANSI C   (fread/fwrite)                 490,000|260,000
POSIX    (K&R, read/write)              450,000|230,000
FSTREAM  (KISS, Streambuffer)           500,000|270,000
FSTREAM  (Algorithm, copy)              500,000|270,000
FSTREAM  (OWN-BUFFER)                   500,000|340,000
SENDFILE (native LINUX, sendfile)       410,000|200,000

The file size doesn't change.
sha256sum prints the same results.
The video file is still playable.

Questions

  • Which method would you prefer?
  • Do you know better solutions?
  • Do you see any mistakes in my code?
  • Do you know a reason to avoid one of the solutions?

FSTREAM (KISS, Streambuffer)
I really like this one, because it is really short and simple. As far as I know, operator<< is overloaded for rdbuf() and doesn't convert anything. Correct?
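One way to check the "doesn't convert anything" assumption is to push bytes that formatted I/O would normally treat specially (NUL, CR, LF, whitespace) through the stream buffer and compare the result. The sketch below is only an illustration; the helper name `copy_with_rdbuf` is hypothetical:

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: copies `from` to `to` through the stream buffers.
bool copy_with_rdbuf(const std::string& from, const std::string& to) {
    std::ifstream src(from, std::ios::binary);
    if (!src) return false;
    std::ofstream dst(to, std::ios::binary);
    if (!dst) return false;
    // operator<<(std::streambuf*) pulls characters straight from the source
    // buffer; no formatting or whitespace skipping is applied.
    dst << src.rdbuf();
    return static_cast<bool>(dst);
}
```

With both streams opened in binary mode, the destination should end up byte-for-byte identical to the source. One caveat: if the source is empty, this operator inserts no characters and sets failbit on the destination stream, so the helper reports failure for empty files.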

Thanks

Update 1
I changed the source in all samples so that opening and closing the file descriptors is included in the clock() measurement. There are no other significant changes in the source code. The results didn't change! I also used time to double-check my results.

Update 2
ANSI C sample changed: the condition of the while loop no longer calls feof(); instead I moved fread() into the condition. It looks like the code now runs 10,000 clocks faster.
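A loop driven by fread() cannot tell end of file from a read error on its own, because fread() returns 0 in both cases. The following is a minimal sketch of the same loop with the checks added; the function name `copy_with_stdio` is hypothetical and not part of the original samples:

```cpp
#include <cstdio>   // fopen, fclose, fread, fwrite, ferror, BUFSIZ

// Hypothetical helper: copies `from` to `to` and returns true only if
// every read and write succeeded.
bool copy_with_stdio(const char* from, const char* to) {
    std::FILE* source = std::fopen(from, "rb");
    if (!source) return false;
    std::FILE* dest = std::fopen(to, "wb");
    if (!dest) { std::fclose(source); return false; }

    char buf[BUFSIZ];
    std::size_t size;
    bool ok = true;
    // fread() returns 0 at end of file and on error, so it can drive the loop.
    while ((size = std::fread(buf, 1, sizeof buf, source)) > 0) {
        if (std::fwrite(buf, 1, size, dest) != size) { ok = false; break; }
    }
    if (std::ferror(source)) ok = false;     // distinguish a read error from EOF

    std::fclose(source);
    if (std::fclose(dest) != 0) ok = false;  // buffered write errors surface here
    return ok;
}
```

Checking the return value of fclose() on the destination matters because stdio may still hold unwritten data in its buffer when the loop ends.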

Measurement changed: the former results were always buffered, because I repeated the old command line rm to.ogv && sync && time ./program a few times for each program. Now I reboot the system for every program. The unbuffered results are new and show no surprises. The unbuffered results didn't really change.

If I don't delete the old copy, the programs behave differently. Overwriting an existing file in the buffered case is faster with POSIX and SENDFILE, while all other programs are slower. Maybe the truncate or create options have an impact on this behaviour. But overwriting existing files with the same copy is not a real-world use case.
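Apart from timing, the truncate option also affects correctness: when the destination already exists and is longer than the new data, opening it without O_TRUNC keeps the old tail. The sketch below illustrates this with a hypothetical helper `write_and_size`, which is not part of the original samples:

```cpp
#include <cstddef>     // std::size_t
#include <cstring>     // std::strlen
#include <fcntl.h>     // open
#include <sys/stat.h>  // fstat
#include <unistd.h>    // write, close

// Hypothetical helper: writes `text` to `path` and returns the resulting
// file size in bytes, or -1 on error.
long write_and_size(const char* path, const char* text, bool truncate) {
    int flags = O_WRONLY | O_CREAT | (truncate ? O_TRUNC : 0);
    int fd = open(path, flags, 0644);
    if (fd < 0) return -1;

    std::size_t len = std::strlen(text);
    ssize_t written = write(fd, text, len);

    struct stat st;
    int rc = fstat(fd, &st);
    close(fd);
    if (written != static_cast<ssize_t>(len) || rc != 0) return -1;
    return static_cast<long>(st.st_size);
}
```

Writing ten bytes and then three bytes to the same path without O_TRUNC leaves a ten-byte file whose first three bytes were overwritten; with O_TRUNC the file shrinks to three bytes.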

Performing the copy with cp takes 0.44 seconds unbuffered and 0.30 seconds buffered. So cp is a little slower than the POSIX sample. Looks fine to me.

Maybe I will also add samples and results for mmap() and copy_file() from boost::filesystem.

Update 3
I have also put this on a blog page and extended it a little, including splice(), which is a low-level function from the Linux kernel. Maybe more samples with Java will follow. http://www.ttyhoney.com/blog/?page_id=69

Recommended answer

Copy a file in a sane way:

#include <fstream>

int main()
{
    std::ifstream  src("from.ogv", std::ios::binary);
    std::ofstream  dst("to.ogv",   std::ios::binary);

    dst << src.rdbuf();
}

This is so simple and intuitive to read that it is worth the extra cost. If we were doing it a lot, it would be better to fall back on OS calls to the file system. I am sure Boost has a copy-file function in its filesystem library.

There is a C method for interacting with the file system (copyfile(3), available on macOS):

#include <copyfile.h>

int
copyfile(const char *from, const char *to, copyfile_state_t state, copyfile_flags_t flags);
