Copy a file in a sane, safe and efficient way


Problem description

I'm looking for a good way to copy a file (binary or text). I've written several samples, and all of them work, but I'd like to hear the opinion of seasoned programmers.

I haven't found good examples, and I'm looking for an approach that works with C++.

ANSI-C-WAY

#include <iostream>
#include <cstdio>    // fopen, fclose, fread, fwrite, BUFSIZ
#include <ctime>
using namespace std;

int main() {
    clock_t start, end;
    start = clock();

    // BUFSIZ is 8192 bytes on this system
    // a buffer size of 1 means one character at a time
    // good values match the block size, e.g. 1024 or 4096
    // larger values reduce the number of system calls
    // size_t BUFFER_SIZE = 4096;

    char buf[BUFSIZ];
    size_t size;

    FILE* source = fopen("from.ogv", "rb");
    FILE* dest = fopen("to.ogv", "wb");
    if (!source || !dest) {
        perror("fopen");
        return 1;
    }

    // fread() returns 0 at end of file or on error, which ends the loop,
    // so no separate feof() check is needed

    while ((size = fread(buf, 1, BUFSIZ, source)) > 0) {
        fwrite(buf, 1, size, dest);
    }

    fclose(source);
    fclose(dest);

    end = clock();

    cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
    cout << "CPU-TIME START " << start << "\n";
    cout << "CPU-TIME END " << end << "\n";
    cout << "CPU-TIME END - START " << end - start << "\n";
    cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";

    return 0;
}

POSIX-WAY (K&R use this in "The C Programming Language"; more low-level)

#include <iostream>
#include <fcntl.h>   // open
#include <unistd.h>  // read, write, close
#include <cstdio>    // BUFSIZ
#include <ctime>
using namespace std;

int main() {
    clock_t start, end;
    start = clock();

    // BUFSIZ is 8192 bytes on this system
    // a buffer size of 1 means one character at a time
    // good values match the block size, e.g. 1024 or 4096
    // larger values reduce the number of system calls
    // size_t BUFFER_SIZE = 4096;

    char buf[BUFSIZ];
    ssize_t size;  // read() returns -1 on error, so the type must be signed

    int source = open("from.ogv", O_RDONLY, 0);
    int dest = open("to.ogv", O_WRONLY | O_CREAT /* | O_TRUNC */, 0644);

    while ((size = read(source, buf, BUFSIZ)) > 0) {
        write(dest, buf, size);
    }

    close(source);
    close(dest);

    end = clock();

    cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
    cout << "CPU-TIME START " << start << "\n";
    cout << "CPU-TIME END " << end << "\n";
    cout << "CPU-TIME END - START " << end - start << "\n";
    cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";

    return 0;
}

KISS-C++-Streambuffer-WAY

#include <iostream>
#include <fstream>
#include <ctime>
using namespace std;

int main() {
    clock_t start, end;
    start = clock();

    ifstream source("from.ogv", ios::binary);
    ofstream dest("to.ogv", ios::binary);

    dest << source.rdbuf();

    source.close();
    dest.close();

    end = clock();

    cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
    cout << "CPU-TIME START " << start << "\n";
    cout << "CPU-TIME END " << end << "\n";
    cout << "CPU-TIME END - START " <<  end - start << "\n";
    cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";

    return 0;
}

COPY-ALGORITHM-C++-WAY

#include <iostream>
#include <fstream>
#include <ctime>
#include <algorithm>
#include <iterator>
using namespace std;

int main() {
    clock_t start, end;
    start = clock();

    ifstream source("from.ogv", ios::binary);
    ofstream dest("to.ogv", ios::binary);

    istreambuf_iterator<char> begin_source(source);
    istreambuf_iterator<char> end_source;
    ostreambuf_iterator<char> begin_dest(dest); 
    copy(begin_source, end_source, begin_dest);

    source.close();
    dest.close();

    end = clock();

    cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
    cout << "CPU-TIME START " << start << "\n";
    cout << "CPU-TIME END " << end << "\n";
    cout << "CPU-TIME END - START " <<  end - start << "\n";
    cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";

    return 0;
}

OWN-BUFFER-C++-WAY

#include <iostream>
#include <fstream>
#include <ctime>
using namespace std;

int main() {
    clock_t start, end;
    start = clock();

    ifstream source("from.ogv", ios::binary);
    ofstream dest("to.ogv", ios::binary);

    // file size
    source.seekg(0, ios::end);
    ifstream::pos_type size = source.tellg();
    source.seekg(0);
    // allocate memory for buffer
    char* buffer = new char[size];

    // copy file    
    source.read(buffer, size);
    dest.write(buffer, size);

    // clean up
    delete[] buffer;
    source.close();
    dest.close();

    end = clock();

    cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
    cout << "CPU-TIME START " << start << "\n";
    cout << "CPU-TIME END " << end << "\n";
    cout << "CPU-TIME END - START " <<  end - start << "\n";
    cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";

    return 0;
}

LINUX-WAY (requires kernel >= 2.6.33)

#include <iostream>
#include <sys/sendfile.h>  // sendfile
#include <fcntl.h>         // open
#include <unistd.h>        // close
#include <sys/stat.h>      // fstat
#include <sys/types.h>     // fstat
#include <ctime>
using namespace std;

int main() {
    clock_t start, end;
    start = clock();

    int source = open("from.ogv", O_RDONLY, 0);
    int dest = open("to.ogv", O_WRONLY | O_CREAT /* | O_TRUNC */, 0644);

    // the "struct" keyword is required here because a function named stat() also exists
    struct stat stat_source;
    fstat(source, &stat_source);

    sendfile(dest, source, 0, stat_source.st_size);

    close(source);
    close(dest);

    end = clock();

    cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
    cout << "CPU-TIME START " << start << "\n";
    cout << "CPU-TIME END " << end << "\n";
    cout << "CPU-TIME END - START " <<  end - start << "\n";
    cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";

    return 0;
}

Environment

  • GNU/LINUX (Archlinux)
  • Kernel 3.3
  • GLIBC-2.15, LIBSTDC++ 4.7 (GCC-LIBS), GCC 4.7, Coreutils 8.16
  • Using RUNLEVEL 3 (Multiuser, Network, Terminal, no GUI)
  • INTEL SSD-Postville 80 GB, filled up to 50%
  • Copy a 270 MB OGG-VIDEO-FILE

Steps to reproduce

 1. $ rm to.ogv
 2. $ reboot                           # kernel and filesystem buffers start out empty
 3. $ (time ./program) &>> report.txt  # runs the program and appends its output to the report
 4. $ sha256sum *.ogv                  # checksum
 5. $ rm to.ogv                        # remove the copy without sync, so kernel and filesystem buffers are reused
 6. $ (time ./program) &>> report.txt  # runs the program and appends its output to the report

Results (CPU time used, in clock() ticks)

Program  Description                 UNBUFFERED | BUFFERED
ANSI C   (fread/fwrite)                 490,000 | 260,000
POSIX    (K&R, read/write)              450,000 | 230,000
FSTREAM  (KISS, Streambuffer)           500,000 | 270,000
FSTREAM  (Algorithm, copy)              500,000 | 270,000
FSTREAM  (OWN-BUFFER)                   500,000 | 340,000
SENDFILE (native LINUX, sendfile)       410,000 | 200,000

The file size doesn't change.
sha256sum prints the same result for both files.
The video file is still playable.

Questions

  • What method would you prefer?
  • Do you know better solutions?
  • Do you see any mistakes in my code?
  • Do you know a reason to avoid a solution?

  • FSTREAM (KISS, Streambuffer)
    I really like this one, because it is really short and simple. As far as I know, operator << is overloaded for the pointer returned by rdbuf() and doesn't convert anything. Correct?
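For reference, operator<< has an overload that takes a stream buffer pointer and inserts its characters unformatted. One detail worth knowing is that this overload sets failbit on the destination when it inserts no characters at all, which happens with an empty source file. A sketch that handles this case and reports errors follows; the function name `copy_stream` is an assumption for illustration.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: copy a file through the stream buffer, returns true on success.
bool copy_stream(const std::string& from, const std::string& to) {
    std::ifstream src(from, std::ios::binary);
    if (!src) return false;
    std::ofstream dst(to, std::ios::binary);
    if (!dst) return false;

    // peek() returns EOF right away for an empty source. Skip the insertion
    // in that case, because operator<<(streambuf*) sets failbit when it
    // inserts no characters.
    if (src.peek() != std::ifstream::traits_type::eof()) {
        dst << src.rdbuf();  // unformatted, byte-for-byte transfer
    }
    dst.flush();
    return static_cast<bool>(dst);
}
```

With both streams opened in binary mode, bytes such as "\r\n" or '\0' should arrive unchanged in the destination.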

Thanks

Update 1
I changed the source of all samples so that opening and closing the file descriptors is included in the clock() measurement. There are no other significant changes in the source code. The results didn't change! I also used time to double-check my results.

Update 2
ANSI C sample changed: the condition of the while loop no longer calls feof(); instead I moved fread() into the condition. It looks like the code now runs 10,000 clocks faster.

Measurement changed: the former results were always buffered, because I repeated the old command line rm to.ogv && sync && time ./program a few times for each program. Now I reboot the system for every program. The unbuffered results are new and show no surprises. The unbuffered results didn't really change.

If I don't delete the old copy, the programs behave differently. Overwriting an existing file while buffered is faster with POSIX and SENDFILE; all other programs are slower. Maybe the truncate or create options have an impact on this behaviour. But overwriting an existing file with the same copy is not a real-world use case.

Performing the copy with cp takes 0.44 seconds unbuffered and 0.30 seconds buffered. So cp is a little slower than the POSIX sample. Looks fine to me.

Maybe I will also add samples and results for mmap() and copy_file() from boost::filesystem.

Update 3
I've also put this on a blog page and extended it a little, including splice(), which is a low-level function from the Linux kernel. Maybe more samples in Java will follow. http://www.ttyhoney.com/blog/?page_id=69

Solution

Copy a file in a sane way:

#include <fstream>

int main()
{
    std::ifstream  src("from.ogv", std::ios::binary);
    std::ofstream  dst("to.ogv",   std::ios::binary);

    dst << src.rdbuf();
}

This is so simple and intuitive to read that it is worth the extra cost. If we were doing it a lot, it would be better to fall back on OS calls to the file system. I am sure boost has a copy file method in its filesystem class.

There is also a C function for interacting with the file system (copyfile(3), available on macOS):

#include <copyfile.h>

int
copyfile(const char *from, const char *to, copyfile_state_t state, copyfile_flags_t flags);
