Copy a file in a sane, safe and efficient way
I'm looking for a good way to copy a file (binary or text). I've written several samples, and they all work, but I'd like to hear the opinions of seasoned programmers.
I'm missing good examples and am looking for a way that works with C++.
ANSI-C-WAY
#include <iostream>
#include <cstdio> // fopen, fclose, fread, fwrite, BUFSIZ
#include <ctime>
using namespace std;
int main() {
    clock_t start, end;
    start = clock();

    // BUFSIZ default is 8192 bytes
    // a buffer size of 1 means one character at a time
    // good values should fit the block size, like 1024 or 4096
    // higher values reduce the number of system calls
    // size_t BUFFER_SIZE = 4096;
    char buf[BUFSIZ];
    size_t size;

    FILE* source = fopen("from.ogv", "rb");
    FILE* dest = fopen("to.ogv", "wb");

    // clean and more secure:
    // fread() returns 0 at end of file (or on error), which ends the loop,
    // so no separate feof() call is needed
    while ((size = fread(buf, 1, BUFSIZ, source)) > 0) {
        fwrite(buf, 1, size, dest);
    }

    fclose(source);
    fclose(dest);

    end = clock();
    cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
    cout << "CPU-TIME START " << start << "\n";
    cout << "CPU-TIME END " << end << "\n";
    cout << "CPU-TIME END - START " << end - start << "\n";
    cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";
    return 0;
}
POSIX-WAY (K&R use this in "The C Programming Language"; more low-level)
#include <iostream>
#include <fcntl.h> // open
#include <unistd.h> // read, write, close
#include <cstdio> // BUFSIZ
#include <ctime>
using namespace std;
int main() {
    clock_t start, end;
    start = clock();

    // BUFSIZ defaults to 8192
    // a buffer size of 1 means one character at a time
    // good values should fit the block size, like 1024 or 4096
    // higher values reduce the number of system calls
    // size_t BUFFER_SIZE = 4096;
    char buf[BUFSIZ];
    ssize_t size; // signed: read() returns -1 on error

    int source = open("from.ogv", O_RDONLY, 0);
    int dest = open("to.ogv", O_WRONLY | O_CREAT /*| O_TRUNC/**/, 0644);

    while ((size = read(source, buf, BUFSIZ)) > 0) {
        write(dest, buf, size);
    }

    close(source);
    close(dest);

    end = clock();
    cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
    cout << "CPU-TIME START " << start << "\n";
    cout << "CPU-TIME END " << end << "\n";
    cout << "CPU-TIME END - START " << end - start << "\n";
    cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";
    return 0;
}
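The read/write loop above ignores errors and assumes write() always writes everything it was given. As a rough sketch of what a more defensive variant could look like (the helper names write_all and copy_posix are made up for this illustration and are not part of the original samples; O_TRUNC is enabled here, which the sample leaves commented out):

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: writes all `count` bytes, retrying after short writes
// and after interruption by a signal (EINTR).
static bool write_all(int fd, const char* data, size_t count) {
    while (count > 0) {
        ssize_t n = write(fd, data, count);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted, try again
            return false;                  // real error
        }
        data += n;
        count -= static_cast<size_t>(n);
    }
    return true;
}

// Hypothetical helper: copies `from` to `to`, returns true on success.
bool copy_posix(const char* from, const char* to) {
    int source = open(from, O_RDONLY);
    if (source < 0) return false;
    int dest = open(to, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (dest < 0) {
        close(source);
        return false;
    }

    char buf[64 * 1024];  // buffer size is an arbitrary choice for this sketch
    bool ok = true;
    for (;;) {
        ssize_t n = read(source, buf, sizeof buf);
        if (n == 0) break;  // end of file
        if (n < 0) {
            if (errno == EINTR) continue;
            ok = false;
            break;
        }
        if (!write_all(dest, buf, static_cast<size_t>(n))) {
            ok = false;
            break;
        }
    }

    close(source);
    if (close(dest) != 0) ok = false;  // close() can report a deferred write error
    return ok;
}
```

Keeping the byte count in a signed ssize_t matters here: with an unsigned type, the -1 that read() returns on error would look like a huge positive count.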
KISS-C++-Streambuffer-WAY
#include <iostream>
#include <fstream>
#include <ctime>
using namespace std;
int main() {
    clock_t start, end;
    start = clock();

    ifstream source("from.ogv", ios::binary);
    ofstream dest("to.ogv", ios::binary);

    dest << source.rdbuf();

    source.close();
    dest.close();

    end = clock();
    cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
    cout << "CPU-TIME START " << start << "\n";
    cout << "CPU-TIME END " << end << "\n";
    cout << "CPU-TIME END - START " << end - start << "\n";
    cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";
    return 0;
}
COPY-ALGORITHM-C++-WAY
#include <iostream>
#include <fstream>
#include <ctime>
#include <algorithm>
#include <iterator>
using namespace std;
int main() {
    clock_t start, end;
    start = clock();

    ifstream source("from.ogv", ios::binary);
    ofstream dest("to.ogv", ios::binary);

    istreambuf_iterator<char> begin_source(source);
    istreambuf_iterator<char> end_source;
    ostreambuf_iterator<char> begin_dest(dest);
    copy(begin_source, end_source, begin_dest);

    source.close();
    dest.close();

    end = clock();
    cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
    cout << "CPU-TIME START " << start << "\n";
    cout << "CPU-TIME END " << end << "\n";
    cout << "CPU-TIME END - START " << end - start << "\n";
    cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";
    return 0;
}
OWN-BUFFER-C++-WAY
#include <iostream>
#include <fstream>
#include <ctime>
using namespace std;
int main() {
    clock_t start, end;
    start = clock();

    ifstream source("from.ogv", ios::binary);
    ofstream dest("to.ogv", ios::binary);

    // file size
    source.seekg(0, ios::end);
    ifstream::pos_type size = source.tellg();
    source.seekg(0);

    // allocate memory for buffer
    char* buffer = new char[size];

    // copy file
    source.read(buffer, size);
    dest.write(buffer, size);

    // clean up
    delete[] buffer;
    source.close();
    dest.close();

    end = clock();
    cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
    cout << "CPU-TIME START " << start << "\n";
    cout << "CPU-TIME END " << end << "\n";
    cout << "CPU-TIME END - START " << end - start << "\n";
    cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";
    return 0;
}
LINUX-WAY // requires kernel >= 2.6.33
#include <iostream>
#include <sys/sendfile.h> // sendfile
#include <fcntl.h> // open
#include <unistd.h> // close
#include <sys/stat.h> // fstat
#include <sys/types.h> // fstat
#include <ctime>
using namespace std;
int main() {
    clock_t start, end;
    start = clock();

    int source = open("from.ogv", O_RDONLY, 0);
    int dest = open("to.ogv", O_WRONLY | O_CREAT /*| O_TRUNC/**/, 0644);

    // the "struct" keyword is required because a function named stat() also exists
    struct stat stat_source;
    fstat(source, &stat_source);

    sendfile(dest, source, 0, stat_source.st_size);

    close(source);
    close(dest);

    end = clock();
    cout << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << "\n";
    cout << "CPU-TIME START " << start << "\n";
    cout << "CPU-TIME END " << end << "\n";
    cout << "CPU-TIME END - START " << end - start << "\n";
    cout << "TIME(SEC) " << static_cast<double>(end - start) / CLOCKS_PER_SEC << "\n";
    return 0;
}
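The sample above calls sendfile() once and ignores its return value. According to the man page, sendfile() may transfer fewer bytes than requested, and a single call is capped at roughly 2 GB, so a loop is the safer pattern. A minimal, Linux-only sketch of that idea (copy_sendfile is a hypothetical name, and O_TRUNC is enabled here, unlike in the sample):

```cpp
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: copies `from` to `to` with sendfile(), looping until
// every byte reported by fstat() has been transferred.
bool copy_sendfile(const char* from, const char* to) {
    int source = open(from, O_RDONLY);
    if (source < 0) return false;

    struct stat st;
    if (fstat(source, &st) != 0) {
        close(source);
        return false;
    }

    int dest = open(to, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (dest < 0) {
        close(source);
        return false;
    }

    off_t remaining = st.st_size;
    bool ok = true;
    while (remaining > 0) {
        // A null offset pointer makes sendfile() use and advance the file
        // offset of `source`, so each call continues where the last stopped.
        ssize_t sent = sendfile(dest, source, nullptr, static_cast<size_t>(remaining));
        if (sent <= 0) {  // error, or the source unexpectedly ended early
            ok = false;
            break;
        }
        remaining -= sent;
    }

    close(source);
    if (close(dest) != 0) ok = false;
    return ok;
}
```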
Environment
- GNU/LINUX (Archlinux)
- Kernel 3.3
- GLIBC-2.15, LIBSTDC++ 4.7 (GCC-LIBS), GCC 4.7, Coreutils 8.16
- Using RUNLEVEL 3 (Multiuser, Network, Terminal, no GUI)
- INTEL SSD-Postville 80 GB, filled up to 50%
- Copy a 270 MB OGG-VIDEO-FILE
Steps to reproduce
1. $ rm from.ogg
2. $ reboot # kernel and filesystem buffers are in their regular state
3. $ (time ./program) &>> report.txt # executes the program, redirects its output and appends it to the file
4. $ sha256sum *.ogv # checksum
5. $ rm to.ogv # remove the copy, but no sync; kernel and filesystem buffers are used
6. $ (time ./program) &>> report.txt # executes the program, redirects its output and appends it to the file
Results (CPU time used, in clock() ticks)
Program   Description                  UNBUFFERED | BUFFERED
ANSI C    (fread/fwrite)                  490,000 | 260,000
POSIX     (K&R, read/write)               450,000 | 230,000
FSTREAM   (KISS, Streambuffer)            500,000 | 270,000
FSTREAM   (Algorithm, copy)               500,000 | 270,000
FSTREAM   (OWN-BUFFER)                    500,000 | 340,000
SENDFILE  (native LINUX, sendfile)        410,000 | 200,000
The file size doesn't change.
sha256sum prints the same results.
The video file is still playable.
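The same check can be done from C++ without sha256sum. As a small sketch, assuming a plain byte-for-byte comparison is good enough (files_identical is a hypothetical helper, not something from the samples above):

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: returns true if both files exist and contain exactly
// the same bytes.
bool files_identical(const std::string& a, const std::string& b) {
    std::ifstream fa(a, std::ios::binary);
    std::ifstream fb(b, std::ios::binary);
    if (!fa || !fb) return false;  // at least one file could not be opened

    std::istreambuf_iterator<char> ia(fa), ib(fb), end;
    // Walk both files in lockstep and stop at the first difference.
    while (ia != end && ib != end) {
        if (*ia != *ib) return false;
        ++ia;
        ++ib;
    }
    // Equal only if both files ended at the same time.
    return ia == end && ib == end;
}
```

For the files used here that would be files_identical("from.ogv", "to.ogv").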
Questions
- What method would you prefer?
- Do you know better solutions?
- Do you see any mistakes in my code?
- Do you know a reason to avoid a solution?
FSTREAM (KISS, Streambuffer)
I really like this one, because it is really short and simple. As far as I know, operator<< is overloaded for rdbuf() and doesn't convert anything. Correct?
Thanks
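On the rdbuf() question: as far as I can tell from the standard, the operator<< overload that takes a std::streambuf* moves characters across as-is, with no formatting and no whitespace skipping. Any newline translation would come from opening the file without ios::binary, not from the operator. One gotcha is that it sets failbit on the destination when it inserts no characters at all, which happens with an empty source. A small sketch using string streams in place of files (copy_stream is a hypothetical helper):

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: copies everything from `in` to `out` through the
// stream buffer and reports success.
bool copy_stream(std::istream& in, std::ostream& out) {
    if (!in.good() || !out.good()) return false;

    // Empty source: `out << in.rdbuf()` would insert nothing and set failbit
    // on `out`, so treat this case separately.
    if (in.peek() == std::char_traits<char>::eof()) return !in.bad();

    out << in.rdbuf();  // raw transfer, no conversion of the bytes
    return out.good();
}
```

With file streams the call would look like copy_stream(source, dest), using the ifstream/ofstream pair from the KISS sample.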
Update 1
I changed the source in all samples so that the opening and closing of the file descriptors is included in the clock() measurement. There are no other significant changes in the source code. The results didn't change! I also used time to double-check my results.
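One thing to keep in mind when reading these numbers: clock() reports processor time used by the program, so time spent waiting for the disk may not show up in it, which is why the cross-check with time is useful. A sketch of measuring both values around the same piece of work, assuming C++11 is available (Timing and measure are hypothetical names):

```cpp
#include <chrono>
#include <ctime>

// Hypothetical result type for this sketch.
struct Timing {
    double cpu_seconds;   // processor time consumed (std::clock)
    double wall_seconds;  // elapsed real time (std::chrono::steady_clock)
};

// Hypothetical helper: runs `work` once and reports CPU and wall-clock time.
template <typename Fn>
Timing measure(Fn&& work) {
    const std::clock_t cpu_start = std::clock();
    const auto wall_start = std::chrono::steady_clock::now();

    work();

    const auto wall_end = std::chrono::steady_clock::now();
    const std::clock_t cpu_end = std::clock();

    return Timing{
        static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC,
        std::chrono::duration<double>(wall_end - wall_start).count()};
}
```

Each copy variant could then be wrapped in a lambda and passed to measure(), so all samples are timed the same way.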
Update 2
ANSI C sample changed: The condition of the while-loop no longer calls feof(); instead I moved fread() into the condition. It looks like the code now runs 10,000 clocks faster.
Measurement changed: The former results were always buffered, because I repeated the old command line rm to.ogv && sync && time ./program a few times for each program. Now I reboot the system for every program. The unbuffered results are new and show no surprises. The unbuffered results didn't really change.
If I don't delete the old copy, the programs react differently. Overwriting an existing file buffered is faster with POSIX and SENDFILE; all other programs are slower. Maybe the truncate or create options have an impact on this behaviour. But overwriting existing files with the same copy is not a real-world use case.
Performing the copy with cp takes 0.44 seconds unbuffered and 0.30 seconds buffered. So cp is a little bit slower than the POSIX sample. Looks fine to me.
Maybe I will also add samples and results for mmap() and copy_file() from boost::filesystem.
Update 3
I've also put this on a blog page and extended it a little bit, including splice(), which is a low-level function from the Linux kernel. Maybe more samples with Java will follow.
http://www.ttyhoney.com/blog/?page_id=69
Copy a file in a sane way:
#include <fstream>

int main()
{
    std::ifstream src("from.ogv", std::ios::binary);
    std::ofstream dst("to.ogv", std::ios::binary);

    dst << src.rdbuf();
}
This is so simple and intuitive to read that it is worth the extra cost. If we were doing it a lot, it would be better to fall back on OS calls to the file system. I am sure boost has a copy file method in its filesystem class.
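For what it's worth, C++17 later added a standard counterpart to the boost function, std::filesystem::copy_file. A minimal sketch of how it could be used, assuming a C++17 compiler (copy_with_filesystem and demo_copy are hypothetical names, and the sample file names are made up):

```cpp
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: copies `from` to `to`, replacing an existing target.
// Uses the error_code overload, so failures are reported without exceptions.
bool copy_with_filesystem(const fs::path& from, const fs::path& to) {
    std::error_code ec;
    const bool copied =
        fs::copy_file(from, to, fs::copy_options::overwrite_existing, ec);
    return copied && !ec;
}

// Usage sketch: writes a small sample file, copies it, and checks that the
// copy has the same size. The file names are arbitrary.
bool demo_copy() {
    const fs::path from = "gr_sample_from.bin";
    const fs::path to = "gr_sample_to.bin";
    {
        std::ofstream sample(from, std::ios::binary);
        sample << "sample data";
    }  // the stream is flushed and closed here

    const bool ok = copy_with_filesystem(from, to) &&
                    fs::file_size(to) == fs::file_size(from);

    std::error_code ignored;
    fs::remove(from, ignored);
    fs::remove(to, ignored);
    return ok;
}
```

How the library performs the copy internally is up to the implementation, so whether it is faster than the samples above would need to be measured.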
There is also a C function for copying files through the file system (copyfile(3) from <copyfile.h>, which as far as I know is specific to macOS):
#include <copyfile.h>

int copyfile(const char *from, const char *to, copyfile_state_t state, copyfile_flags_t flags);