Use sendfile() to copy a file with threads, or another efficient file-copy method

Question

I'm trying to use the Linux system call sendfile() to copy a file using threads.

I'm interested in optimizing these parts of the code:

fseek(fin, size * (number) / MAX_THREADS, SEEK_SET);  
fseek(fout, size * (number) / MAX_THREADS, SEEK_SET); 
/* ... */
fwrite(buff, 1, len, fout);  

Code

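void* FileOperate::FileCpThread::threadCp(void *param)
{
    Info *ft = (Info *)param;
    // Binary, read-only source. The destination is opened with "r+b" rather
    // than "w+", which would truncate it each time another thread opened it;
    // this assumes the caller has already created the destination file.
    FILE *fin = fopen(ft->fromfile, "rb");
    FILE *fout = fopen(ft->tofile, "r+b");
    if (fin == NULL || fout == NULL)
    {
        if (fin != NULL) fclose(fin);
        if (fout != NULL) fclose(fout);
        return NULL;
    }

    // long instead of int: size * number overflows int for large files.
    long size = getFileSize(ft->fromfile);

    long number = ft->num;
    fseek(fin, size * (number) / MAX_THREADS, SEEK_SET);
    fseek(fout, size * (number) / MAX_THREADS, SEEK_SET);

    // Bytes in this thread's slice; it ends where the next thread's slice begins.
    long remaining = size * (number + 1) / MAX_THREADS - size * (number) / MAX_THREADS;

    char buff[1024] = {'\0'};
    size_t len = 0;

    while (remaining > 0)
    {
        // Never read past the end of this thread's slice.
        size_t want = remaining < (long)sizeof(buff) ? (size_t)remaining : sizeof(buff);
        len = fread(buff, 1, want, fin);
        if (len == 0)
        {
            break;
        }
        fwrite(buff, 1, len, fout);
        remaining -= (long)len;
    }

    fclose(fin);
    fclose(fout);
    return NULL;   // the function is declared to return void*
}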


Recommended answer

File copying is not CPU bound; if it were, you would likely find that the limitation is at the kernel level, and nothing you can do at the user level would parallelize it.

Such "improvements" done on mechanical drives will in fact degrade the throughput. You're wasting time seeking along the file instead of reading and writing it.

If the file is long and you don't expect to need the read or written data anytime soon, it might be tempting to use the O_DIRECT flag on open. That's a bad idea, since the O_DIRECT API is essentially broken by design.

Instead, you should use posix_fadvise on both source and destination files, with POSIX_FADV_SEQUENTIAL and POSIX_FADV_NOREUSE flags. After the write (or sendfile) call is finished, you need to advise that the data is not needed anymore - pass POSIX_FADV_DONTNEED. That way the page cache will only be used to the extent needed to keep the data flowing, and the pages will be recycled as soon as the data has been consumed (written to disk).

sendfile does not push file data over to user space, so it further relieves some of the pressure on memory and the processor cache. That's about the only other sensible improvement you can make for copying files that's not device-specific.

Choosing a sensible chunk size is also desirable. Given that modern drives push over 100 Mbytes/s, you might want to push a megabyte at a time, and always a multiple of the 4096-byte page size; thus (4096*256) is a decent starting chunk size to handle in a single sendfile or read/write call.

Read parallelization, as you propose it, only makes sense on RAID 0 volumes, and only when both the input and output files straddle the physical disks. You can then use as many threads as the lesser of the number of physical disks the file straddles on the source volume and on the destination volume. That's only necessary if you're not using asynchronous file I/O. With async I/O you wouldn't need more than one thread anyway, especially not if the chunk sizes are large (megabyte+) and the single-thread latency penalty is negligible.

There's no sense in parallelizing a single file copy on SSDs, unless you are on some very odd system indeed.
