Concurrent writes to a file using multiple threads

Question

I have a user-level program which opens a file using the flags O_WRONLY|O_SYNC. The program creates 256 threads, each of which attempts to write 256 or more bytes of data to the file. I want a total of 1280000 requests, making about 300 MB of data in all. The program ends once 1280000 requests have been completed.

I use pthread_spin_trylock() to increment a variable which keeps track of the number of requests that have been completed. To ensure that each thread writes to a unique offset, I use pwrite() and calculate the offset as a function of the number of requests that have been written already. Hence, I don't use any mutex when actually writing to the file (does this approach ensure data integrity?).
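For concreteness, the scheme described above might look like the sketch below (the file name, buffer contents, and error handling are illustrative assumptions, not the asker's actual code):

```c
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

#define NUM_THREADS  256
#define REQ_SIZE     256
#define NUM_REQUESTS 1280000        /* 1280000 * 256 B ~= 300 MB */

static int fd;
static long completed;              /* requests claimed so far */
static pthread_spinlock_t lock;

static void *writer(void *arg)
{
    char buf[REQ_SIZE] = { 0 };
    (void)arg;

    for (;;) {
        /* Claim a ticket under the spinlock; the ticket alone
         * determines where this request lands in the file. */
        while (pthread_spin_trylock(&lock) != 0)
            ;                       /* spin until the lock is ours */
        long ticket = completed;
        if (ticket < NUM_REQUESTS)
            completed++;
        pthread_spin_unlock(&lock);
        if (ticket >= NUM_REQUESTS)
            break;

        /* Tickets map to unique, non-overlapping offsets, so the
         * write itself needs no mutex. */
        off_t off = (off_t)ticket * REQ_SIZE;
        if (pwrite(fd, buf, REQ_SIZE, off) != REQ_SIZE)
            abort();
    }
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_THREADS];

    fd = open("testfile", O_WRONLY | O_CREAT | O_SYNC, 0644);
    if (fd < 0)
        abort();
    pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);

    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, writer, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);

    pthread_spin_destroy(&lock);
    close(fd);
    return 0;
}
```

Since each ticket maps to a distinct 256-byte slot, the writes never overlap; the spinlock serializes only the counter, not the I/O itself.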

When I compare the average time for which the pwrite() call was blocked against the corresponding numbers found using blktrace (i.e., the average Q2C times, which measure the complete life cycle of a BIO), I find that there is a significant difference. In fact, the average completion time for a given BIO is much greater than the average latency of a pwrite() call. What is the reason behind this discrepancy? Shouldn't these numbers be similar, since O_SYNC ensures that the data is actually written to the physical medium before returning?
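(For reference, the syscall-side number can be obtained by timestamping around each call, as in the minimal sketch below; the Q2C times come from blktrace at the block layer and are gathered separately.)

```c
#include <time.h>
#include <unistd.h>

/* Time, in microseconds, for which a single pwrite() call blocked the
 * calling thread. This is the syscall-level latency; blktrace's Q2C
 * times are measured at the block layer instead. */
static double timed_pwrite_us(int fd, const void *buf, size_t len, off_t off)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    pwrite(fd, buf, len, off);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (t1.tv_sec - t0.tv_sec) * 1e6
         + (t1.tv_nsec - t0.tv_nsec) / 1e3;
}
```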

Answer

pwrite() is supposed to be atomic, so you should be safe there...

With regard to the difference in latency between your syscall and the actual BIO, according to the man page for open(2) at kernel.org:

POSIX provides for three different variants of synchronized I/O, corresponding to the flags O_SYNC, O_DSYNC, and O_RSYNC. Currently (2.6.31), Linux only implements O_SYNC, but glibc maps O_DSYNC and O_RSYNC to the same numerical value as O_SYNC. Most Linux file systems don't actually implement the POSIX O_SYNC semantics, which require all metadata updates of a write to be on disk on returning to userspace, but only the O_DSYNC semantics, which require only actual file data and metadata necessary to retrieve it to be on disk by the time the system call returns.

So this basically implies that with the O_SYNC flag on Linux you get O_DSYNC behavior in practice: the file data you write, plus whatever metadata is needed to retrieve it, must be on disk before the syscall returns, but the other metadata updates associated with the write can be deferred. Those deferred updates are written out at a later time, after the syscall has completed and the process has moved on to something else, which would explain why blktrace sees block-layer activity (and Q2C lifetimes) that is not reflected in the blocking time of any individual pwrite() call.
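As a hedged illustration of that distinction (the file name and sizes are assumptions): opening with O_DSYNC makes the weaker, data-only guarantee explicit, and if full O_SYNC durability is actually required, an explicit fsync() after the write forces the deferred metadata out:

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buf[256];
    memset(buf, 'x', sizeof buf);

    /* O_DSYNC asks only for the file data (plus the metadata needed to
     * retrieve it) to be on disk before the call returns -- the semantics
     * most Linux file systems actually provide for O_SYNC, per the man
     * page quoted above. */
    int fd = open("testfile", O_WRONLY | O_CREAT | O_DSYNC, 0644);
    if (fd < 0)
        return 1;

    pwrite(fd, buf, sizeof buf, 0);

    /* For full POSIX O_SYNC durability (all metadata updates too), an
     * explicit fsync() flushes whatever the kernel deferred. */
    fsync(fd);

    close(fd);
    return 0;
}
```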
