What does O_DIRECT really mean?
Question
If I open a file with the O_DIRECT flag, does it mean that whenever a write (in blocking mode) to that file returns, the data is on disk?
Recommended answer
(This answer pertains to Linux; other OSes may have different caveats/semantics.)
Let's start with the sub-question:
If I open a file with the O_DIRECT flag, does it mean that whenever a write (in blocking mode) to that file returns, the data is on disk?
No (as @michael-foukarakis commented). If you need a guarantee that your data made it to non-volatile storage, you must use or add something else.
What does O_DIRECT really mean?
It's a hint that you want your I/O to bypass the Linux kernel's caches. What will actually happen depends on things like:
- Disk configuration
- Whether you are opening a block device or a file in a filesystem
- If using a file within a filesystem:
  - The exact filesystem used and the options in use on the filesystem and the file
  - Whether you've correctly aligned your I/O
  - Whether a filesystem has to do a new block allocation to satisfy your I/O
The list above is not exhaustive.
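To make the alignment point concrete, here is a minimal Python sketch (Python's `os` module exposes the same flags as the C API). It is an illustration under assumptions, not a definitive recipe: the 4096-byte `ALIGN` value is a guess, since the real requirement depends on the device and filesystem, and the helper names `write_direct` and `_write_once` are hypothetical. If `open()` or `write()` rejects direct I/O with `EINVAL`, the sketch retries with ordinary buffered I/O and reports that. It cannot detect the case where the kernel accepts the flag and silently buffers anyway.

```python
import errno
import mmap
import os

ALIGN = 4096  # assumed alignment; the real value depends on device and filesystem


def _write_once(path, buf, size, extra_flags):
    """Open path with the given extra flags, write buf, then trim to size bytes."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | extra_flags, 0o644)
    try:
        if os.write(fd, buf) != len(buf):
            raise OSError(errno.EIO, "short write")
        os.ftruncate(fd, size)  # drop the zero padding added for alignment
    finally:
        os.close(fd)


def write_direct(path, payload):
    """Write payload with O_DIRECT if possible.

    Returns True if the O_DIRECT open and write were accepted, False if the
    sketch had to fall back to buffered I/O.
    """
    # O_DIRECT transfers generally need an aligned buffer, length and offset,
    # so round the length up to a multiple of ALIGN.
    padded_len = max(ALIGN, -(-len(payload) // ALIGN) * ALIGN)
    buf = mmap.mmap(-1, padded_len)  # anonymous mmap memory is page-aligned
    buf[:len(payload)] = payload
    direct = getattr(os, "O_DIRECT", 0)  # 0 on platforms without the flag
    try:
        try:
            _write_once(path, buf, len(payload), direct)
            return bool(direct)
        except OSError as exc:
            if exc.errno != errno.EINVAL:
                raise
            # The filesystem or device refused direct I/O: use buffered I/O.
            _write_once(path, buf, len(payload), 0)
            return False
    finally:
        buf.close()
```

Calling `write_direct("/some/path", b"hello")` leaves a 5-byte file either way. A `True` result only says the kernel accepted the flag, not that the data is on stable storage.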
In the "best" case, setting O_DIRECT will avoid making extra copies of data while transferring it, and the call will return after the transfer is complete. You are more likely to be in this case when directly opening block devices of "real" local disks. As previously stated, even this property doesn't guarantee that the data of a write() call will survive sudden power loss. IF the data is DMA'd out of RAM to non-volatile storage (e.g. a battery-backed RAID controller) or the RAM itself is persistent storage, THEN you may have a guarantee that the data is on stable storage (and can survive power loss), but you would need to qualify your hardware stack, so you can't assume this in general.
In the "worst" case, O_DIRECT can mean nothing at all, even though setting it wasn't rejected and subsequent calls "succeed". Sometimes things in the Linux storage stack (like certain filesystem setups) can choose to ignore it, either because of what they have to do or because you didn't satisfy the requirements (which is legal), and just silently do buffered I/O instead (i.e. write to a buffer or satisfy reads from already buffered data). It is unclear whether extra effort will be made to ensure that the data of an acknowledged write was at least with the device (but in the "O_DIRECT and barriers" thread Christoph Hellwig posts that the O_DIRECT fallback will ensure data has at least been sent to the device). A further complication is that using O_DIRECT implies nothing about file metadata, so even if write data is "with the disk" by call completion, key file metadata (like the size of the file, because you were doing an append) may not be. Thus, after a crash you may not actually be able to get at the data you thought had been transferred (it may appear truncated, or all zeros, etc.).
While brief testing can make it look like using O_DIRECT alone always implies data will be on disk after a write returns, changing things (e.g. using an Ext4 filesystem instead of XFS) can weaken what is actually achieved in very drastic ways.
As you mention "guarantee that the data" (rather than metadata), perhaps you're looking for O_DSYNC/fdatasync()? If you want to guarantee that metadata was written too, you will have to look at O_SYNC/fsync().
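The two routes mentioned above (an explicit flush call after writing, or a synchronous open flag) can be sketched as follows. This is a minimal illustration in Python, whose `os` module wraps the same system calls. The helper names `write_durably` and `open_dsync` are hypothetical. Whether a completed flush means the data is truly on non-volatile media still depends on the hardware honouring cache flushes.

```python
import os


def write_durably(path, payload, include_metadata=True):
    """Write payload, then block until the kernel reports it flushed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(payload)
        while view:  # os.write may write fewer bytes than requested
            view = view[os.write(fd, view):]
        if include_metadata:
            os.fsync(fd)  # flush data and file metadata
        else:
            # Flush data plus only the metadata needed to read it back.
            # Falls back to fsync on platforms without fdatasync.
            getattr(os, "fdatasync", os.fsync)(fd)
    finally:
        os.close(fd)


def open_dsync(path):
    """Open for appending so each write() behaves as if followed by fdatasync().

    Using os.O_SYNC instead would give fsync()-like behaviour per write.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | os.O_DSYNC
    return os.open(path, flags, 0o644)
```

Either approach can be combined with O_DIRECT. The flush is what provides the durability request; O_DIRECT only affects caching.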
- Ext4 Wiki: Clarifying Direct IO's Semantics. Also contains notes about what O_DIRECT does on a few non-Linux OSes.
- The "[PATCH 1/1 linux-next] ext4: add compatibility flag check to the patch" LKML thread has a reply from Ext4 lead dev Ted Ts'o talking about how filesystems can fall back to buffered I/O for O_DIRECT rather than failing the open() call.
- In the "ubifs: Allow O_DIRECT" LKML thread, Btrfs lead developer Chris Mason states that Btrfs resorts to buffered I/O when O_DIRECT is requested on compressed files.
- ZFS on Linux commit message discussing the semantics of O_DIRECT in different scenarios. Also see the (at the time of writing, mid-2020) proposed new O_DIRECT semantics for ZFS on Linux (the interactions are complex and defy a brief explanation).
- Linux open(2) man page (search for O_DIRECT in the Description section and the Notes section)
- "Ensuring data reaches disk" LWN article
- Infamous Linus Torvalds O_DIRECT LKML thread summary (for even more context you can see the full LKML thread)