Large file not flushed to disk immediately after calling close()?
I'm creating large files with my Python script (more than 1 GB each; actually there are 8 of them). Right after I create them I have to create a process that will use those files.
The script looks like:
```python
import os
import subprocess
import threading
import time

# This is a more complex function, but it basically does this:
def use_file():
    subprocess.call(['C:\\use_file', 'C:\\foo.txt'])

f = open('C:\\foo.txt', 'wb')
for i in range(10000):
    f.write(one_MB_chunk)  # one_MB_chunk is a 1 MB bytes object defined elsewhere
f.flush()
os.fsync(f.fileno())
f.close()

time.sleep(5)  # With this line added it just works fine

t = threading.Thread(target=use_file)
t.start()
```
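For reference, here is a minimal, self-contained sketch of the same flush/fsync/close sequence (not from the original post: it uses a temporary file and 4 MB of data instead of `C:\foo.txt` and 10 GB) that then asks a separate process for the file size, the way `use_file` would:

```python
import os
import subprocess
import sys
import tempfile

chunk = b"x" * (1 << 20)  # 1 MB of data, standing in for one_MB_chunk

# Write a few chunks with the same flush/fsync/close sequence as above.
path = os.path.join(tempfile.mkdtemp(), "foo.bin")
f = open(path, "wb")
for _ in range(4):  # 4 MB instead of 10 GB, for a quick check
    f.write(chunk)
f.flush()
os.fsync(f.fileno())
f.close()

# Ask a *separate* process for the size, mimicking use_file.
out = subprocess.check_output(
    [sys.executable, "-c",
     "import os, sys; print(os.path.getsize(sys.argv[1]))", path])
print(int(out))  # expected: 4 * (1 << 20) = 4194304
```

If a sketch like this reports the full size but the real `use_file` still sees 0 bytes, the difference lies in the timing between writer and consumer rather than in the write sequence itself.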
But the application `use_file` acts as if `foo.txt` were empty. There are some weird things going on:

- if I execute `C:\use_file C:\foo.txt` in a console (after the script finished), I get correct results
- if I execute `use_file()` manually in another Python console, I get correct results
- `C:\foo.txt` is visible on disk right after `open()` was called, but remains 0 B in size until the end of the script
- if I add `time.sleep(5)`, it just starts working as expected (or rather as required)
I've already found:

- `os.fsync()`, but it doesn't seem to work (the result from `use_file` is as if `C:\foo.txt` were empty)
- using `buffering=(1<<20)` (when opening the file) doesn't seem to work either
I'm more and more curious about this behaviour.
Questions:

- Does Python fork the `close()` operation into the background? Where is this documented?
- How can I work around this?
- Am I missing something?
- After adding `sleep`: is that a Windows/Python bug?
Notes: (in case there's something wrong on the other side) the application `use_data` uses:

```c
handle = CreateFile("foo.txt", GENERIC_READ, FILE_SHARE_READ, NULL,
                    OPEN_EXISTING, 0, NULL);
size = GetFileSize(handle, NULL);
```

and then processes `size` bytes from `foo.txt`.
`f.close()` calls `f.flush()`, which sends the data to the OS. That doesn't necessarily write the data to disk, because the OS buffers it. As you rightly worked out, if you want to force the OS to write it to disk, you need `os.fsync()`.
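A small sketch of that distinction (mine, not part of the original answer): data handed to the OS by `close()` is already visible to other readers even without `fsync()`; what `fsync()` adds is durability, i.e. forcing the OS cache onto the physical disk:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.bin")

f = open(path, "wb")
f.write(b"a" * 1024)
f.close()  # implies flush(): data handed to the OS, no fsync yet

# A reader reopening the file already sees the full content --
# visibility to other processes does not require fsync.
print(os.path.getsize(path))  # expected: 1024

# fsync is about durability: it pushes the OS cache onto the disk itself,
# so the data would survive a power failure from this point on.
g = open(path, "ab")
g.write(b"b" * 1024)
g.flush()
os.fsync(g.fileno())
g.close()
print(os.path.getsize(path))  # expected: 2048
```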
Have you considered just piping the data directly into `use_file`?
EDIT: you say that `os.fsync()` 'doesn't work'. To clarify, if you do

```python
f = open(...)
# write data to f
f.flush()
os.fsync(f.fileno())
f.close()
import pdb; pdb.set_trace()
```

and then look at the file on disk, does it have data?