Large file not flushed to disk immediately after calling close()?


Problem description


I'm creating large files with my Python script (more than 1 GB each; actually there are 8 of them). Right after I create them I have to create a process that will use those files.

The script looks like:

import os
import subprocess
import threading
import time

# This is a more complex function, but it basically does this:
def use_file():
    subprocess.call(['C:\\use_file', 'C:\\foo.txt'])


one_MB_chunk = b'\x00' * (1 << 20)  # placeholder: one megabyte of data

f = open('C:\\foo.txt', 'wb')
for i in range(10000):
    f.write(one_MB_chunk)
f.flush()
os.fsync(f.fileno())
f.close()

time.sleep(5)  # With this line added it just works fine

t = threading.Thread(target=use_file)
t.start()

But the application use_file acts as if foo.txt were empty. There are some weird things going on:

  • if I execute C:\use_file C:\foo.txt in a console (after the script has finished) I get correct results
  • if I manually execute use_file() in another Python console I get correct results
  • C:\foo.txt is visible on disk right after open() is called, but remains 0 B in size until the end of the script
  • if I add time.sleep(5) everything just starts working as expected (or rather, as required)

I've already found:

  • os.fsync(), but it doesn't seem to work (the result from use_file is as if C:\foo.txt were empty)
  • Using buffering=(1<<20) when opening the file doesn't seem to work either (a rough sketch of this attempt follows the list)
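
For completeness, the second attempt looked roughly like this (same path as in the script above):

# buffering sets the size of Python's internal write buffer;
# it does not make the OS commit the data to disk any sooner
f = open('C:\\foo.txt', 'wb', buffering=(1 << 20))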

I'm more and more curious about this behaviour.

Questions:

  • Does Python fork the close() operation into the background? Where is this documented?
  • How can I work around this?
  • Am I missing something?
  • After adding sleep: is this a Windows/Python bug?

Notes (in case there's something wrong on the other side): the application use_file uses:

HANDLE handle = CreateFile("foo.txt", GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, 0, NULL);
DWORD size = GetFileSize(handle, NULL);

And then processes size bytes from foo.txt.

Solution

f.close() calls f.flush(), which sends the data to the OS. That doesn't necessarily write the data to disk, because the OS buffers it. As you rightly worked out, if you want to force the OS to write the data to disk, you need to call os.fsync().
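
In other words, each call pushes the data one layer further. A minimal sketch of the intended sequence, reusing one_MB_chunk and use_file from your script:

import os

with open('C:\\foo.txt', 'wb') as f:
    for _ in range(10000):
        f.write(one_MB_chunk)  # data lands in Python's internal buffer
    f.flush()                  # Python buffer -> OS cache
    os.fsync(f.fileno())       # OS cache -> disk (blocks until committed)
# the with-block closes the file before anything else uses it

use_file()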

Have you considered just piping the data directly into use_file?
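
That way there would be no on-disk file to synchronise at all. A rough sketch, assuming use_file can be told to read from stdin (the '-' argument here is purely hypothetical):

import subprocess

# stream the data straight into the consumer instead of via C:\foo.txt
proc = subprocess.Popen(['C:\\use_file', '-'], stdin=subprocess.PIPE)
for _ in range(10000):
    proc.stdin.write(one_MB_chunk)
proc.stdin.close()   # signal end of input
proc.wait()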


EDIT: you say that os.fsync() 'doesn't work'. To clarify, if you do

import os

f = open(...)
# write data to f
f.flush()
os.fsync(f.fileno())
f.close()

import pdb; pdb.set_trace()  # pause here, before anything else touches the file

and then look at the file on disk, does it have data?
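
For example, from the pdb prompt (or another Python console) you could check it with something like:

import os
os.path.getsize('C:\\foo.txt')  # roughly 10000 MB expected if the writes left the process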
