Is there any way to find the buffer size of a file object?


Problem description

I'm trying to "map" a very large ASCII file. Basically, I read lines until I find a certain tag, and then I want to know the position of that tag so that I can seek to it again later to pull out the associated data.

from itertools import dropwhile
with open(datafile) as fin:
    ifin = dropwhile(lambda x: not x.startswith('Foo'), fin)
    header = next(ifin)
    position = fin.tell()

Now this tell() doesn't give me the right position. This question has been asked in various forms before. The reason is presumably that Python is buffering the file object, so Python is telling me where its file pointer is, not where my file pointer is. I don't want to turn off this buffering, because performance matters here. However, it would be nice to know whether there is a way to determine how many bytes Python chooses to buffer. In my actual application, as long as I'm close to the lines which start with Foo, it doesn't matter; I can drop a few lines here and there. So what I'm actually planning on doing is something like:

position = fin.tell() - buffer_size(fin)

Is there any way to go about finding the buffer size?

Recommended answer

To me, it looks like the buffer size is hard-coded in CPython to be 8192. As far as I can tell, there is no way to get this number from the Python interface other than to read a single line when you open the file, call f.tell() to figure out how much data Python actually read, and then seek back to the start of the file before continuing.

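from itertools import dropwhile

with open(datafile) as fin:
    # Read one line; tell() then reports how far Python actually read ahead.
    next(fin)
    bufsize = fin.tell()
    fin.seek(0)  # rewind before the real scan

    ifin = dropwhile(lambda x: not x.startswith('Foo'), fin)
    header = next(ifin)
    position = fin.tell()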

Of course, this fails if the first line is longer than 8192 bytes, but that's of no real consequence for my application.
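
A different way to reach the same goal is to avoid tell() altogether and keep a running byte count while iterating. This is a swapped-in technique and an assumption on my part, not the method described above. The sketch below opens the file in binary mode so that len(line) is an exact byte count, and records the offset at which the tagged line starts. The helper name find_tag_offset, the tag b"Foo", and the throwaway demo file are all hypothetical, chosen only for illustration.

```python
import os
import tempfile


def find_tag_offset(path, tag=b"Foo"):
    """Return the byte offset of the first line starting with `tag`, or None."""
    offset = 0  # running count of bytes consumed so far
    with open(path, "rb") as fin:  # binary mode: len(line) is a byte count
        for line in fin:
            if line.startswith(tag):
                return offset
            offset += len(line)
    return None


# Small demonstration on a throwaway file (hypothetical contents).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"header line\nmore text\nFoo 1 2 3\ndata after the tag\n")
    demo_path = tmp.name

position = find_tag_offset(demo_path)

with open(demo_path, "rb") as fin:
    fin.seek(position)            # jump straight back to the tagged line
    tagged_line = fin.readline()

os.remove(demo_path)
```

The returned offset points at the start of the tagged line. If the position just after that line is wanted instead, as in the question's snippet, adding the length of the tagged line to the offset should give it. Iteration still goes through the normal buffered reader, so buffering does not need to be turned off.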
