f.seek()在Python中的复杂性 [英] Complexity of f.seek() in Python

查看:153
本文介绍了f.seek()在Python中的复杂性的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

f.seek(500000,0)在到达第500000位数之前是否要遍历文件的所有前499999个字符? 换句话说,f.seek(n,0)是O(n)还是O(1)?

Does f.seek(500000,0) go through all the first 499999 characters of the file before getting to the 500000th? In other words, is f.seek(n,0) of order O(n) or O(1)?

推荐答案

您需要更具体地说明对象f是什么类型.

You need to be a bit more specific on what type of object f is.

如果f是正常的 io模块对象,文件存储在磁盘上,您必须确定是否要处理:

If f is a normal io module object for a file stored on disk, you have to determine if you are dealing with:

  • 原始二进制文件对象
  • 缓冲对象,包装原始二进制文件
  • TextIO对象,包装缓冲区
  • 内存BytesIOTextIO对象
  • The raw binary file object
  • A buffer object, wrapping the raw binary file
  • A TextIO object, wrapping the buffer
  • An in-memory BytesIO or TextIO object

第一个选项仅使用 lseek系统调用重新定位文件描述符的位置.如果此调用为O(1),则取决于操作系统以及所拥有的文件系统类型.对于具有ext4文件系统的Linux系统, lseek为O(1).

The first option just uses the lseek system call to reposition the file descriptor position. If this call is O(1) depends on the OS and what kind of file system you have. For a Linux system with ext4 filesystem, lseek is O(1).

缓冲区仅清除缓冲区如果您要查找目标位于当前缓冲区域之外并读取新的缓冲数据.也是O(1),但是固定成本更高.

Buffers just clear the buffer if your seek target is outside of the current buffered region and read in new buffer data. That's O(1) too, but the fixed cost is higher.

对于文本文件,事情变得更加复杂,因为字节长度可变的编解码器和行尾转换意味着您无法始终将二进制流位置映射到文本位置,而无需从头开始进行扫描.该实现不允许非零的当前位置或相对终点搜索,并且最好是最大程度地减少为绝对搜索读取的数据量. 与文本解码器共享的内部状态跟踪最近的安全点"以回溯并向前读取到所需位置.最坏的情况是O(n).

For text files, things are more complicated as variable-byte-length codecs and line-ending translation mean you can't always map the binary stream position to a text position without scanning from the start. The implementation doesn't allow for non-zero current-position- or end-relative seeks, and does it's best to minimise how much data is read for absolute seeks. Internal state shared with the text decoder tracks a recent 'safe point' to seek back to and read forward to the desired position. Worst-case this is O(n).

内存文件对象实际上只是可寻址的长数组.求是O(1),因为您可以更改当前位置指针值.

The in-memory file objects are just long, addressable arrays really. Seeking is O(1) because you can just alter the current position pointer value.

还有许多其他文件状对象,它们可能支持也可能不支持搜索.他们如何处理搜寻取决于实现方式.

There are legion other file-like objects that may or may not support seeking. How they handle seeking is implementation dependent.

  • zipfile模块支持在打开的zip文件中进行搜索在只读模式下,要找到位于当前缓冲区所覆盖的数据段之前的点,需要对数据进行完全重新读取和解压缩,直到达到所需的点为止;而寻找之后,则需要从当前位置进行读取,直到到达新的. gzip lzma

  • The zipfile module supports seeking on zip files opened in read-only mode, and seeking to a point that lies before the data section covered by the current buffer requires a full re-read and decompression of the data up to the desired point, seeking after requires reading from the current position until you reach the new. The gzip, lzma and bz2 modules all use the same shared implementation, that also starts reading from the start if you seek to a point before the current read position (and there's no larger buffer to avoid this).

chunk模块允许在块边界内搜索并委托给基础对象.如果基础文件查找操作为O(1),则此操作为O(1).

The chunk module allows seeking within the chunk boundaries and delegates to the underlying object. This is an O(1) operation if the underlying file seek operation is O(1).

等等.因此,这取决于.

这篇关于f.seek()在Python中的复杂性的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆