Python in-place write to file at arbitrary position

Question

I'm trying to edit a text file in-place in Python. It is very large (so loading it into memory is not an option). I intend to replace strings I find inside it byte for byte.

with open("filename.txt", "r+b") as f:  # read/write, binary, no truncation
    if f.read(8) == b"01234567":
        f.seek(-8, 1)  # step back 8 bytes from the current position
        f.write(b"87654321")

However, when I tried it, the write() operation appended to the end of the file:

>>> n.read()
'sdf'
>>> n.read(1)
''
>>> n.seek(0,0)
>>> n.read(1)
's'
>>> n.read(1)
'd'
>>> n.write("sdf")
>>> n.read(1)
''
>>> n.seek(0,0)
>>> n.read()
'sdfsdf'

I want the result to be sdsdf.

Recommended answer

The original ANSI / ISO C standards required a seek operation when switching a read-write mode stream from read mode to write mode, and vice versa. This restriction persists, e.g., n1570 includes this text:


When a file is opened with update mode ('+' as the second or third character in the above list of mode argument values), both input and output may be performed on the associated stream. However, output shall not be directly followed by input without an intervening call to the fflush function or to a file positioning function (fseek, fsetpos, or rewind), and input shall not be directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file. Opening (or creating) a text file with update mode may instead open (or create) a binary stream in some implementations.

For whatever reason, this restriction has been imported into Python,[1] even though it would be possible for the Python wrappers to handle it automatically.
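In practice, the rule quoted above means putting a file positioning call between the read and the write. The following is a minimal sketch, not a definitive implementation: the helper name overwrite_after_read and the temporary demo file are made up for illustration, and the file is assumed to hold the same three bytes (sdf) as the session in the question. Where the file object already handles the switch internally, the extra seek should simply be a harmless no-op.

```python
import os
import tempfile


def overwrite_after_read(path, skip, data):
    """Read `skip` bytes, then overwrite the bytes that follow with `data`.

    Hypothetical helper for illustration; returns the final file contents.
    """
    with open(path, "r+b") as f:
        f.read(skip)             # the stream is now in "read mode"
        f.seek(0, os.SEEK_CUR)   # no-op seek: the positioning call between read and write
        f.write(data)            # overwrites in place, extending the file if needed
        f.seek(0)                # positioning call again before going back to reading
        return f.read()


# Assumed demo file with the same contents as the session in the question.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
with open(demo_path, "wb") as f:
    f.write(b"sdf")

result = overwrite_after_read(demo_path, 2, b"sdf")
os.remove(demo_path)
print(result)
```

With the seek in place, the write lands at offset 2 instead of at the end of the file, which is the sdsdf result the question asks for.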

For what it's worth, the reason for the original ANSI C restriction was the low-budget implementation found on many Unix-based systems: they kept, for each stream, a "current byte count" and a "current pointer". The current byte count was 0 whenever the macro-ized getc and putc operations had to call into the underlying implementation, which could check whether the stream was opened in update mode and switch it as needed. But once you successfully obtained a character, the counter would hold the number of characters that could continue to be read from the underlying stream; and once you successfully wrote a character, the counter would hold the number of buffer locations still available for adding characters.

This meant that if you did a successful getc that filled an internal buffer, but followed it with a putc, the "written" character from putc would simply overwrite the buffered data. If you had a successful putc but followed it with a poorly implemented getc, you would see unset values out of the buffer.

This problem was trivial to fix (just provide separate input and output counters, one of which is always zero, and have the functions that implement buffer-refill check for mode-switch as well).
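The fix described above can be sketched as a toy model. This is not real stdio code: the class ToyStream, its attribute names, and the bytearray standing in for the file are all assumptions made for illustration. The point it tries to show is that with separate read and write counters, one of which is always zero, a getc after a putc (or the reverse) always falls through to the slow path, where the mode switch can be handled.

```python
class ToyStream:
    """Toy model of a buffered update stream over a bytearray (hypothetical)."""

    def __init__(self, data, bufsize=4):
        self.backing = bytearray(data)  # stands in for the file on disk
        self.bufsize = bufsize
        self.buf = bytearray()
        self.filepos = 0      # backing-store offset of buf[0]
        self.idx = 0          # next slot in buf (the "current pointer")
        self.rcount = 0       # bytes still readable from buf
        self.wcount = 0       # free slots still writable in buf
        self.writing = False  # True while buf holds pending output

    def _sync(self):
        # Slow path: settle the buffer and return to a neutral state.
        if self.writing:
            end = self.filepos + self.idx
            self.backing[self.filepos:end] = self.buf[:self.idx]
        self.filepos += self.idx  # bytes consumed or written so far
        self.idx = 0
        self.rcount = self.wcount = 0
        self.writing = False

    def getc(self):
        if self.rcount == 0:      # empty buffer OR stream is in write mode
            self._sync()          # the mode switch happens here
            self.buf = self.backing[self.filepos:self.filepos + self.bufsize]
            self.rcount = len(self.buf)
            if self.rcount == 0:
                return None       # end of file
        c = self.buf[self.idx]
        self.idx += 1
        self.rcount -= 1
        return c

    def putc(self, c):
        if self.wcount == 0:      # full buffer OR stream is in read mode
            self._sync()          # the mode switch happens here
            self.buf = bytearray(self.bufsize)
            self.wcount = self.bufsize
            self.writing = True
        self.buf[self.idx] = c
        self.idx += 1
        self.wcount -= 1

    def flush(self):
        self._sync()


# Replay the session from the question: read 's' and 'd', then write "sdf".
s = ToyStream(b"sdf")
first, second = s.getc(), s.getc()
for byte in b"sdf":
    s.putc(byte)
s.flush()
final = bytes(s.backing)
print(final)
```

With a single shared counter, the first putc would have seen a non-zero count left over from the read and written straight into the read buffer, which is the failure mode the answer describes.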

[1] Citation needed :-)
